Article URL: https://www.haveibeensquatted.com/
Comments URL: https://news.ycombinator.com/item?id=38434493
Points: 12
# Comments: 4
Article URL: https://www.aigaragesale.com/
Comments URL: https://news.ycombinator.com/item?id=38434225
Points: 10
# Comments: 2
I used to enjoy Translation Party, and over the weekend I realized that we can build the same feedback loop with DALLE-3 and GPT4-Vision. Start with a text prompt, let DALLE-3 generate an image, then GPT-4 Vision turns that image back into a text prompt, DALLE-3 creates another image, and so on.
You need to bring your own OpenAI API key (costs about $0.10/run)
Some prompts are very stable, others go wild. If you bias GPT4's prompting by telling it to "make it weird" you can get crazy results.
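The core loop is small; here's a rough sketch of it with the OpenAI Python client (not the site's actual code, and the model names and prompt wording are just assumptions):
# Rough sketch of the DALL-E 3 <-> GPT-4 Vision feedback loop (not dalle.party's actual code).
# Assumes the official openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def run_party(prompt: str, rounds: int = 4, bias: str = "") -> None:
    for i in range(rounds):
        # Text -> image with DALL-E 3
        image_url = client.images.generate(model="dall-e-3", prompt=prompt, n=1).data[0].url
        print(f"round {i}: {prompt!r} -> {image_url}")
        # Image -> text with GPT-4 Vision; bias nudges the description ("make it weird", etc.)
        prompt = client.chat.completions.create(
            model="gpt-4-vision-preview",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": f"Describe this image as a DALL-E prompt. {bias}"},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }],
        ).choices[0].message.content

run_party("a gnome tending a mushroom garden", bias="make it weird")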
Here are a few of my favorites:
- Gnomes: https://dalle.party/?party=k4eeMQ6I
- Start with a sailboat but bias GPT4V to "replace everything with cats": https://dalle.party/?party=0uKfJjQn
- A more stable one (but everyone is always an actor): https://dalle.party/?party=oxpeZKh5
Comments URL: https://news.ycombinator.com/item?id=38432486
Points: 129
# Comments: 40
Hi HN - I’m excited to share a fun side project we built recently
CodebaseChat.com is a tool for building a GPT chatbot for any GitHub repo in 30 seconds
It can be helpful when onboarding to new codebases, understanding system design, or asking for less technical explanations of functionality
We’ve been heads down building Context.ai, the analytics platform for LLM products. When OpenAI released GPTs earlier this month, we built one to answer questions about our growing codebase. It worked so well that we decided to open source the utility for other dev teams
How it works:
- Submit a GitHub repo URL at CodebaseChat.com
- We’ll give you a repo.md file
- Upload that to OpenAI’s GPT builder
- Voilà! You can ask your GPT about your codebase
Don’t have ChatGPT Plus? No worries, you can use OpenAI’s Assistants Playground completely for free.
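If you want a feel for what the repo.md step does, here is a rough, hypothetical sketch of the idea (not our actual implementation; the file filters and layout are arbitrary):
# Hypothetical sketch: concatenate a checked-out repo into a single repo.md.
# Not CodebaseChat's implementation; extensions and skip list are assumptions.
from pathlib import Path

SKIP_DIRS = {".git", "node_modules", "venv", "__pycache__"}
EXTS = {".py", ".js", ".ts", ".go", ".rs", ".md"}

def build_repo_md(repo_dir: str, out_file: str = "repo.md") -> None:
    chunks = []
    for path in sorted(Path(repo_dir).rglob("*")):
        if path.is_dir() or set(path.parts) & SKIP_DIRS or path.suffix not in EXTS:
            continue
        rel = path.relative_to(repo_dir)
        chunks.append(f"## {rel}\n\n{path.read_text(errors='ignore')}\n")
    Path(out_file).write_text("\n".join(chunks))

build_repo_md("path/to/cloned/repo")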
Have an idea to improve the project? Submit a PR at https://github.com/contextco/codebasechat
Comments URL: https://news.ycombinator.com/item?id=38432195
Points: 10
# Comments: 1
Article URL: https://blog.gitea.com/gitea-cloud/
Comments URL: https://news.ycombinator.com/item?id=38430029
Points: 11
# Comments: 0
Article URL: https://drawfast.io/
Comments URL: https://news.ycombinator.com/item?id=38429060
Points: 14
# Comments: 6
Hey everyone. It's Sherub here, author of the Build your own DNS Server challenge on CodeCrafters. Currently it’s available in Rust, Go, and Python and is free while in beta.
https://codecrafters.io/dns-server
I've kept the challenge accessible but still challenging for an intermediate developer. This challenge, like others from CodeCrafters, is self-paced. You can use any tools you prefer (terminal, editor, etc.) to build the project.
At the end of the challenge, you will have created a DNS forwarding server. The server can create and read DNS packets and respond to DNS queries. As you go, you'll learn about the DNS protocol, its format, servers, and A records. All while getting to hone your language skills.
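To give a flavor of the early stages, here's a rough sketch of packing and parsing the fixed 12-byte DNS header from RFC 1035 (not the challenge's starter code; flag handling is simplified):
# Sketch of the fixed 12-byte DNS header (RFC 1035), roughly what the early
# stages of the challenge cover. Not the official starter code.
import struct

def build_header(packet_id: int, qr: int = 1, qdcount: int = 1, ancount: int = 0) -> bytes:
    # Flags field: QR (1 bit), OPCODE (4), AA, TC, RD, RA, Z (3), RCODE (4); only QR is set here.
    flags = qr << 15
    return struct.pack("!HHHHHH", packet_id, flags, qdcount, ancount, 0, 0)

def parse_header(data: bytes) -> dict:
    packet_id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!HHHHHH", data[:12])
    return {
        "id": packet_id,
        "qr": flags >> 15,
        "opcode": (flags >> 11) & 0xF,
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }

print(parse_header(build_header(1234)))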
Some of the challenges and choices I had to make:
* To make the stages easier, I had to break them up, such that each step builds on the previous one. This was especially challenging for the 2nd stage, where we write a DNS packet's header contents. Even though I’d have liked it to be easier, breaking it up further would have been weird.
* Instead of implementing a recursive resolver, I've restricted the challenge to a forwarding server. We made this decision so that most developers can still do it. To add more complexity, we can use a challenge extension (noted below).
* Deciding how much instruction and context the stages should provide. I've decided to keep them as thorough as possible for most of the stages, and developers can choose to read the details in full or just skim through them.
I would love your feedback and questions on the challenge. You can try it out for free here: https://codecrafters.io/dns-server (no CC required).
I also have challenge extensions planned. You can find them at https://app.codecrafters.io/vote/challenge-extension-ideas?c.... I'm also keen to hear what you think about the extension ideas.
Comments URL: https://news.ycombinator.com/item?id=38428957
Points: 19
# Comments: 4
Problem: I (accidentally) hoard domains :')
1. I get excited about a new project
2. I buy a domain
3. I get busy, and the domain collects a thick layer of dust
I know I'm not alone in this, either
So, I had the idea of creating a simple and casual marketplace for folks like me to list their domains at a fair price with a nice community feel to free up these caged domains
It felt like a great project for me to pick up some new skills, so I got to it
All up, it took me about a month, and I built the whole thing live on Twitch
I've always sat on the design, marketing and front-end side of the fence, so this was my first attempt at making a 'full' web app
Here's the stack I used:
- SvelteKit (https://kit.svelte.dev/)
- Supabase (https://supabase.com/)
- Resend (https://resend.com/)
- ShadCN Svelte (https://www.shadcn-svelte.com/)
It was super fun to build, and as a beginner, I learnt so much
I leaned on AI quite heavily to help advance my speed of grokking certain concepts within both SvelteKit & Supabase, and I blogged about the experience and my learnings here: https://aroreretini.dev/projects/dwarf/
Any feedback/criticism very much welcome, I've got a lot to learn :)
Comments URL: https://news.ycombinator.com/item?id=38425677
Points: 11
# Comments: 6
Article URL: https://www.gitaware.com/
Comments URL: https://news.ycombinator.com/item?id=38422218
Points: 10
# Comments: 0
Article URL: https://github.com/novuhq/
Comments URL: https://news.ycombinator.com/item?id=38419513
Points: 101
# Comments: 17
Article URL: https://encore.dev/blog/retries
Comments URL: https://news.ycombinator.com/item?id=38392540
Points: 113
# Comments: 29
I asked this question at the orange site and wanted to hear from you guys too :-)
Amidst all the software enshittification that we are seeing every day, what software are you thankful for? That makes your life better? I’ll start
Linux kernel (duh!)
Void Linux (cured my distro hopping!)
Kicad (the only software my wife has seen me use and say “that looks expensive!”)
Inkscape
Sublime Text
gcc and clang
fish (the shell)
KDE
Things 3
Miniflux
iTerm2
brew.sh
GoLand by Jetbrains
How about you guys?
Article URL: https://engineering.fb.com/2023/08/07/developer-tools/fixit-2-linter-meta/
Comments URL: https://news.ycombinator.com/item?id=38378776
Points: 102
# Comments: 87
Article URL: https://blog.miguelgrinberg.com/post/it-s-time-for-a-change-datetime-utcnow-is-now-deprecated
Comments URL: https://news.ycombinator.com/item?id=38333116
Points: 100
# Comments: 98
Article URL: https://stefan-marr.de/2023/11/python-global-interpreter-lock/
Comments URL: https://news.ycombinator.com/item?id=38302903
Points: 96
# Comments: 128
Article URL: https://arxiv.org/abs/2311.09247
Comments URL: https://news.ycombinator.com/item?id=38331669
Points: 114
# Comments: 72
TensorFlow 2.15 has been released! Highlights of this release (and 2.14) include a much simpler installation method for NVIDIA CUDA libraries for Linux, oneDNN CPU performance optimizations for Windows x64 and x86, full availability of tf.function types, an upgrade to Clang 17.0.1, and much more! For the full release note, please check here.
The tensorflow pip package has a new, optional installation method for Linux that installs necessary NVIDIA CUDA libraries through pip. As long as the NVIDIA driver is already installed on the system, you may now run pip install tensorflow[and-cuda] to install TensorFlow's NVIDIA CUDA library dependencies in the Python environment. Aside from the NVIDIA driver, no other pre-existing NVIDIA CUDA packages are necessary. In TensorFlow 2.15, CUDA has been upgraded to version 12.2.
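As a quick sanity check after installing this way (an illustrative snippet, not part of the release notes), you can confirm that TensorFlow sees the GPU:
# Verify that the pip-installed CUDA libraries are picked up.
import tensorflow as tf

print(tf.__version__)                          # expect 2.15.x
print(tf.config.list_physical_devices("GPU"))  # non-empty list if CUDA is working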
For Windows x64 & x86 packages, oneDNN optimizations are now enabled by default on x86 CPUs. These optimizations can be enabled or disabled by setting the environment variable TF_ENABLE_ONEDNN_OPTS to 1 (enable) or 0 (disable) before running TensorFlow. To fall back to default settings, simply unset the environment variable.
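For example, one way to toggle this from Python before TensorFlow is loaded (an illustrative snippet, not from the release notes):
# The variable must be set before TensorFlow is imported for it to take effect.
import os
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"  # "1" enables, "0" disables oneDNN optimizations

import tensorflow as tf  # imported after setting the env var on purpose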
tf.function types are now fully available.
- tf.types.experimental.TraceType now allows custom tf.function inputs to declare Tensor decomposition and type casting support.
- Introducing tf.types.experimental.FunctionType as the comprehensive representation of the signature of tf.function callables. It can be accessed through the function_type property of tf.function's and ConcreteFunctions. See the tf.types.experimental.FunctionType documentation for more details.
- Introducing tf.types.experimental.AtomicFunction as the fastest way to perform TF computations in Python. This capability can be accessed through the inference_fn property of ConcreteFunctions. (Does not support gradients.) See the tf.types.experimental.AtomicFunction documentation for how to call and use it.
TensorFlow PIP packages are now being built with Clang 17 and CUDA 12.2 to improve performance for NVIDIA Hopper-based GPUs. Moving forward, Clang 17 will be the default C++ compiler for TensorFlow. We recommend upgrading your compiler to Clang 17 when building TensorFlow from source.
The Women in ML Symposium is an inclusive event for anyone passionate about the transformative fields of Machine Learning (ML) and Artificial Intelligence (AI). Dive into the latest advancements in generative AI, explore the intricacies of privacy-preserving AI, dig into the underlying accelerators and ML frameworks that power models, and uncover practical applications of ML across multiple industries.
Our event offers sessions for all expertise levels, from beginners to advanced practitioners. Hear about what’s new in ML and building with Google AI from our keynote speakers, gain insights from seasoned industry leaders across Google Health, Nvidia, Adobe, and more – and discover a wealth of knowledge on topics ranging from foundational AI concepts to open source tools, techniques, and beyond.
RSVP today to secure your spot and explore our exciting agenda. We can't wait to see you there!
Hello everyone!
Django 5.0 RC1 was released yesterday, establishing the string freeze for the 5.0 release. This means that strings marked for translations will not change between now and the 5.0 final release, scheduled for December 4th.
It would be extremely helpful if you could ensure that the Django translations for the languages you collaborate with are complete on Transifex. I'll be fetching the available translations on Friday, December 1st, in preparation for the release the following Monday.
For more information about Django translations, refer to this link.
Thank you very much for your help!
Cheers, Natalia.
1 post - 1 participant
Hello everyone,
We are currently working on adding a feature for Django ticket #34277. The goal is to introduce a conditional WHERE clause in the bulk_create method to enable conditional updates when using bulk_create with update_conflicts=True.
What we have accomplished so far:
- bulk_create: We added additional parameters to bulk_create, including update_conflicts, update_fields, unique_fields, and where_clause.
- We propagated these parameters through the internal methods _insert, _batched_insert, and others relevant to the process.
- We attempted to compile the WHERE condition (a Q object) into valid SQL, using a SQL compiler obtained via get_compiler().
Encountered issue:
When attempting to compile the Q object (representing the WHERE clause) into SQL, we are facing an error: 'Q' object has no attribute 'as_sql'. We tried to bypass this issue by using compiler.compile(where_clause), but this did not lead to a viable solution.
Questions and need for clarifications:
- Q to WhereNode: How can we properly convert a Q object into a WhereNode object so it can be compiled into SQL? Is there a standard approach for this within Django's framework, or should we consider a custom method?
- What is the recommended way to integrate such a WHERE clause into bulk_create?
We would greatly appreciate any feedback, advice, or code examples that could help us progress in this complex task. We aim to make a significant contribution to Django while ensuring the robustness and compatibility of this new feature.
Thank you in advance for your time and expertise.
Best regards,
Barhamou
# myapp/views.py
from django.http import HttpResponse
from .models import Item
from django.utils import timezone
from datetime import timedelta
from django.db.models import F, Q

def test_bulk_create_view(request):
    new_items = [
        Item(id=1, name="Item 1 modified yet again", last_updated=timezone.now() - timedelta(days=1)),
        Item(id=2, name="Item 2 Updated", last_updated=timezone.now() - timedelta(days=1)),
        Item(id=3, name="Item 3 New"),
    ]
    Item.objects.bulk_create(
        new_items,
        update_conflicts=True,
        update_fields=['name', 'last_updated'],  # Fields to update
        unique_fields=['id'],  # Unique field that can trigger the upsert
        where_clause=Q(last_updated__lt=F('EXCLUDED__last_updated')),  # proposed new argument
    )
    # Item.objects.bulk_create(new_items, where_clause='A where clause')
    # Item.objects.bulk_create(new_items, update_conflicts=True)
    return HttpResponse("bulk_create tested successfully")
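One direction we are exploring for the Q-to-WhereNode question, sketched from how Django's CheckConstraint compiles its Q object internally (these are internal APIs, so this may not be the right approach for bulk_create):
# Sketch: turning a Q object into a compilable WhereNode, mirroring what
# CheckConstraint._get_check_sql() does internally. Internal APIs may change
# between Django versions, and the EXCLUDED reference would still need special handling.
from django.db import connection
from django.db.models import Q
from django.db.models.sql.query import Query

from .models import Item  # the model from the example above

q = Q(name__startswith="Item")                 # any plain Q for illustration
query = Query(model=Item, alias_cols=False)
where_node = query.build_where(q)              # Q -> WhereNode
compiler = query.get_compiler(connection=connection)
sql, params = where_node.as_sql(compiler, connection)
print(sql, params)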
1 post - 1 participant
Django’s accessibility team is looking for feedback on proposed guidelines for contributors: Accessibility guidelines for all contributors #17338 (read the guidelines as HTML).
Django’s accessibility currently leaves a lot to be desired. It’s crucial for us to introduce those guidelines, and crucial for the guidelines to be strict enough to allow us to improve, while not being an unfair burden for contributors working on UI changes, which is challenging enough as it is. We need feedback from people with different perspectives to make sure this will work for everyone.
Two ways!
We’re looking for feedback from all contributors, in particular new contributors and people who could see themselves contributing in the future. We also want feedback from @steering_council members, who have a formal mandate to oversee the quality of Django.
Lots of reasons. Formally, the introduction of those guidelines is as per DEP-11. Excerpt from the team’s responsibilities:
- Deciding on any relevant accessibility guidelines to follow, such as WCAG, and at which conformance level. […]
- Coordinating […] the improvement of the accessibility in general in Django and associated projects. […]
- Writing and maintaining documentation relating to accessibility, such as a statement of commitment to accessibility issues, and contribution guidelines.
1 post - 1 participant
If your company has strict database design guidelines, it is an extra burden to adhere to them where the ManyToManyField() is concerned, because you must define a through model in order to have custom column names in the through table.
You can define a custom table name with the db_table argument. However, there is no way to define what the column names will be in that table unless you go to the trouble of defining a custom through model, which adds extra code and somewhat defeats the purpose of the ManyToManyField(), especially when you don't need additional fields in your through table.
This also breaks the convenient .set(), .add(), .create() methods on the ManyToManyField() instance, when it shouldn't break, because you haven't added extra fields on the through table.
In the following example, Django's default name for the foreign key in the through table would be shoppingcart_id; however, this doesn't work if your company's style guide wants the name to be shopping_cart_id.
One solution would be to support custom column names with arguments like from_column_name and to_column_name in the following example.
class Item(models.Model):
    name = models.CharField(max_length=255)

    class Meta:
        db_table = "item"


class ShoppingCart(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    items = models.ManyToManyField(Item, from_column_name="shopping_cart", to_column_name="item")

    class Meta:
        db_table = "shopping_cart"
Another idea would be to use the existing ManyToManyField.through_fields argument for defining custom column names. For example:
items = models.ManyToManyField(Item, through_fields=("shopping_cart", "item"))
Another idea would be to have a setting in a project that defaults to breaking up words with underscores in the through table column names, however, this wouldn’t allow for as much flexibility and wouldn’t work for an existing project.
Instead, what you have to do now is create a through model like the following:
class Item(models.Model):
    name = models.CharField(max_length=255)

    class Meta:
        db_table = "item"


class ShoppingCart(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    items = models.ManyToManyField(Item, through="ShoppingCartItem")

    class Meta:
        db_table = "shopping_cart"


class ShoppingCartItem(models.Model):
    shopping_cart = models.ForeignKey(ShoppingCart, on_delete=models.CASCADE)
    item = models.ForeignKey(Item, on_delete=models.CASCADE)

    class Meta:
        db_table = "shopping_cart_item"
I would be happy to take a stab at creating a PR if others familiar with the ORM would point me in the right direction and if we could get a design decision on what method to go with.
1 post - 1 participant
While adding a query_params parameter to test client and request factory methods, I hit the issue that for GET and HEAD requests we can already pass in a data parameter to pass query parameters. The data parameter is also used for form data in other request methods. Everyone (so far) agrees we shouldn't have both for these methods.
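For reference, this is how the data argument behaves today (standard test client usage; the URL is a placeholder):
# Current behavior: for GET/HEAD the data dict becomes the query string,
# while for POST the same argument becomes the form body.
from django.test import Client

client = Client()
client.get("/search/", data={"q": "django"})   # requests /search/?q=django
client.post("/search/", data={"q": "django"})  # sends q=django as form data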
I suggested deprecating the data parameter for GET and HEAD requests. @felixxm instead prefers not adding query_params to them and sticking with data.
I am going to assume that the reason for this is for backwards compatibility, but I’ll let him tell us if there are some other issues.
But the reasons I prefer the deprecation:
5 posts - 5 participants
Hello,
This is Jamil from Uganda. I would like someone responsible for the Luganda translation to let me in; I have already sent my request.
Alternatively, whoever is responsible could walk me through the process of becoming the coordinator, because the language is only at about 3%.
Or is there a way for me to contact the coordinator?
2 posts - 2 participants
I made a comment on that issue, but there may be more responses here. Basically, the Django documentation seems to say that the “default” database is both required and not required, depending on where you look. Seems like at least one of the places in the docs should be changed.
Any thoughts or suggestions?
2 posts - 2 participants
Except you, because you are special
1 post - 1 participant
I posted a comment on ticket 24306 about possibly using unlogged tables for the database cache. Would there be any interest in adding that functionality to the createcachetable command?
5 posts - 3 participants
Hi,
After migrating from Django 3.2 to 4.2, I've noticed that persistent connections don't behave the same in the new version.
I'm using the CONN_MAX_AGE=20 setting.
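For reference, the relevant DATABASES entry looks roughly like this (connection details are placeholders):
# settings.py (sketch) -- keep Oracle connections open for up to 20 seconds
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.oracle",
        "NAME": "orcl",        # placeholder service name
        "USER": "app",         # placeholder
        "PASSWORD": "secret",  # placeholder
        "CONN_MAX_AGE": 20,
    }
}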
With Django 3.2 I can see that Oracle doesn't create new sessions in the v$session table on each request.
But in Django 4.2 a new session is created for each request, even though we haven't changed anything in the database connection settings.
STACK:
Any ideas how to recreate the old behaviour on the new version of Django?
2 posts - 2 participants
Atomicals CLI and Javascript Library
atomicals.xyz Documentation: https://docs.atomicals.xyz
Download the github repo and then run:
npm install
npm run build
See all commands at:
npm run cli --help
First install packages and build, then follow the steps here to create your first Atomical and query the status. Use yarn cli to get a list of all commands available.
The environment file comes with defaults (.env.example), but it is highly recommended to install and operate your own ElectrumX server. Web browser communication is possible through the wss (secure websockets) interface of ElectrumX.
ELECTRUMX_WSS=wss://electrumx.atomicals.xyz:50012
// Optional (defaults to wallet.json)
WALLET_PATH=path-to-wallet.json
ELECTRUMX_WSS: URL of the ElectrumX with Atomicals support. Note that only wss endpoints are accessible from web browsers.
The purpose of the wallet is to create p2tr (pay-to-taproot) spend scripts and to receive change from the transactions made for the various operations. Do not put more funds than you can afford to lose, as this is still beta!
To initialize a new wallet.json file that will store your address for receiving change, use the wallet-init command. Alternatively, you may populate the wallet.json manually, ensuring that the address at m/44'/0'/0'/0/0 is equal to the address and the derivePath is set correctly.
Configure the path in the environment .env file to point to your wallet file. Defaults to ./wallet.json
Default:
WALLET_PATH=.
WALLET_FILE=wallet.json
Update to wallets/ directory:
WALLET_PATH=./wallets
WALLET_FILE=wallet.json
Create the wallet:
yarn cli wallet-init
>>>
Wallet created at wallet.json
phrase: maple maple maple maple maple maple maple maple maple maple maple maple
Legacy address (for change): 1FXL2CJ9nAC...u3e9Evdsa2pKrPhkag
Derive Path: m/44'/0'/0'/0/0
WIF: L5Sa65gNR6QsBjqK.....r6o4YzcqNRnJ1p4a6GPxqQQ
------------------------------------------------------
yarn cli --help
Get all of the commands available:
npm run cli --help
Read the documentation at https://docs.atomicals.xyz
See updated ElectrumX (https://github.com/atomicals/electrumx-atomicals)
https://x.com/atomicalsxyz (X - Formerly Twitter)
We greatly appreciate any donation to help support Atomicals Protocol development. We worked out of passion and kindness for the world; we believe this technology must exist and be free for all to use. Bitcoin is our one hope for freedom and digital sovereignty, and we intend to do our best to make it a reality.
BTC: bc1pljy9g0ugrgumpd5y6v9tv23rvz5y8dhaq980r9qfgyhd4dmgkwmqpdpr5q
An Autonomous LLM Agent for Complex Task Solving
Tutorial • Demo • Blog • Documentation • Citation
XAgent is an open-source experimental Large Language Model (LLM) driven autonomous agent that can automatically solve various tasks. It is designed to be a general-purpose agent that can be applied to a wide range of tasks. XAgent is still in its early stages, and we are working hard to improve it.
🏆 Our goal is to create a super-intelligent agent that can solve any given task!
We welcome diverse forms of collaborations, including full-time and part-time roles and more. If you are interested in the frontiers of agents and want to join us in realizing true autonomous agents, please contact us at xagentteam@gmail.com.
XAgent is designed with the following features:
XAgent is composed of three parts:
ToolServer is the server that provides XAgent with powerful and safe tools to solve tasks. It is a docker container that provides a safe environment for XAgent to run. Currently, ToolServer provides the following tools:
ToolServer is where XAgent's action takes place. It is a docker container that provides a safe environment for XAgent to run. So you should install docker and docker-compose first. After that, you should build the docker image for ToolServer and start the docker container.
docker-compose up --build
This will build the image for the ToolServer and start the ToolServer's container. If you want to run the container in the background, please use docker-compose up -d --build. Refer here for detailed information about our ToolServer.
If the ToolServer is updated, you have to rebuild the images:
docker compose build
After setting up ToolServer, you can start to run XAgent.
pip install -r requirements.txt
Configure XAgent in assets/config.yml before running it.
You should add at least one OpenAI key in assets/config.yml, which is used to access the OpenAI API. We highly recommend using gpt-4-32k to run XAgent; gpt-4 is also OK for most simple tasks. In any case, at least one gpt-3.5-turbo-16k API key should be provided as a backup model. We do not test or recommend using gpt-3.5-turbo to run XAgent due to its minimal context length; you should not try to run XAgent on that.
If you want to change the config file used by XAgentServer, you should modify the CONFIG_FILE value in the .env file and restart the docker container.
python run.py --task "put your task here" --model "gpt-4" --config_file "assets/config.yml"
You can use the argument --upload_files to select the initial files you want to submit to XAgent.
The local workspace for your XAgent is in local_workspace, where you can find all the files generated by XAgent throughout the running process.
After execution, the entire workspace in ToolServerNode will be copied to running_records for your convenience.
Besides, in running_records, you can find all the intermediate steps information, e.g., task statuses, LLM's input-output pairs, used tools, etc.
You can load from a record to reproduce a former run, just by setting record_dir in the config (defaults to Null). The record is a system-level recording tied to the code version of XAgent. All running configs, queries, code execution statuses (including errors), and server behavior will be documented.
We have removed all sensitive information (including API keys) from the record so you can safely share it with others. In the near future, we will introduce more granular sharing options highlighting the contributions of humans during execution.
## We ran the web ui docker when building the ToolServer network
## run nginx in docker
docker exec XAgent-Server systemctl start nginx
Build the docker image for XAgent-Server and start the docker container. You will see the XAgent Server listening on port 8090. You could visit http://localhost:5173 to interact with XAgent by using web UI. Refer here for the detailed information about our GUI Demo.
Here, we also show some cases of solving tasks by XAgent: You can check our live demo on XAgent Official Website. We also provide a video demo and showcases of using XAgent here:
We start with a case of aiding users in intricate data analysis. Here, our user submitted an iris.zip file to XAgent, seeking assistance in data analysis. XAgent swiftly broke down the task into four sub-tasks: (1) data inspection and comprehension, (2) verification of the system's Python environment for relevant data analysis libraries, (3) crafting data analysis code for data processing and analysis, and (4) compiling an analytical report based on the Python code's execution results. Here is a figure drawn by XAgent.
Empowered with the unique capability to actively seek human assistance and collaborate in problem-solving, XAgent continues to redefine the boundaries of human-agent cooperation. As depicted in the screenshot below, a user sought XAgent's aid in recommending some great restaurants for a friendly gathering yet failed to provide specific details. Recognizing the insufficiency of the provided information, XAgent employed the AskForHumanHelp tool, prompting human intervention to elicit the user's preferred location, budget constraints, culinary preferences, and dietary restrictions. Armed with this valuable feedback, XAgent seamlessly generated tailored restaurant recommendations, ensuring a personalized and satisfying experience for the user and their friends.
XAgent not only tackles mundane tasks but also serves as an invaluable aid in complex tasks such as model training. Here, we show a scenario where a user desires to analyze movie reviews and evaluate the public sentiment surrounding particular films. In response, XAgent promptly initiates the process by downloading the IMDB dataset to train a cutting-edge BERT model (see screenshot below), harnessing the power of deep learning. Armed with this trained BERT model, XAgent seamlessly navigates the intricate nuances of movie reviews, offering insightful predictions regarding the public's perception of various films.
We conduct human preference evaluation to evaluate XAgent's performance. We prepare over 50 real-world complex tasks for assessment, which can be categorized into 5 classes: Search and Report, Coding and Developing, Data Analysis, Math, and Life Assistant. We compare the results of XAgent with AutoGPT, which shows a total win of XAgent over AutoGPT. All running records can be found here.
We report a significant improvement of XAgent over AutoGPT in terms of human preference.
We also evaluate XAgent on the following benchmarks:
Our blog is available here!
A heartfelt thank you to all our contributors. Your efforts make this project grow and thrive. Every contribution, big or small, is invaluable.
If you find our repo useful, please kindly consider citing:
@misc{xagent2023,
title={XAgent: An Autonomous Agent for Complex Task Solving},
author={XAgent Team},
year={2023},
}
🤖 Lobe Chat - an open-source, vision supported, extensible, high-performance chat client. It supports one-click free deployment of your private ChatGPT/LLM web application.
LobeChat is an open-source, extensible (Function Calling) high-performance chatbot framework.
It supports one-click free deployment of your private ChatGPT/LLM web application.
English · 简体中文 · Changelog · Wiki · Report Bug · Request Feature
Share LobeChat Repository
Pioneering the new age of thinking and creating. Built for you, the Super Individual.
Please be aware that LobeChat is currently under active development, and feedback is welcome for any issues encountered.
Join our Discord community! This is where you can connect with developers and other enthusiastic users of LobeHub.
[!IMPORTANT]
Star Us, You will receive all release notifications from GitHub without any delay ~ ⭐️
[!NOTE]
You can find our upcoming Roadmap plans in the Projects section.
Besides these features, LobeChat also has a much stronger technical foundation under the hood:
1. Function Calling Plugin System
By establishing a versatile plugin system, ChatGPT becomes capable of delivering real-time news updates and enhancing your ability to interact with documents and e-commerce data more effectively. This extended functionality positions ChatGPT as a valuable resource across diverse domains. If you have an interest in creating plugins, we offer comprehensive component development documentation, software development kits (SDKs), and pre-made templates in the 🧩 Plugin System section below. Join us in our collective efforts to empower ChatGPT, making it both more potent and user-friendly.
2. Prompt Agent Market
In our agent market, we have accumulated a large number of practical prompt agents that have been used in daily work and study. You can also share your agents here and iterate and optimize your prompt agents with more people. You can submit your agents through 🤖/🏪 Submit Agents, and our automated i18n workflow will automatically translate your agents into multiple languages, allowing users worldwide to enjoy your wisdom.
Recent Submits | Description
Expert Agent Mentor (by tcmonster on 2023-11-16) | Call on expert agents perfectly suited for the task to support your goals. Tags: task-guidance, execution-plan, communication, support
Full-stack Developer (by cloverfield11 on 2023-11-15) | Full-stack web developer with experience in HTML, CSS, JavaScript, Python, Java, Ruby, and frameworks such as React, Angular, Vue.js, Express, Django, Next.js, Flask, or Ruby on Rails. Experience in databases, application architecture, security, and testing. Tags: web-development, front-end, back-end, programming, databases
Graphic Creative Master (by yingxirz on 2023-11-15) | Specializes in graphic creative design and graphic creativity. Tags: graphic creative design, graphic-design
Expert Agent Mentor (by tcmonster on 2023-11-15) | Call on expert agents perfectly suited for the task to support your goals. Tags: task-guidance, execution-plan, communication, support
📊 Total agents: 48
3. Progressive Web App (PWA)
Utilize Progressive Web Application (PWA) technology to achieve a seamless LobeChat experience on your computer or mobile device.
[!NOTE]
If you are unfamiliar with the installation process of PWA, you can add LobeChat as your desktop application (also applicable to mobile devices) by following these steps:
- Launch the Chrome or Edge browser on your computer.
- Visit the LobeChat webpage.
- In the upper right corner of the address bar, click on the Install icon.
- Follow the instructions on the screen to complete the PWA Installation.
4. Theme Mode Selection
LobeChat offers two unique theme modes - Light Mode and Dark Mode, as well as rich color customization options to meet your personalized needs. By default, our themes will intelligently switch based on your system settings, but if you prefer manual control, you can easily switch in the settings.
5. Mobile Device Adaptation
We have carried out a series of optimization designs for mobile devices to enhance the user's mobile experience. Currently, we are iterating on the mobile user experience to achieve smoother and more intuitive interactions. If you have any suggestions or ideas, we welcome you to provide feedback through GitHub Issues or Pull Requests.
🚧 Additional snapshots and demonstrations are being progressively added...
Desktop | Mobile
📑 Lighthouse Report | 📑 Lighthouse Report
[!NOTE]
The complete list of reports can be found in the 📘 Lighthouse Reports
LobeChat provides Self-Hosted Version with Vercel and Docker Image. This allows you to deploy your own chatbot within a few minutes without any prior knowledge.
A. Deploying with Vercel
If you want to deploy this service yourself on Vercel, you can follow these steps: set OPENAI_API_KEY (required) and ACCESS_CODE (recommended) on the environment variable section.
If you have deployed your own project following the one-click deployment steps in the README, you might encounter constant prompts indicating "updates available." This is because Vercel defaults to creating a new project instead of forking this one, resulting in an inability to detect updates accurately.
[!TIP]
We suggest you redeploy using the following steps, 📘 Maintaining Updates with LobeChat Self-Deployment.
B. Deploying with Docker
We provide a Docker image for deploying the LobeChat service on your own private device. Use the following command to start the LobeChat service:
$ docker run -d -p 3210:3210 \
-e OPENAI_API_KEY=sk-xxxx \
-e ACCESS_CODE=lobe66 \
lobehub/lobe-chat
[!TIP]
If you need to use the OpenAI service through a proxy, you can configure the proxy address using the OPENAI_PROXY_URL environment variable:
$ docker run -d -p 3210:3210 \
-e OPENAI_API_KEY=sk-xxxx \
-e OPENAI_PROXY_URL=https://api-proxy.com/v1 \
-e ACCESS_CODE=lobe66 \
lobehub/lobe-chat
[!NOTE]
For detailed instructions on deploying with Docker, please refer to the 📘 Docker Deployment Guide
This project provides some additional configuration items set with environment variables:
Environment Variable | Required | Description | Example
OPENAI_API_KEY | Yes | This is the API key you apply on the OpenAI account page | sk-xxxxxx...xxxxxx
OPENAI_PROXY_URL | No | If you manually configure the OpenAI interface proxy, you can use this configuration item to override the default OpenAI API request base URL | https://api.chatanywhere.cn/v1 (the default value is https://api.openai.com/v1)
ACCESS_CODE | No | Add a password to access this service; the password should be a 6-digit number or letter | awCT74 or e3@09!
[!NOTE]
The complete list of environment variables can be found in the 📘 Environment Variables
@lobehub/ui | lobehub/lobe-ui | Lobe UI is an open-source UI component library dedicated to building AIGC web applications.
@lobehub/lint | lobehub/lobe-lint | LobeLint provides configurations for ESLint, Stylelint, Commitlint, Prettier, Remark, and Semantic Release for LobeHub.
@lobehub/assets | lobehub/assets | Logo assets, favicons, webfonts for LobeHub.
Plugins provide a means to extend the Function Calling capabilities of LobeChat. They can be used to introduce new function calls and even new ways to render message results. If you are interested in plugin development, please refer to our 📘 Plugin Development Guide in the Wiki.
[!NOTE]
The plugin system is currently undergoing major development. You can learn more in the following issues:
- [x] Plugin Phase 1: Implement separation of the plugin from the main body, split the plugin into an independent repository for maintenance, and realize dynamic loading of the plugin.
- [x] Plugin Phase 2: The security and stability of the plugin's use, more accurately presenting abnormal states, the maintainability of the plugin architecture, and developer-friendly.
- [ ] Plugin Phase 3: Higher-level and more comprehensive customization capabilities, support for plugin authentication, and examples.
Official Plugin | Repository | Description
Clock Time (by LobeHub on 2023-11-01) | lobehub/chat-plugin-clock-time | Display a clock to show current time. Tags: clock, time
Website Crawler (by LobeHub on 2023-08-17) | lobehub/chat-plugin-web-crawler | Extract content from web links. Tags: web, content-crawler
Search Engine (by LobeHub on 2023-08-15) | lobehub/chat-plugin-search-engine | Query search engine to get information. Tags: web, search
Realtime Weather (by LobeHub on 2023-08-12) | lobehub/chat-plugin-realtime-weather | Get realtime weather information. Tags: weather, realtime
📊 Total plugins: 4
You can use GitHub Codespaces for online development:
Or clone it for local development:
$ git clone https://github.com/lobehub/lobe-chat.git
$ cd lobe-chat
$ bun install
$ bun dev
Contributions of all types are more than welcome; if you are interested in contributing code, feel free to check out our GitHub Issues and Projects to get stuck in to show us what you’re made of.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code.
To run the model you need Python 3.7+
If you don't have PyTorch installed, follow their instructions here.
Install the package pix2tex:
pip install "pix2tex[gui]"
Model checkpoints will be downloaded automatically.
There are three ways to get a prediction from an image.
You can use the command line tool by calling pix2tex. Here you can parse already existing images from the disk and images in your clipboard.
Thanks to @katie-lim, you can use a nice user interface as a quick way to get the model prediction. Just call the GUI with latexocr. From here you can take a screenshot and the predicted latex code is rendered using MathJax and copied to your clipboard.
Under Linux, it is possible to use the GUI with gnome-screenshot (which comes with multiple monitor support) if gnome-screenshot was installed beforehand. For Wayland, grim and slurp will be used when they are both available. Note that gnome-screenshot is not compatible with wlroots-based Wayland compositors. Since gnome-screenshot will be preferred when available, you may have to set the environment variable SCREENSHOT_TOOL to grim in this case (other available values are gnome-screenshot and pil).
If the model is unsure about what's in the image, it might output a different prediction every time you click "Retry". With the temperature parameter you can control this behavior (a low temperature will produce the same result).
You can use an API. This has additional dependencies. Install via pip install -U "pix2tex[api]" and run
python -m pix2tex.api.run
to start a Streamlit demo that connects to the API at port 8502. There is also a docker image available for the API: https://hub.docker.com/r/lukasblecher/pix2tex
docker pull lukasblecher/pix2tex:api
docker run --rm -p 8502:8502 lukasblecher/pix2tex:api
To also run the streamlit demo run
docker run --rm -it -p 8501:8501 --entrypoint python lukasblecher/pix2tex:api pix2tex/api/run.py
and navigate to http://localhost:8501/
Use from within Python
from PIL import Image
from pix2tex.cli import LatexOCR
img = Image.open('path/to/image.png')
model = LatexOCR()
print(model(img))
The model works best with images of smaller resolution. That's why I added a preprocessing step where another neural network predicts the optimal resolution of the input image. This model will automatically resize the custom image to best resemble the training data and thus increase performance of images found in the wild. Still it's not perfect and might not be able to handle huge images optimally, so don't zoom in all the way before taking a picture.
Always double-check the result carefully. You can try to redo the prediction with another resolution if the answer was wrong.
Want to use the package?
I'm trying to compile a documentation right now.
Visit here: https://pix2tex.readthedocs.io/
Install a couple of dependencies: pip install "pix2tex[train]".
python -m pix2tex.dataset.dataset --equations path_to_textfile --images path_to_images --out dataset.pkl
To use your own tokenizer, pass it via --tokenizer (see below).
You can find my generated training data on the Google Drive as well (formulae.zip - images, math.txt - labels). Repeat the step for the validation and test data. All use the same label text file.
Edit the data (and valdata) entry in the config file to point to the newly generated .pkl file. Change other hyperparameters if you want to. See pix2tex/model/settings/config.yaml for a template.
python -m pix2tex.train --config path_to_config_file
If you want to use your own data you might be interested in creating your own tokenizer with
python -m pix2tex.dataset.dataset --equations path_to_textfile --vocab-size 8000 --out tokenizer.json
Don't forget to update the path to the tokenizer in the config file and set num_tokens to your vocabulary size.
The model consists of a ViT [1] encoder with a ResNet backbone and a Transformer [2] decoder.
BLEU score | normed edit distance | token accuracy
0.88 | 0.10 | 0.60
We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the im2latex-100k [3] dataset. All of it can be found here
In order to render the math in many different fonts we use XeLaTeX, generate a PDF and finally convert it to a PNG. For the last step we need to use some third party tools, plus the dependencies specified in setup.py and the fonts Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math.
Contributions of any kind are welcome.
Code taken and modified from lucidrains, rwightman, im2markup, arxiv_leaks, pkra: Mathjax, harupy: snipping tool
[1] An Image is Worth 16x16 Words
[2] Attention Is All You Need
[3] Image-to-Markup Generation with Coarse-to-Fine Attention
Explain complex systems using visuals and simple terms. Help you prepare for system design interviews.
【 👨🏻💻 YouTube | 📮 Newsletter 】
Explain complex systems using visuals and simple terms.
Whether you're preparing for a System Design Interview or you simply want to understand how systems work beneath the surface, we hope this repository will help you achieve that.
Architecture styles define how different components of an application programming interface (API) interact with one another. As a result, they ensure efficiency, reliability, and ease of integration with other systems by providing a standard approach to designing and building APIs. Here are the most used styles:
SOAP:
Mature, comprehensive, XML-based
Best for enterprise applications
RESTful:
Popular, easy-to-implement, HTTP methods
Ideal for web services
GraphQL:
Query language, request specific data
Reduces network overhead, faster responses
gRPC:
Modern, high-performance, Protocol Buffers
Suitable for microservices architectures
WebSocket:
Real-time, bidirectional, persistent connections
Perfect for low-latency data exchange
Webhook:
Event-driven, HTTP callbacks, asynchronous
Notifies systems when events occur
When it comes to API design, REST and GraphQL each have their own strengths and weaknesses.
The diagram below shows a quick comparison between REST and GraphQL.
REST
GraphQL
The best choice between REST and GraphQL depends on the specific requirements of the application and development team. GraphQL is a good fit for complex or frequently changing frontend needs, while REST suits applications where simple and consistent contracts are preferred.
Neither API approach is a silver bullet. Carefully evaluating requirements and tradeoffs is important to pick the right style. Both REST and GraphQL are valid options for exposing data and powering modern applications.
RPC (Remote Procedure Call) is called “remote” because it enables communications between remote services when services are deployed to different servers under microservice architecture. From the user’s point of view, it acts like a local function call.
The diagram below illustrates the overall data flow for gRPC.
Step 1: A REST call is made from the client. The request body is usually in JSON format.
Steps 2 - 4: The order service (gRPC client) receives the REST call, transforms it, and makes an RPC call to the payment service. gRPC encodes the client stub into a binary format and sends it to the low-level transport layer.
Step 5: gRPC sends the packets over the network via HTTP2. Because of binary encoding and network optimizations, gRPC is said to be 5X faster than JSON.
Steps 6 - 8: The payment service (gRPC server) receives the packets from the network, decodes them, and invokes the server application.
Steps 9 - 11: The result is returned from the server application, and gets encoded and sent to the transport layer.
Steps 12 - 14: The order service receives the packets, decodes them, and sends the result to the client application.
The diagram below shows a comparison between polling and Webhook.
Assume we run an eCommerce website. The clients send orders to the order service via the API gateway, which goes to the payment service for payment transactions. The payment service then talks to an external payment service provider (PSP) to complete the transactions.
There are two ways to handle communications with the external PSP.
1. Short polling
After sending the payment request to the PSP, the payment service keeps asking the PSP about the payment status. After several rounds, the PSP finally returns with the status.
Short polling has two drawbacks:
2. Webhook
We can register a webhook with the external service. It means: call me back at a certain URL when you have updates on the request. When the PSP has completed the processing, it will invoke the HTTP request to update the payment status.
In this way, the programming paradigm is changed, and the payment service doesn’t need to waste resources to poll the payment status anymore.
What if the PSP never calls back? We can set up a housekeeping job to check payment status every hour.
Webhooks are often referred to as reverse APIs or push APIs because the server sends HTTP requests to the client. We need to pay attention to 3 things when using a webhook:
The diagram below shows 5 common tricks to improve API performance.
Pagination
This is a common optimization when the size of the result is large. The results are streamed back to the client to improve the service responsiveness.
Asynchronous Logging
Synchronous logging deals with the disk for every call and can slow down the system. Asynchronous logging sends logs to a lock-free buffer first and immediately returns. The logs will be flushed to the disk periodically. This significantly reduces the I/O overhead.
Caching
We can cache frequently accessed data into a cache. The client can query the cache first instead of visiting the database directly. If there is a cache miss, the client can query from the database. Caches like Redis store data in memory, so the data access is much faster than the database.
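Below is a minimal cache-aside sketch in Python (the Redis client, key naming, and query_user_from_db are illustrative assumptions):
# Cache-aside: read from the cache first, fall back to the database on a miss.
import json
import redis  # assumes the redis package and a local Redis instance

cache = redis.Redis()

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:                    # cache hit
        return json.loads(cached)
    user = query_user_from_db(user_id)        # hypothetical database query
    cache.set(key, json.dumps(user), ex=300)  # cache miss: store with a 5-minute TTL
    return user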
Payload Compression
The requests and responses can be compressed using gzip etc so that the transmitted data size is much smaller. This speeds up the upload and download.
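A rough illustration of the size difference, using only the Python standard library:
# Compressing a JSON payload with gzip before sending it over the wire.
import gzip
import json

payload = json.dumps({"items": [{"id": i, "name": f"item-{i}"} for i in range(1000)]}).encode()
compressed = gzip.compress(payload)
print(len(payload), "bytes raw ->", len(compressed), "bytes gzipped")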
Connection Pool
When accessing resources, we often need to load data from the database. Opening and closing db connections adds significant overhead. So we should connect to the db via a pool of open connections. The connection pool is responsible for managing the connection lifecycle.
What problem does each generation of HTTP solve?
The diagram below illustrates the key features.
HTTP 1.0 was finalized and fully documented in 1996. Every request to the same server requires a separate TCP connection.
HTTP 1.1 was published in 1997. A TCP connection can be left open for reuse (persistent connection), but it doesn’t solve the HOL (head-of-line) blocking issue.
HOL blocking - when the number of allowed parallel requests in the browser is used up, subsequent requests need to wait for the former ones to complete.
HTTP 2.0 was published in 2015. It addresses HOL issue through request multiplexing, which eliminates HOL blocking at the application layer, but HOL still exists at the transport (TCP) layer.
As you can see in the diagram, HTTP 2.0 introduced the concept of HTTP “streams”: an abstraction that allows multiplexing different HTTP exchanges onto the same TCP connection. Each stream doesn’t need to be sent in order.
HTTP 3.0 first draft was published in 2020. It is the proposed successor to HTTP 2.0. It uses QUIC instead of TCP for the underlying transport protocol, thus removing HOL blocking in the transport layer.
QUIC is based on UDP. It introduces streams as first-class citizens at the transport layer. QUIC streams share the same QUIC connection, so no additional handshakes and slow starts are required to create new ones, but QUIC streams are delivered independently such that in most cases packet loss affecting one stream doesn't affect others.
The diagram below illustrates the API timeline and API styles comparison.
Over time, different API architectural styles are released. Each of them has its own patterns of standardizing data exchange.
You can check out the use cases of each style in the diagram.
The diagram below shows the differences between code-first development and API-first development. Why do we want to consider API first design?
It is better to think through the system's complexity before writing the code and carefully defining the boundaries of the services.
We can mock requests and responses to validate the API design before writing code.
Developers are happy about the process as well because they can focus on functional development instead of negotiating sudden changes.
The possibility of having surprises toward the end of the project lifecycle is reduced.
Because we have designed the API first, the tests can be designed while the code is being developed. In a way, we also have TDD (Test Driven Design) when using API first development.
The response codes for HTTP are divided into five categories:
Informational (100-199), Success (200-299), Redirection (300-399), Client Error (400-499), Server Error (500-599)
The diagram below shows the details.
Step 1 - The client sends an HTTP request to the API gateway.
Step 2 - The API gateway parses and validates the attributes in the HTTP request.
Step 3 - The API gateway performs allow-list/deny-list checks.
Step 4 - The API gateway talks to an identity provider for authentication and authorization.
Step 5 - The rate limiting rules are applied to the request. If it is over the limit, the request is rejected.
Steps 6 and 7 - Now that the request has passed basic checks, the API gateway finds the relevant service to route to by path matching.
Step 8 - The API gateway transforms the request into the appropriate protocol and sends it to backend microservices.
Steps 9-12: The API gateway can handle errors properly, and deals with faults if the error takes a longer time to recover (circuit break). It can also leverage ELK (Elastic-Logstash-Kibana) stack for logging and monitoring. We sometimes cache data in the API gateway.
The diagram below shows typical API designs with a shopping cart example.
Note that API design is not just URL path design. Most of the time, we need to choose the proper resource names, identifiers, and path patterns. It is equally important to design proper HTTP header fields or to design effective rate-limiting rules within the API gateway.
How is data sent over the network? Why do we need so many layers in the OSI model?
The diagram below shows how data is encapsulated and de-encapsulated when transmitting over the network.
Step 1: When Device A sends data to Device B over the network via the HTTP protocol, an HTTP header is first added to the data at the application layer.
Step 2: Then a TCP or a UDP header is added to the data. It is encapsulated into TCP segments at the transport layer. The header contains the source port, destination port, and sequence number.
Step 3: The segments are then encapsulated with an IP header at the network layer. The IP header contains the source/destination IP addresses.
Step 4: A MAC header with source/destination MAC addresses is added to the IP datagram at the data link layer.
Step 5: The encapsulated frames are sent to the physical layer and sent over the network in binary bits.
Steps 6-10: When Device B receives the bits from the network, it performs the de-encapsulation process, which is a reverse processing of the encapsulation process. The headers are removed layer by layer, and eventually, Device B can read the data.
We need layers in the network model because each layer focuses on its own responsibilities. Each layer can rely on the headers for processing instructions and does not need to know the meaning of the data from the last layer.
The diagram below shows the differences between a 𝐟𝐨𝐫𝐰𝐚𝐫𝐝 𝐩𝐫𝐨𝐱𝐲 and a 𝐫𝐞𝐯𝐞𝐫𝐬𝐞 𝐩𝐫𝐨𝐱𝐲.
A forward proxy is a server that sits between user devices and the internet.
A forward proxy is commonly used for:
A reverse proxy is a server that accepts a request from the client, forwards the request to web servers, and returns the results to the client as if the proxy server had processed the request.
A reverse proxy is good for:
The diagram below shows 6 common algorithms.
Round robin
The client requests are sent to different service instances in sequential order. The services are usually required to be stateless.
Sticky round-robin
This is an improvement of the round-robin algorithm. If Alice’s first request goes to service A, the following requests go to service A as well.
Weighted round-robin
The admin can specify the weight for each service. The ones with a higher weight handle more requests than others.
Hash
This algorithm applies a hash function on the incoming requests’ IP or URL. The requests are routed to relevant instances based on the hash function result.
Least connections
A new request is sent to the service instance with the least concurrent connections.
Least response time
A new request is sent to the service instance with the fastest response time.
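A toy sketch of two of these strategies in Python (the instance addresses are placeholders):
# Toy load-balancer selection: round robin and hash-based routing.
import hashlib
import itertools

INSTANCES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # placeholder service instances

round_robin = itertools.cycle(INSTANCES)

def pick_round_robin() -> str:
    return next(round_robin)

def pick_by_hash(client_ip: str) -> str:
    # The same client IP always maps to the same instance.
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return INSTANCES[int(digest, 16) % len(INSTANCES)]

print([pick_round_robin() for _ in range(4)])
print(pick_by_hash("203.0.113.7"))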
The diagram below shows a comparison of URL, URI, and URN.
URI stands for Uniform Resource Identifier. It identifies a logical or physical resource on the web. URL and URN are subtypes of URI. URL locates a resource, while URN names a resource.
A URI is composed of the following parts: scheme:[//authority]path[?query][#fragment]
URL stands for Uniform Resource Locator, the key concept of HTTP. It is the address of a unique resource on the web. It can be used with other protocols like FTP and JDBC.
URN stands for Uniform Resource Name. It uses the urn scheme. URNs cannot be used to locate a resource. A simple example given in the diagram is composed of a namespace and a namespace-specific string.
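Python's standard library makes the split easy to see (the URL itself is just an example):
# Splitting a URL into the URI parts described above.
from urllib.parse import urlparse

parts = urlparse("https://example.com:8080/articles/api-design?page=2#summary")
print(parts.scheme)    # https
print(parts.netloc)    # example.com:8080  (the authority)
print(parts.path)      # /articles/api-design
print(parts.query)     # page=2
print(parts.fragment)  # summary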
If you would like to learn more detail on the subject, I would recommend W3C’s clarification.
Section 1 - SDLC with CI/CD
The software development life cycle (SDLC) consists of several key stages: development, testing, deployment, and maintenance. CI/CD automates and integrates these stages to enable faster and more reliable releases.
When code is pushed to a git repository, it triggers an automated build and test process. End-to-end (e2e) test cases are run to validate the code. If tests pass, the code can be automatically deployed to staging/production. If issues are found, the code is sent back to development for bug fixing. This automation provides fast feedback to developers and reduces the risk of bugs in production.
Section 2 - Difference between CI and CD
Continuous Integration (CI) automates the build, test, and merge process. It runs tests whenever code is committed to detect integration issues early. This encourages frequent code commits and rapid feedback.
Continuous Delivery (CD) automates release processes like infrastructure changes and deployment. It ensures software can be released reliably at any time through automated workflows. CD may also automate the manual testing and approval steps required before production deployment.
Section 3 - CI/CD Pipeline
A typical CI/CD pipeline has several connected stages:
Planning: Netflix Engineering uses JIRA for planning and Confluence for documentation.
Coding: Java is the primary programming language for the backend service, while other languages are used for different use cases.
Build: Gradle is mainly used for building, and Gradle plugins are built to support various use cases.
Packaging: Package and dependencies are packed into an Amazon Machine Image (AMI) for release.
Testing: Netflix emphasizes testing in production and has built its own chaos engineering tools to support it.
Deployment: Netflix uses its self-built Spinnaker for canary rollout deployment.
Monitoring: The monitoring metrics are centralized in Atlas, and Kayenta is used to detect anomalies.
Incident report: Incidents are dispatched according to priority, and PagerDuty is used for incident handling.
These architecture patterns are among the most commonly used in app development, whether on iOS or Android platforms. Developers have introduced them to overcome the limitations of earlier patterns. So, how do they differ?
Patterns are reusable solutions to common design problems, resulting in a smoother, more efficient development process. They serve as blueprints for building better software structures. These are some of the most popular patterns:
Choosing the right database for your project is a complex task. Many database options, each suited to distinct use cases, can quickly lead to decision fatigue.
We hope this cheat sheet provides high-level direction to pinpoint the right service that aligns with your project's needs and avoid potential pitfalls.
Note: Google has limited documentation for their database use cases. Even though we did our best to review what was available and arrive at the best option, some of the entries may not be entirely accurate.
The answer will vary depending on your use case. Data can be indexed in memory or on disk. Similarly, data formats vary, such as numbers, strings, geographic coordinates, etc. The system might be write-heavy or read-heavy. All of these factors affect your choice of database index format.
The following are some of the most popular data structures used for indexing data:
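As a toy illustration of two common index families, a hash index and an ordered index (the idea behind B-trees), here is a short Python sketch with made-up keys; real database indexes are far more sophisticated:

import bisect

# Hash index: O(1) exact-key lookups, but no range scans.
hash_index = {"alice": 101, "bob": 202, "carol": 303}
print(hash_index["bob"])

# Ordered index (stand-in for a B-tree): sorted keys support range queries.
sorted_keys = ["alice", "bob", "carol", "dave"]
lo = bisect.bisect_left(sorted_keys, "b")
hi = bisect.bisect_left(sorted_keys, "d")
print(sorted_keys[lo:hi])  # keys in the range ["b", "d") -> ['bob', 'carol']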
The diagram below shows the process. Note that the architectures for different databases are different, the diagram demonstrates some common designs.
Step 1 - A SQL statement is sent to the database via a transport layer protocol (e.g.TCP).
Step 2 - The SQL statement is sent to the command parser, where it goes through syntactic and semantic analysis, and a query tree is generated afterward.
Step 3 - The query tree is sent to the optimizer. The optimizer creates an execution plan.
Step 4 - The execution plan is sent to the executor. The executor runs the plan to retrieve the data.
Step 5 - Access methods provide the data fetching logic required for execution, retrieving data from the storage engine.
Step 6 - Access methods decide whether the SQL statement is read-only. If the query is read-only (SELECT statement), it is passed to the buffer manager for further processing. The buffer manager looks for the data in the cache or data files.
Step 7 - If the statement is an UPDATE or INSERT, it is passed to the transaction manager for further processing.
Step 8 - During a transaction, the data is in lock mode. This is guaranteed by the lock manager. It also ensures the transaction’s ACID properties.
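You can peek at part of this pipeline yourself. The sketch below (using Python's built-in sqlite3, whose internals differ from the generic diagram) asks the optimizer for its execution plan before running a query:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# The parser and optimizer produce a plan; EXPLAIN QUERY PLAN shows it.
for row in conn.execute("EXPLAIN QUERY PLAN SELECT name FROM users WHERE id = 1"):
    print(row)

# The executor then runs the plan and fetches rows via the storage engine.
print(conn.execute("SELECT name FROM users WHERE id = 1").fetchone())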
The CAP theorem is one of the most famous terms in computer science, but I bet different developers have different understandings. Let’s examine what it is and why it can be confusing.
CAP theorem states that a distributed system can't provide more than two of these three guarantees simultaneously.
Consistency: consistency means all clients see the same data at the same time no matter which node they connect to.
Availability: availability means any client that requests data gets a response even if some of the nodes are down.
Partition Tolerance: a partition indicates a communication break between two nodes. Partition tolerance means the system continues to operate despite network partitions.
The “2 of 3” formulation can be useful, but this simplification could be misleading.
Picking a database is not easy. Justifying our choice purely based on the CAP theorem is not enough. For example, companies don't choose Cassandra for chat applications simply because it is an AP system. There is a list of good characteristics that make Cassandra a desirable option for storing chat messages. We need to dig deeper.
“CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare”. Quoted from the paper: CAP Twelve Years Later: How the “Rules” Have Changed.
The theorem is about 100% availability and consistency. A more realistic discussion would be the trade-offs between latency and consistency when there is no network partition. See PACELC theorem for more details.
Is the CAP theorem actually useful?
I think it is still useful as it opens our minds to a set of tradeoff discussions, but it is only part of the story. We need to dig deeper when picking the right database.
SQL statements are executed by the database system in several steps, including:
The execution of SQL is highly complex and involves many considerations, such as:
In 1986, SQL (Structured Query Language) became a standard. Over the next 40 years, it became the dominant language for relational database management systems. Reading the latest standard (ANSI SQL 2016) can be time-consuming. How can I learn it?
There are 5 components of the SQL language:
For a backend engineer, you may need to know most of it. As a data analyst, you may need to have a good understanding of DQL. Select the topics that are most relevant to you.
This diagram illustrates where we cache data in a typical architecture.
There are multiple layers along the flow.
There are 3 main reasons as shown in the diagram below.
Question: Another popular in-memory store is Memcached. Do you know the differences between Redis and Memcached?
You might have noticed the style of this diagram is different from my previous posts. Please let me know which one you prefer.
There is more to Redis than just caching.
Redis can be used in a variety of scenarios as shown in the diagram.
Session
We can use Redis to share user session data among different services.
Cache
We can use Redis to cache objects or pages, especially for hotspot data.
Distributed lock
We can use a Redis string to acquire locks among distributed services.
Counter
We can count how many likes or reads an article receives.
Rate limiter
We can apply a rate limiter for certain user IPs.
Global ID generator
We can use the Redis INCR command on an integer key to generate global IDs.
Shopping cart
We can use Redis Hash to represent key-value pairs in a shopping cart.
Calculate user retention
We can use a Bitmap to record daily user logins and calculate user retention.
Message queue
We can use a Redis List as a simple message queue.
Ranking
We can use a ZSet (sorted set) to rank articles.
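Here is a rough sketch of a few of these use cases with the redis-py client (an assumption on my part; it presumes a Redis instance on localhost:6379, and the key names are made up):

import redis  # assumes the redis-py package and a local Redis server

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Counter: count likes for an article.
r.incr("article:42:likes")

# Rate limiter: allow at most 10 requests per minute per IP.
key = "rate:203.0.113.9"
hits = r.incr(key)
if hits == 1:
    r.expire(key, 60)        # start the 60-second window on the first request
allowed = hits <= 10

# Ranking: a sorted set (ZSet) keyed by score.
r.zadd("article:ranking", {"article:42": 128, "article:7": 256})
print(r.zrevrange("article:ranking", 0, 2, withscores=True))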
Designing large-scale systems usually requires careful consideration of caching. Below are five caching strategies that are frequently utilized.
The diagram below shows a typical microservice architecture.
Benefits of microservices:
A picture is worth a thousand words: 9 best practices for developing microservices.
When we develop microservices, we need to follow the following best practices:
Below you will find a diagram showing the microservice tech stack, both for the development phase and for production.
▶️ 𝐏𝐫𝐞-𝐏𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧
▶️ 𝐏𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧
There are many design decisions that contributed to Kafka’s performance. In this post, we’ll focus on two. We think these two carried the most weight.
The diagram illustrates how the data is transmitted between producer and consumer, and what zero-copy means.
2.1 The data is loaded from disk to OS cache
2.2 The data is copied from OS cache to Kafka application
2.3 Kafka application copies the data into the socket buffer
2.4 The data is copied from socket buffer to network card
2.5 The network card sends data out to the consumer
3.1 The data is loaded from disk to OS cache
3.2 OS cache directly copies the data to the network card via the sendfile() command
3.3 The network card sends data out to the consumer
Zero copy is a shortcut to save the multiple data copies between application context and kernel context.
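The same idea is exposed to applications through the sendfile() system call. As a minimal illustration (not Kafka's actual code; the file name and port are made up), Python's socket.sendfile() hands the copy to the kernel:

import socket

# Serve a file to one client using the kernel's zero-copy path where available
# (socket.sendfile() is built on os.sendfile()/sendfile(2)).
server = socket.create_server(("0.0.0.0", 9092))
conn, _ = server.accept()
with open("segment.log", "rb") as f:   # hypothetical log segment
    conn.sendfile(f)                   # disk -> OS cache -> NIC, skipping user-space copies
conn.close()
server.close()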
The diagram below shows the economics of the credit card payment flow.
1. The cardholder pays a merchant $100 to buy a product.
2. The merchant benefits from the use of the credit card with higher sales volume and needs to compensate the issuer and the card network for providing the payment service. The acquiring bank sets a fee with the merchant, called the “merchant discount fee.”
3 - 4. The acquiring bank keeps $0.25 as the acquiring markup, and $1.75 is paid to the issuing bank as the interchange fee. The merchant discount fee should cover the interchange fee.
The interchange fee is set by the card network because it is less efficient for each issuing bank to negotiate fees with each merchant.
5. The card network sets up the network assessments and fees with each bank, which pays the card network for its services every month. For example, VISA charges a 0.11% assessment, plus a $0.0195 usage fee, for every swipe.
6. The cardholder pays the issuing bank for its services.
Why should the issuing bank be compensated?
VISA, Mastercard, and American Express act as card networks for the clearing and settling of funds. The card acquiring bank and the card issuing bank can be – and often are – different. If banks were to settle transactions one by one without an intermediary, each bank would have to settle the transactions with all the other banks. This is quite inefficient.
The diagram below shows VISA’s role in the credit card payment process. There are two flows involved. Authorization flow happens when the customer swipes the credit card. Capture and settlement flow happens when the merchant wants to get the money at the end of the day.
Step 0: The card issuing bank issues credit cards to its customers.
Step 1: The cardholder wants to buy a product and swipes the credit card at the Point of Sale (POS) terminal in the merchant’s shop.
Step 2: The POS terminal sends the transaction to the acquiring bank, which has provided the POS terminal.
Steps 3 and 4: The acquiring bank sends the transaction to the card network, also called the card scheme. The card network sends the transaction to the issuing bank for approval.
Steps 4.1, 4.2 and 4.3: The issuing bank freezes the money if the transaction is approved. The approval or rejection is sent back to the acquirer, as well as the POS terminal.
Steps 1 and 2: The merchant wants to collect the money at the end of the day, so they hit “capture” on the POS terminal. The transactions are sent to the acquirer in batch. The acquirer sends the batch file with transactions to the card network.
Step 3: The card network performs clearing for the transactions collected from different acquirers, and sends the clearing files to different issuing banks.
Step 4: The issuing banks confirm the correctness of the clearing files, and transfer money to the relevant acquiring banks.
Step 5: The acquiring bank then transfers money to the merchant’s bank.
Step 4: The card network clears the transactions from different acquiring banks. Clearing is a process in which mutually offsetting transactions are netted, so the number of total transactions is reduced.
In the process, the card network takes on the burden of talking to each bank and receives service fees in return.
What’s UPI? UPI is an instant real-time payment system developed by the National Payments Corporation of India.
It accounts for 60% of digital retail transactions in India today.
UPI = payment markup language + standard for interoperable payments
The concepts of DevOps, SRE, and Platform Engineering have emerged at different times and have been developed by various individuals and organizations.
DevOps as a concept was introduced in 2009 by Patrick Debois and Andrew Shafer at the Agile conference. They sought to bridge the gap between software development and operations by promoting a collaborative culture and shared responsibility for the entire software development lifecycle.
SRE, or Site Reliability Engineering, was pioneered by Google in the early 2000s to address operational challenges in managing large-scale, complex systems. Google developed SRE practices and tools, such as the Borg cluster management system and the Monarch monitoring system, to improve the reliability and efficiency of their services.
Platform Engineering is a more recent concept, building on the foundation of SRE engineering. The precise origins of Platform Engineering are less clear, but it is generally understood to be an extension of the DevOps and SRE practices, with a focus on delivering a comprehensive platform for product development that supports the entire business perspective.
It's worth noting that while these concepts emerged at different times, they are all related to the broader trend of improving collaboration, automation, and efficiency in software development and operations.
K8s is a container orchestration system. It is used for container deployment and management. Its design is greatly impacted by Google’s internal system Borg.
A k8s cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node.
The worker node(s) host the Pods that are the components of the application workload. The control plane manages the worker nodes and the Pods in the cluster. In production environments, the control plane usually runs across multiple computers, and a cluster usually runs multiple nodes, providing fault tolerance and high availability.
API Server
The API server talks to all the components in the k8s cluster. All the operations on pods are executed by talking to the API server.
Scheduler
The scheduler watches for newly created pods and assigns them to suitable worker nodes.
Controller Manager
The controller manager runs the controllers, including Node Controller, Job Controller, EndpointSlice Controller, and ServiceAccount Controller.
Etcd
etcd is a key-value store used as Kubernetes' backing store for all cluster data.
Pods
A pod is a group of containers and is the smallest unit that k8s administers. Pods have a single IP address applied to every container within the pod.
Kubelet
An agent that runs on each node in the cluster. It ensures containers are running in a Pod.
Kube Proxy
Kube-proxy is a network proxy that runs on each node in your cluster. It routes traffic coming into a node from the service. It forwards requests for work to the correct containers.
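If you want to see these components in action, the official Kubernetes Python client can query the API server for the pods the scheduler has placed. A minimal sketch, assuming a working kubeconfig and the kubernetes package installed:

from kubernetes import client, config

config.load_kube_config()          # reads ~/.kube/config, like kubectl does
v1 = client.CoreV1Api()            # talks to the API server

for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)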
What is Docker ?
Docker is an open-source platform that allows you to package, distribute, and run applications in isolated containers. It focuses on containerization, providing lightweight environments that encapsulate applications and their dependencies.
What is Kubernetes ?
Kubernetes, often referred to as K8s, is an open-source container orchestration platform. It provides a framework for automating the deployment, scaling, and management of containerized applications across a cluster of nodes.
How are both different from each other ?
Docker: Docker operates at the individual container level on a single operating system host.
You must manage each host manually, and setting up networks, security policies, and storage for multiple related containers can be complex.
Kubernetes: Kubernetes operates at the cluster level. It manages multiple containerized applications across multiple hosts, providing automation for tasks like load balancing, scaling, and ensuring the desired state of applications.
In short, Docker focuses on containerization and running containers on individual hosts, while Kubernetes specializes in managing and orchestrating containers at scale across a cluster of hosts.
The diagram below shows the architecture of Docker and how it works when we run “docker build”, “docker pull” and “docker run”.
There are 3 components in Docker architecture:
Docker client
The docker client talks to the Docker daemon.
Docker host
The Docker daemon listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes.
Docker registry
A Docker registry stores Docker images. Docker Hub is a public registry that anyone can use.
Let’s take the “docker run” command as an example.
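The walkthrough itself is in the diagram, but roughly the same flow can be scripted with the Docker SDK for Python (an assumption; it requires the docker package and a running Docker daemon): the client asks the daemon to pull the image from the registry if needed, then create and start a container.

import docker

client = docker.from_env()                       # Docker client -> Docker daemon

client.images.pull("hello-world")                # daemon pulls from the registry (docker pull)
container = client.containers.run("hello-world", detach=True)  # daemon creates and starts the container (docker run)
print(container.logs().decode())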
To begin with, it's essential to identify where our code is stored. The common assumption is that there are only two locations - one on a remote server like Github and the other on our local machine. However, this isn't entirely accurate. Git maintains three local storages on our machine, which means that our code can be found in four places:
Most Git commands primarily move files between these four locations.
The diagram below shows the Git workflow.
Git is a distributed version control system.
Every developer maintains a local copy of the main repository and edits and commits to the local copy.
The commit is very fast because the operation doesn’t interact with the remote repository.
If the remote repository crashes, the files can be recovered from the local repositories.
What are the differences?
When we merge changes from one Git branch to another, we can use ‘git merge’ or ‘git rebase’. The diagram below shows how the two commands work.
Git merge
This creates a new commit G’ in the main branch. G’ ties the histories of both main and feature branches.
Git merge is non-destructive. Neither the main nor the feature branch is changed.
Git rebase
Git rebase moves the feature branch histories to the head of the main branch. It creates new commits E’, F’, and G’ for each commit in the feature branch.
The benefit of rebase is that it has a linear commit history.
Rebase can be dangerous if “the golden rule of git rebase” is not followed.
The Golden Rule of Git Rebase
Never use it on public branches!
Below is a diagram showing the evolution of architecture and processes since the 1980s.
Organizations can build and run scalable applications on public, private, and hybrid clouds using cloud native technologies.
This means the applications are designed to leverage cloud features, so they are resilient to load and easy to scale.
Cloud native includes 4 aspects:
Development process
This has progressed from waterfall to agile to DevOps.
Application Architecture
The architecture has gone from monolithic to microservices. Each service is designed to be small, adaptive to the limited resources in cloud containers.
Deployment & packaging
The applications used to be deployed on physical servers. Then, around 2000, applications that were not sensitive to latency were usually deployed on virtual servers. Cloud native applications are packaged into Docker images and deployed in containers.
Application infrastructure
The applications are massively deployed on cloud infrastructure instead of self-hosted servers.
Nested JSON files are hard to read.
JsonCrack generates graph diagrams from JSON files and makes them easy to read.
Additionally, the generated diagrams can be downloaded as images.
What does it do?
The Linux file system used to resemble an unorganized town where individuals constructed their houses wherever they pleased. However, in 1994, the Filesystem Hierarchy Standard (FHS) was introduced to bring order to the Linux file system.
By implementing a standard like the FHS, software can ensure a consistent layout across various Linux distributions. Nonetheless, not all Linux distributions strictly adhere to this standard; they often incorporate their own unique elements or cater to specific requirements. To become proficient in this standard, begin by exploring it yourself: use commands such as "cd" for navigation and "ls" for listing directory contents. Imagine the file system as a tree, starting from the root (/). With time, it will become second nature to you, transforming you into a skilled Linux administrator.
Linux commands are instructions for interacting with the operating system. They help manage files, directories, system processes, and many other aspects of the system. You need to become familiar with these commands in order to navigate and maintain Linux-based systems efficiently and effectively.
The diagram below shows popular Linux commands:
Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). HTTPS transmits encrypted data using Transport Layer Security (TLS). If the data is hijacked online, all the hijacker gets is unreadable ciphertext.
How is the data encrypted and decrypted?
Step 1 - The client (browser) and the server establish a TCP connection.
Step 2 - The client sends a “client hello” to the server. The message contains a set of necessary encryption algorithms (cipher suites) and the latest TLS version it can support. The server responds with a “server hello” so the browser knows whether it can support the algorithms and TLS version.
The server then sends the SSL certificate to the client. The certificate contains the public key, host name, expiry dates, etc. The client validates the certificate.
Step 3 - After validating the SSL certificate, the client generates a session key and encrypts it using the public key. The server receives the encrypted session key and decrypts it with the private key.
Step 4 - Now that both the client and the server hold the same session key (symmetric encryption), the encrypted data is transmitted in a secure bi-directional channel.
Why does HTTPS switch to symmetric encryption during data transmission? There are two main reasons:
Security: The asymmetric encryption goes only one way. This means that if the server tries to send the encrypted data back to the client, anyone can decrypt the data using the public key.
Server resources: The asymmetric encryption adds quite a lot of mathematical overhead. It is not suitable for data transmissions in long sessions.
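You can observe the result of this handshake with Python's standard ssl module. A small sketch (example.com is just a placeholder host):

import socket
import ssl

ctx = ssl.create_default_context()  # validates the server certificate against system CAs
with socket.create_connection(("example.com", 443)) as tcp:           # Step 1: TCP connection
    with ctx.wrap_socket(tcp, server_hostname="example.com") as tls:  # Steps 2-3: TLS handshake
        print(tls.version())  # e.g. TLSv1.3
        print(tls.cipher())   # the negotiated cipher suite used for symmetric encryption (Step 4)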
OAuth 2.0 is a powerful and secure framework that allows different applications to securely interact with each other on behalf of users without sharing sensitive credentials.
The entities involved in OAuth are the User, the Server, and the Identity Provider (IDP).
What Can an OAuth Token Do?
When you use OAuth, you get an OAuth token that represents your identity and permissions. This token can do a few important things:
Single Sign-On (SSO): With an OAuth token, you can log into multiple services or apps using just one login, making life easier and safer.
Authorization Across Systems: The OAuth token allows you to share your authorization or access rights across various systems, so you don't have to log in separately everywhere.
Accessing User Profile: Apps with an OAuth token can access certain parts of your user profile that you allow, but they won't see everything.
Remember, OAuth 2.0 is all about keeping you and your data safe while making your online experiences seamless and hassle-free across different applications and services.
SSH Keys:
Cryptographic keys used to access remote systems and servers securely.
OAuth Tokens:
Tokens that provide limited access to user data on third-party applications.
SSL Certificates:
Digital certificates that ensure secure and encrypted communication between servers and clients.
Credentials:
User authentication information used to verify identity and grant access to various systems and services.
These terms are all related to user identity management. When you log into a website, you declare who you are (identification). Your identity is verified (authentication), and you are granted the necessary permissions (authorization). Many solutions have been proposed in the past, and the list keeps growing.
From simple to complex, here is my understanding of user identity management:
WWW-Authenticate is the most basic method: the browser asks you for a username and password. Because the login life cycle cannot be controlled, it is seldom used today.
Session-cookie authentication offers finer control over the login life cycle. The server maintains session storage, and the browser keeps the ID of the session. A cookie usually only works with browsers and is not mobile app friendly.
To address the compatibility issue, the token can be used. The client sends the token to the server, and the server validates the token. The downside is that the token needs to be encrypted and decrypted, which may be time-consuming.
JWT is a standard way of representing tokens. This information can be verified and trusted because it is digitally signed. Since JWT contains the signature, there is no need to save session information on the server side.
By using SSO (single sign-on), you can sign on only once and log in to multiple websites. It uses CAS (central authentication service) to maintain cross-site information.
By using OAuth 2.0, you can authorize one website to access your information on another website.
Things NOT to do
Storing passwords in plain text is not a good idea because anyone with internal access can see them.
Storing password hashes directly is not sufficient because it is prone to precomputation attacks, such as rainbow tables.
To mitigate precomputation attacks, we salt the passwords.
What is salt?
According to OWASP guidelines, “a salt is a unique, randomly generated string that is added to each password as part of the hashing process”.
How to store a password and salt?
How to validate a password?
To validate a password, it can go through the following process:
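A minimal sketch of the store-and-validate flow using Python's standard library (PBKDF2 is one reasonable choice here; the iteration count is illustrative, and production systems often use dedicated schemes such as bcrypt or Argon2):

import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # unique, randomly generated salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest    # store both; the salt does not need to be secret

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True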
Imagine you have a special box called a JWT. Inside this box, there are three parts: a header, a payload, and a signature.
The header is like the label on the outside of the box. It tells us what type of box it is and how it's secured. It's usually written in a format called JSON, which is just a way to organize information using curly braces { } and colons : .
The payload is like the actual message or information you want to send. It could be your name, age, or any other data you want to share. It's also written in JSON format, so it's easy to understand and work with. Now, the signature is what makes the JWT secure. It's like a special seal that only the sender knows how to create. The signature is created using a secret code, kind of like a password. This signature ensures that nobody can tamper with the contents of the JWT without the sender knowing about it.
When you want to send the JWT to a server, you put the header, payload, and signature inside the box. Then you send it over to the server. The server can easily read the header and payload to understand who you are and what you want to do.
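Here is a stripped-down sketch of how such a box is assembled for the HS256 case, using only Python's standard library (the secret and claims are made up; real services typically use a vetted JWT library):

import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

secret = b"server-side-secret"  # hypothetical signing key known only to the issuer
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "alice", "role": "reader"}).encode())
signature = b64url(hmac.new(secret, header + b"." + payload, hashlib.sha256).digest())

token = b".".join([header, payload, signature]).decode()
print(token)  # header.payload.signature; the server recomputes the signature to detect tampering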
Google Authenticator is commonly used for logging into our accounts when 2-factor authentication is enabled. How does it guarantee security?
Google Authenticator is a software-based authenticator that implements a two-step verification service. The diagram below provides detail.
There are two stages involved:
Let’s look at these stages.
Stage 1
Steps 1 and 2: Bob opens the web page to enable two-step verification. The front end requests a secret key. The authentication service generates the secret key for Bob and stores it in the database.
Step 3: The authentication service returns a URI to the front end. The URI is composed of a key issuer, username, and secret key. The URI is displayed in the form of a QR code on the web page.
Step 4: Bob then uses Google Authenticator to scan the generated QR code. The secret key is stored in the authenticator.
Stage 2
Steps 1 and 2: Bob wants to log into a website with Google two-step verification. For this, he needs a one-time password. Every 30 seconds, Google Authenticator generates a 6-digit password using the TOTP (Time-based One-Time Password) algorithm. Bob uses the password to enter the website.
Steps 3 and 4: The frontend sends the password Bob enters to the backend for authentication. The authentication service reads the secret key from the database and generates a 6-digit password using the same TOTP algorithm as the client.
Step 5: The authentication service compares the two passwords generated by the client and the server, and returns the comparison result to the frontend. Bob can proceed with the login process only if the two passwords match.
Is this authentication mechanism safe?
Can the secret key be obtained by others?
We need to make sure the secret key is transmitted using HTTPS. The authenticator client and the database store the secret key, and we need to make sure the secret keys are encrypted.
Can the 6-digit password be guessed by hackers?
No. The password has 6 digits, so there are 1 million potential combinations. Plus, the password changes every 30 seconds. If hackers wanted to guess the password within 30 seconds, they would need to try more than 33,000 combinations per second.
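For the curious, the TOTP algorithm both sides run is short enough to sketch with the standard library (the base32 secret below is a well-known documentation example, not a real key):

import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, step: int = 30, digits: int = 6) -> str:
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(time.time()) // step                      # changes every 30 seconds
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                 # dynamic truncation (RFC 4226)
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

print(totp("JBSWY3DPEHPK3PXP"))  # client and server compute the same 6-digit code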
This post is based on research from many Netflix engineering blogs and open-source projects. If you come across any inaccuracies, please feel free to inform us.
Mobile and web: Netflix has adopted Swift and Kotlin to build native mobile apps. For its web application, it uses React.
Frontend/server communication: Netflix uses GraphQL.
Backend services: Netflix relies on ZUUL, Eureka, the Spring Boot framework, and other technologies.
Databases: Netflix utilizes EV cache, Cassandra, CockroachDB, and other databases.
Messaging/streaming: Netflix employs Apache Kafka and Flink for messaging and streaming purposes.
Video storage: Netflix uses S3 and Open Connect for video storage.
Data processing: Netflix utilizes Flink and Spark for data processing, which is then visualized using Tableau. Redshift is used for processing structured data warehouse information.
CI/CD: Netflix employs various tools such as JIRA, Confluence, PagerDuty, Jenkins, Gradle, Chaos Monkey, Spinnaker, Atlas, and more for CI/CD processes.
Yes, this is the real Twitter architecture. It was posted by Elon Musk and redrawn by us for better readability.
Airbnb’s microservice architecture went through 3 main stages.
Monolith (2008 - 2017)
Airbnb began as a simple marketplace for hosts and guests. It was built as a Ruby on Rails application - the monolith.
What’s the challenge?
Microservices (2017 - 2020)
Microservices aim to solve those challenges. In the microservice architecture, key services include:
What’s the challenge?
Hundreds of services and dependencies were difficult for humans to manage.
Micro + macroservices (2020 - present)
This is what Airbnb is working on now. The micro and macroservice hybrid model focuses on the unification of APIs.
Which is the best? Why do different companies choose different options?
Monorepo isn't new; Linux and Windows were both created using Monorepo. To improve scalability and build speed, Google developed its internal dedicated toolchain to scale it faster and strict coding quality standards to keep it consistent.
Amazon and Netflix are major ambassadors of the Microservice philosophy. This approach naturally separates the service code into separate repositories. It scales faster but can lead to governance pain points later on.
Within Monorepo, each service is a folder, and every folder has a BUILD config and OWNERS permission control. Every service member is responsible for their own folder.
On the other hand, in Microrepo, each service is responsible for its repository, with the build config and permissions typically set for the entire repository.
In Monorepo, dependencies are shared across the entire codebase regardless of your business, so when there's a version upgrade, every codebase upgrades their version.
In Microrepo, dependencies are controlled within each repository. Businesses choose when to upgrade their versions based on their own schedules.
Monorepo has a standard for check-ins. Google's code review process is famously known for setting a high bar, ensuring a coherent quality standard for Monorepo, regardless of the business.
Microrepo can either set its own standard or adopt a shared standard by incorporating the best practices. It can scale faster for business, but the code quality might be a bit different. Google engineers built Bazel, and Meta built Buck. There are other open-source tools available, including Nix, Lerna, and others.
Over the years, Microrepo has had more supported tools, including Maven and Gradle for Java, NPM for NodeJS, and CMake for C/C++, among others.
If your answer is on-premise servers and monolith (on the bottom of the following image), you would likely fail the interview, but that's how it is built in reality!
What people think it should look like
The interviewer is probably expecting something like the top portion of the picture.
What it actually is
Stack Overflow serves all the traffic with only 9 on-premise web servers, and it's a monolith! It has its own servers and does not run on the cloud.
This is contrary to all our popular beliefs these days.
The diagram below shows the architecture comparison before and after the migration.
What is Amazon Prime Video Monitoring Service?
Prime Video service needs to monitor the quality of thousands of live streams. The monitoring tool automatically analyzes the streams in real time and identifies quality issues like block corruption, video freeze, and sync problems. This is an important process for customer satisfaction.
There are 3 steps: media converter, defect detector, and real-time notification.
What is the problem with the old architecture?
The old architecture was based on AWS Lambda, which was good for building services quickly. However, it was not cost-effective when running the architecture at a high scale. The two most expensive operations are:
The orchestration workflow - AWS step functions charge users by state transitions and the orchestration performs multiple state transitions every second.
Data passing between distributed components - the intermediate data is stored in Amazon S3 so that the next stage can download. The download can be costly when the volume is high.
Monolithic architecture saves 90% cost
A monolithic architecture is designed to address the cost issues. There are still 3 components, but the media converter and defect detector are deployed in the same process, saving the cost of passing data over the network. Surprisingly, this change to the deployment architecture led to 90% cost savings!
This is an interesting and unique case study because microservices have become a go-to and fashionable choice in the tech industry. It's good to see that we are having more discussions about evolving the architecture and having more honest discussions about its pros and cons. Decomposing components into distributed microservices comes with a cost.
What did Amazon leaders say about this?
Amazon CTO Werner Vogels: “Building evolvable software systems is a strategy, not a religion. And revisiting your architecture with an open mind is a must.”
Ex Amazon VP Sustainability Adrian Cockcroft: “The Prime Video team had followed a path I call Serverless First…I don’t advocate Serverless Only”.
Clients send emojis through standard HTTP requests. You can think of Golang Service as a typical Web Server. Golang is chosen because it supports concurrency well. Threads in Golang are lightweight.
Since the write volume is very high, Kafka (message queue) is used as a buffer.
Emoji data is aggregated by a stream processing service, Spark. It aggregates data every 2 seconds, which is configurable. There is a trade-off to be made based on the interval: a shorter interval means emojis are delivered to other clients faster, but it also means more computing resources are needed.
Aggregated data is written to another Kafka.
The PubSub consumers pull aggregated emoji data from Kafka.
Emojis are delivered to other clients in real-time through the PubSub infrastructure. The PubSub infrastructure is interesting. Hotstar considered the following protocols: Socketio, NATS, MQTT, and gRPC, and settled with MQTT.
A similar design is adopted by LinkedIn which streams a million likes/sec.
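A rough sketch of the write path into the Kafka buffer, using the kafka-python client (an assumption on my part; the broker address and topic name are made up):

import json

from kafka import KafkaProducer  # assumes the kafka-python package

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                    # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode(),
)
producer.send("emoji-events", {"user_id": 42, "emoji": "🔥"})  # buffered, batched write
producer.flush()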
The diagram below shows the evolution of message storage at Discord:
MongoDB ➡️ Cassandra ➡️ ScyllaDB
In 2015, the first version of Discord was built on top of a single MongoDB replica. Around November 2015, MongoDB stored 100 million messages, and the RAM could no longer hold the data and indexes. Latency became unpredictable, so message storage needed to be moved to another database. Cassandra was chosen.
In 2017, Discord had 12 Cassandra nodes and stored billions of messages.
At the beginning of 2022, it had 177 nodes with trillions of messages. At this point, latency was unpredictable, and maintenance operations became too expensive to run.
There are several reasons for the issue:
ScyllaDB is a Cassandra-compatible database written in C++. Discord redesigned its architecture to have a monolithic API, a data service written in Rust, and ScyllaDB-based storage.
The p99 read latency in ScyllaDB is 15ms compared to 40-125ms in Cassandra. The p99 write latency is 5ms compared to 5-70ms in Cassandra.
Live streaming differs from regular streaming because the video content is sent via the internet in real-time, usually with a latency of just a few seconds.
The diagram below explains what happens behind the scenes to make this possible.
Step 1: The raw video data is captured by a microphone and camera. The data is sent to the server side.
Step 2: The video data is compressed and encoded. For example, the compressing algorithm separates the background and other video elements. After compression, the video is encoded to standards such as H.264. The size of the video data is much smaller after this step.
Step 3: The encoded data is divided into smaller segments, usually seconds in length, so it takes much less time to download or stream.
Step 4: The segmented data is sent to the streaming server. The streaming server needs to support different devices and network conditions. This is called ‘Adaptive Bitrate Streaming.’ This means we need to produce multiple files at different bitrates in steps 2 and 3.
Step 5: The live streaming data is pushed to edge servers supported by CDN (Content Delivery Network.) Millions of viewers can watch the video from an edge server nearby. CDN significantly lowers data transmission latency.
Step 6: The viewers’ devices decode and decompress the video data and play the video in a video player.
Steps 7 and 8: If the video needs to be stored for replay, the encoded data is sent to a storage server, and viewers can request a replay from it later.
Standard protocols for live streaming include:
This work is licensed under CC BY-NC-ND 4.0
Robust Speech Recognition via Large-Scale Weak Supervision
[Blog] [Paper] [Model card] [Colab example]
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably OpenAI's tiktoken for their fast tokenizer implementation. You can download and install (or update to) the latest release of Whisper with the following command:
pip install -U openai-whisper
Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:
pip install git+https://github.com/openai/whisper.git
To update the package to the latest version of this repository, please run:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
You may need rust installed as well, in case tiktoken does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting started page to install the Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:
pip install setuptools-rust
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
base | 74 M | base.en | base | ~1 GB | ~16x |
small | 244 M | small.en | small | ~2 GB | ~6x |
medium | 769 M | medium.en | medium | ~5 GB | ~2x |
large | 1550 M | N/A | large | ~10 GB | 1x |
The .en
models for English-only applications tend to perform better, especially for the tiny.en
and base.en
models. We observed that the difference becomes less significant for the small.en
and medium.en
models.
Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of large-v3 and large-v2 models by language, using WERs (word error rates) or CER (character error rates, shown in Italic) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.
The following command will transcribe speech in audio files, using the medium model:
whisper audio.flac audio.mp3 audio.wav --model medium
The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:
whisper japanese.wav --language Japanese
Adding --task translate will translate the speech into English:
whisper japanese.wav --language Japanese --task translate
Run the following to view all available options:
whisper --help
See tokenizer.py for the list of all available languages.
Transcription can also be performed within Python:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
Below is an example usage of whisper.detect_language() and whisper.decode(), which provide lower-level access to the model.
import whisper
model = whisper.load_model("base")
# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
# print the recognized text
print(result.text)
Please use the 🙌 Show and tell category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.
Whisper's code and model weights are released under the MIT License. See LICENSE for further details.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
English ∙ 日本語 ∙ 简体中文 ∙ 繁體中文 | العَرَبِيَّة ∙ বাংলা ∙ Português do Brasil ∙ Deutsch ∙ ελληνικά ∙ עברית ∙ Italiano ∙ 한국어 ∙ فارسی ∙ Polski ∙ русский язык ∙ Español ∙ ภาษาไทย ∙ Türkçe ∙ tiếng Việt ∙ Français | Add Translation
Help translate this guide!
Learn how to design large-scale systems.
Prep for the system design interview.
Learning how to design scalable systems will help you become a better engineer.
System design is a broad topic. There is a vast amount of resources scattered throughout the web on system design principles.
This repo is an organized collection of resources to help you learn how to build systems at scale.
This is a continually updated, open source project.
Contributions are welcome!
In addition to coding interviews, system design is a required component of the technical interview process at many tech companies.
Practice common system design interview questions and compare your results with sample solutions: discussions, code, and diagrams.
Additional topics for interview prep:
The provided Anki flashcard decks use spaced repetition to help you retain key system design concepts.
Great for use while on-the-go.
Looking for resources to help you prep for the Coding Interview?
Check out the sister repo Interactive Coding Challenges, which contains an additional Anki deck:
Learn from the community.
Feel free to submit pull requests to help:
Content that needs some polishing is placed under development.
Review the Contributing Guidelines.
Summaries of various system design topics, including pros and cons. Everything is a trade-off.
Each section contains links to more in-depth resources.
Suggested topics to review based on your interview timeline (short, medium, long).
Q: For interviews, do I need to know everything here?
A: No, you don't need to know everything here to prepare for the interview.
What you are asked in an interview depends on variables such as:
More experienced candidates are generally expected to know more about system design. Architects or team leads might be expected to know more than individual contributors. Top tech companies are likely to have one or more design interview rounds.
Start broad and go deeper in a few areas. It helps to know a little about various key system design topics. Adjust the following guide based on your timeline, experience, what positions you are interviewing for, and which companies you are interviewing with.
Suggested activity | Short | Medium | Long |
Read through the System design topics to get a broad understanding of how systems work | 👍 | 👍 | 👍 |
Read through a few articles in the Company engineering blogs for the companies you are interviewing with | 👍 | 👍 | 👍 |
Read through a few Real world architectures | 👍 | 👍 | 👍 |
Review How to approach a system design interview question | 👍 | 👍 | 👍 |
Work through System design interview questions with solutions | Some | Many | Most |
Work through Object-oriented design interview questions with solutions | Some | Many | Most |
Review Additional system design interview questions | Some | Many | Most |
How to tackle a system design interview question.
The system design interview is an open-ended conversation. You are expected to lead it.
You can use the following steps to guide the discussion. To help solidify this process, work through the System design interview questions with solutions section using the following steps.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.
Outline a high level design with all important components.
Dive into details for each core component. For example, if you were asked to design a url shortening service, discuss:
Identify and address bottlenecks, given the constraints. For example, do you need the following to address scalability issues?
Discuss potential solutions and trade-offs. Everything is a trade-off. Address bottlenecks using principles of scalable system design.
You might be asked to do some estimates by hand. Refer to the Appendix for the following resources:
Check out the following links to get a better idea of what to expect:
Common system design interview questions with sample discussions, code, and diagrams. Solutions are linked to content in the solutions/ folder.
Question | Solution |
Design Pastebin.com (or Bit.ly) | Solution |
Design the Twitter timeline and search (or Facebook feed and search) | Solution |
Design a web crawler | Solution |
Design Mint.com | Solution |
Design the data structures for a social network | Solution |
Design a key-value store for a search engine | Solution |
Design Amazon's sales ranking by category feature | Solution |
Design a system that scales to millions of users on AWS | Solution |
Add a system design question | Contribute |
Common object-oriented design interview questions with sample discussions, code, and diagrams.
Solutions are linked to content in the solutions/ folder.
Note: This section is under development.
Question | Solution |
Design a hash map | Solution |
Design a least recently used cache | Solution |
Design a call center | Solution |
Design a deck of cards | Solution |
Design a parking lot | Solution |
Design a chat server | Solution |
Design a circular array | Contribute |
Add an object-oriented design question | Contribute |
New to system design?
First, you'll need a basic understanding of common principles, learning about what they are, how they are used, and their pros and cons.
Scalability Lecture at Harvard
Next, we'll look at high-level trade-offs:
Keep in mind that everything is a trade-off.
Then we'll dive into more specific topics such as DNS, CDNs, and load balancers.
A service is scalable if it results in increased performance in a manner proportional to resources added. Generally, increasing performance means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.1
Another way to look at performance vs scalability:
Latency is the time to perform some action or to produce some result.
Throughput is the number of such actions or results per unit of time.
Generally, you should aim for maximal throughput with acceptable latency.
In a distributed computer system, you can only support two of the following guarantees:
Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software tradeoff between consistency and availability.
Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.
Responses return the most readily available version of the data available on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.
AP is a good choice if the business needs to allow for eventual consistency or when the system needs to continue working despite external errors.
With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data. Recall the definition of consistency from the CAP theorem - Every read receives the most recent write or an error.
After a write, reads may or may not see it. A best effort approach is taken.
This approach is seen in systems such as memcached. Weak consistency works well in real time use cases such as VoIP, video chat, and realtime multiplayer games. For example, if you are on a phone call and lose reception for a few seconds, when you regain connection you do not hear what was spoken during connection loss.
After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.
This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.
After a write, reads will see it. Data is replicated synchronously.
This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
There are two complementary patterns to support high availability: fail-over and replication.
With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service.
The length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic.
Active-passive failover can also be referred to as master-slave failover.
In active-active, both servers are managing traffic, spreading the load between them.
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.
Active-active failover can also be referred to as master-master failover.
This topic is further discussed in the Database section:
Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in number of 9s--a service with 99.99% availability is described as having four 9s.
99.9% availability (three 9s):
Downtime per year | 8h 45min 57s |
Downtime per month | 43m 49.7s |
Downtime per week | 10m 4.8s |
Downtime per day | 1m 26.4s |
99.99% availability (four 9s):
Downtime per year | 52min 35.7s |
Downtime per month | 4m 23s |
Downtime per week | 1m 5s |
Downtime per day | 8.6s |
If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.
Overall availability decreases when two components with availability < 100% are in sequence:
Availability (Total) = Availability (Foo) * Availability (Bar)
If both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.
Overall availability increases when two components with availability < 100% are in parallel:
Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))
If both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.
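A quick check of these two formulas in Python:

foo = bar = 0.999  # 99.9% availability each

in_sequence = foo * bar                      # both components must be up
in_parallel = 1 - (1 - foo) * (1 - bar)      # at least one component must be up

print(f"{in_sequence:.4%}")   # 99.8001%
print(f"{in_parallel:.4%}")   # 99.9999%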
Source: DNS security presentation
A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address.
DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the time to live (TTL).
A CNAME record points a name to another name or CNAME (example.com to www.example.com) or to an A record.
Services such as CloudFlare and Route 53 provide managed DNS services. Some DNS services can route traffic through various methods:
A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from CDN, although some CDNs such as Amazon's CloudFront support dynamic content. The site's DNS resolution will tell clients which server to contact.
Serving content from CDNs can significantly improve performance in two ways:
Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.
Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.
Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.
A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.
Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.
Source: Scalable system design patterns
Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:
Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.
Additional benefits include:
To protect against failures, it's common to set up multiple load balancers, either in active-passive or active-active mode.
Load balancers can route traffic based on various metrics, including:
Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source, destination IP addresses, and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing Network Address Translation (NAT).
Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve the contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
At the cost of flexibility, layer 4 load balancing requires less time and computing resources than Layer 7, although the performance impact can be minimal on modern commodity hardware.
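To make the layer 7 example concrete, a toy routing decision might inspect the request path and pick an upstream pool. The pool names and prefixes below are hypothetical, and a real load balancer such as HAProxy or nginx does this in configuration rather than application code:

import itertools

UPSTREAMS = {
    "/video/": ["video-1:8080", "video-2:8080"],
    "/billing/": ["billing-secure-1:8443"],
}
DEFAULT_POOL = ["app-1:8080", "app-2:8080"]
_rr = itertools.count()

def choose_upstream(path):
    # Layer 7 decision: look at application-layer data (the URL path)
    for prefix, pool in UPSTREAMS.items():
        if path.startswith(prefix):
            break
    else:
        pool = DEFAULT_POOL
    return pool[next(_rr) % len(pool)]  # round robin within the chosen pool

print(choose_upstream("/video/cats.mp4"))      # a video server
print(choose_upstream("/billing/invoice/42"))  # the security-hardened billing server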
Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost efficient and results in higher availability than vertical scaling, which scales up a single server on more expensive hardware. It is also easier to hire talent to work on commodity hardware than on specialized enterprise systems.
A reverse proxy is a web server that centralizes internal services and provides unified interfaces to the public. Requests from clients are forwarded to a server that can fulfill them before the reverse proxy returns the server's response to the client.
Additional benefits include:
Source: Intro to architecting systems for scale
Separating out the web layer from the application layer (also known as platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The single responsibility principle advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.
Workers in the application layer also help enable asynchronism.
Related to this discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. 1
Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.
Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built in key-value store that can be useful for storing config values and other shared data.
Source: Scaling up to your first 10 million users
A relational database like SQL is a collection of data items organized in tables.
ACID is a set of properties of relational database transactions.
There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.
The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
Source: Scalability, availability, stability, patterns
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
Source: Scalability, availability, stability, patterns
Source: Scaling up to your first 10 million users
Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
Source: Scalability, availability, stability, patterns
Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.
Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.
Common ways to shard a table of users are by the user's last name initial or by the user's geographic location.
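A hash-based variant is also common; here is a minimal sketch (the shard count and naming are hypothetical, and real systems often prefer consistent hashing so that resharding moves less data):

import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for_user(user_id):
    # Stable hash of the user id, mapped onto one of the shards
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for_user(1234))  # e.g. route this user's queries to shard 2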
Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMS such as PostgreSQL and Oracle support materialized views which handle the work of storing redundant information and keeping redundant copies consistent.
Once data becomes distributed with techniques such as federation and sharding, managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.
In most systems, reads can heavily outnumber writes 100:1 or even 1000:1. A read resulting in a complex database join can be very expensive, spending a significant amount of time on disk operations.
SQL tuning is a broad topic and many books have been written as reference.
It's important to benchmark and profile to simulate and uncover bottlenecks.
Benchmarking and profiling might point you to the following optimizations.
- Use CHAR instead of VARCHAR for fixed-length fields. CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving onto the next one.
- Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
- Use INT for larger numbers up to 2^32 or 4 billion.
- Use DECIMAL for currency to avoid floating point representation errors.
- Avoid storing large BLOBS; store the location of where to get the object instead.
- VARCHAR(255) is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS.
- Set the NOT NULL constraint where applicable to improve search performance.
- Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.

NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.
BASE is often used to describe the properties of NoSQL databases. In comparison with the CAP Theorem, BASE chooses availability over consistency.
In addition to choosing between SQL or NoSQL, it is helpful to understand which type of NoSQL database best fits your use case(s). We'll review key-value stores, document stores, wide column stores, and graph databases in the next section.
Abstraction: hash table
A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.
Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.
A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.
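As a small illustration, a cache-style read and write against Redis with the redis-py client (assuming a Redis server on localhost; pip install redis):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

r.set("user:1234", json.dumps({"name": "Alice"}), ex=3600)  # O(1) write with a 1 hour TTL
value = r.get("user:1234")                                  # O(1) read
print(json.loads(value) if value else None)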
Abstraction: key-value store with documents stored as values
A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note, many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.
Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.
Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-values and documents.
Document stores provide high flexibility and are often used for working with occasionally changing data.
Source: SQL & NoSQL, a brief history
Abstraction: nested map
ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>
A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.
Google introduced Bigtable as the first wide column store, which influenced the open-source HBase, often used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as BigTable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.
Wide column stores offer high availability and high scalability. They are often used for very large data sets.
Abstraction: graph
In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.
Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and not yet widely used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.
Source: Transitioning from RDBMS to NoSQL
Reasons for SQL:
Reasons for NoSQL:
Sample data well-suited for NoSQL:
Source: Scalable system design patterns
Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher first looks up whether the request has been made before and tries to return the previous result, saving the actual execution.
Databases often benefit from a uniform distribution of reads and writes across its partitions. Popular items can skew the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes in traffic.
Caches can be located on the client side (OS or browser), server side, or in a distinct cache layer.
CDNs are considered a type of cache.
Reverse proxies and caches such as Varnish can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.
Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.
In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so cache invalidation algorithms such as least recently used (LRU) can help invalidate 'cold' entries and keep 'hot' data in RAM.
Redis has the following additional features:
There are multiple levels you can cache that fall into two general categories: database queries and objects:
Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.
Whenever you query the database, hash the query as a key and store the result to the cache. This approach suffers from expiration issues:
See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or a data structure(s):
Suggestions of what to cache:
Since you can only store a limited amount of data in cache, you'll need to determine which cache update strategy works best for your use case.
Source: From cache to in-memory data grid
The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:
def get_user(self, user_id):
    user = cache.get("user.{0}".format(user_id))  # 1. look in the cache first
    if user is None:
        # 2. cache miss: fall back to the database
        user = db.query("SELECT * FROM users WHERE user_id = {0}", user_id)
        if user is not None:
            key = "user.{0}".format(user_id)
            cache.set(key, json.dumps(user))      # 3. populate the cache for next time
    return user
Memcached is generally used in this manner.
Subsequent reads of data added to cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.
Source: Scalability, availability, stability, patterns
The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:
Application code:
set_user(12345, {"foo":"bar"})
Cache code:
def set_user(user_id, values):
    user = db.query("UPDATE Users WHERE id = {0}", user_id, values)  # write through to the database
    cache.set(user_id, user)                                         # then update the cache
Write-through is a slow overall operation due to the write operation, but subsequent reads of just written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.
Source: Scalability, availability, stability, patterns
In write-behind, the application does the following:
Source: From cache to in-memory data grid
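Concretely, the application adds or updates the entry in the cache and acknowledges the write immediately, while the durable write to the data store happens asynchronously via a queue. A self-contained sketch of that flow (a dict and an in-process queue stand in for the real cache, database, and message queue):

import queue
import threading

cache = {}                    # stands in for Memcached/Redis
database = {}                 # stands in for the durable data store
write_queue = queue.Queue()   # stands in for a message queue

def set_user(user_id, values):
    cache[user_id] = values              # fast path: update the cache
    write_queue.put((user_id, values))   # defer the durable write

def flush_worker():
    while True:
        user_id, values = write_queue.get()
        database[user_id] = values       # slow, durable write happens asynchronously
        write_queue.task_done()

threading.Thread(target=flush_worker, daemon=True).start()

set_user(1234, {"name": "Alice"})
write_queue.join()                       # in this demo, wait for the background flush
print(database[1234])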
You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.
Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.
Source: Intro to architecting systems for scale
Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.
Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:
The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all of your followers.
Redis is useful as a simple message broker but messages can be lost.
RabbitMQ is popular but requires you to adapt to the 'AMQP' protocol and manage your own nodes.
Amazon SQS is hosted but can have high latency and has the possibility of messages being delivered twice.
Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling and can be used to run computationally-intensive jobs in the background.
Celery has support for scheduling and is primarily used with Python.
If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. Back pressure can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with exponential backoff.
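A minimal sketch of that idea with a bounded in-process queue (the limit of 1000 jobs and the status codes are illustrative):

import queue

job_queue = queue.Queue(maxsize=1000)  # the bounded queue provides the back pressure

def enqueue_job(job):
    try:
        job_queue.put_nowait(job)
        return 202  # accepted; a background worker will process it
    except queue.Full:
        return 503  # server busy; the client should retry later with exponential backoff

for i in range(1001):
    status = enqueue_job({"id": i})
print(status)  # 503 once the queue is full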
HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.
A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs:
Verb | Description | Idempotent* | Safe | Cacheable
GET | Reads a resource | Yes | Yes | Yes |
POST | Creates a resource or triggers a process that handles data | No | No | Yes if response contains freshness info |
PUT | Creates or replaces a resource | Yes | No | No |
PATCH | Partially updates a resource | No | No | Yes if response contains freshness info |
DELETE | Deletes a resource | Yes | No | No |
*Can be called many times without different outcomes.
HTTP is an application layer protocol relying on lower-level protocols such as TCP and UDP.
Source: How to make a multiplayer game
TCP is a connection-oriented protocol over an IP network. Connection is established and terminated using a handshake. All packets sent are guaranteed to reach the destination in the original order and without corruption through:
If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements flow control and congestion control. These guarantees cause delays and generally result in less efficient transmission than UDP.
To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and say, a memcached server. Connection pooling can help in addition to switching to UDP where applicable.
TCP is useful for applications that require high reliability but are less time critical. Some examples include web servers, database info, SMTP, FTP, and SSH.
Use TCP over UDP when:
Source: How to make a multiplayer game
UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP supports, UDP is generally more efficient.
UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with DHCP because the client has not yet received an IP address, which TCP would require before it could stream.
UDP is less reliable but works well in real time use cases such as VoIP, video chat, streaming, and realtime multiplayer games.
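A minimal datagram exchange over the loopback interface with the standard socket module; note there is no handshake and no delivery or ordering guarantee (loopback just happens to be reliable):

import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # SOCK_DGRAM = UDP
receiver.bind(("127.0.0.1", 9999))

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", ("127.0.0.1", 9999))  # fire and forget, no connection setup

data, addr = receiver.recvfrom(1024)
print(data, "from", addr)  # b'ping' from ('127.0.0.1', <ephemeral port>)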
Use UDP over TCP when:
Source: Crack the system design interview
In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include Protobuf, Thrift, and Avro.
RPC is a request-response protocol:
Sample RPC calls:
GET /someoperation?data=anId

POST /anotheroperation
{
  "data": "anId",
  "anotherdata": "another value"
}
RPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases.
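The standard library's xmlrpc modules are enough to show the shape of an RPC: the client calls add() as if it were a local function while the work happens on the server (the in-process server and port here are just for the demo):

import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False, allow_none=True)
server.register_function(lambda a, b: a + b, "add")   # expose a behavior, not a resource
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000/")
print(proxy.add(2, 3))  # 5 -- looks like a local call, executes remotely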
Choose a native library (aka SDK) when:
HTTP APIs following REST tend to be used more often for public APIs.
REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.
There are four qualities of a RESTful interface:
Sample REST calls:
GET /someresources/anId
PUT /someresources/anId
{"anotherdata": "another value"}
REST is focused on exposing data. It minimizes the coupling between client/server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning.
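A hypothetical client interaction with such an API, using the third-party requests library (pip install requests); the host and resources mirror the comparison table below:

import requests

BASE = "https://api.example.com"  # hypothetical API host

person = requests.get(f"{BASE}/persons/1234").json()             # read a resource

resp = requests.put(f"{BASE}/items/456", json={"key": "value"})  # replace a resource
resp.raise_for_status()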
Operation | RPC | REST |
Signup | POST /signup | POST /persons |
Resign | POST /resign { "personid": "1234" } | DELETE /persons/1234 |
Read a person | GET /readPerson?personid=1234 | GET /persons/1234 |
Read a person's items list | GET /readUsersItemsList?personid=1234 | GET /persons/1234/items |
Add an item to a person's items | POST /addItemToUsersItemsList { "personid": "1234", "itemid": "456" } | POST /persons/1234/items { "itemid": "456" } |
Update an item | POST /modifyItem { "itemid": "456", "key": "value" } | PUT /items/456 { "key": "value" } |
Delete an item | POST /removeItem { "itemid": "456" } | DELETE /items/456 |
Source: Do you really know why you prefer REST over RPC
This section could use some updates. Consider contributing!
Security is a broad topic. Unless you have considerable experience, a security background, or are applying for a position that requires knowledge of security, you probably won't need to know more than the basics:
You'll sometimes be asked to do 'back-of-the-envelope' estimates. For example, you might need to determine how long it will take to generate 100 image thumbnails from disk or how much memory a data structure will take. The Powers of two table and Latency numbers every programmer should know are handy references.
Power Exact Value Approx Value Bytes
---------------------------------------------------------------
7 128
8 256
10 1024 1 thousand 1 KB
16 65,536 64 KB
20 1,048,576 1 million 1 MB
30 1,073,741,824 1 billion 1 GB
32 4,294,967,296 4 GB
40 1,099,511,627,776 1 trillion 1 TB
Latency Comparison Numbers
--------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 10,000 ns 10 us
Send 1 KB bytes over 1 Gbps network 10,000 ns 10 us
Read 4 KB randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
Read 1 MB sequentially from memory 250,000 ns 250 us
Round trip within same datacenter 500,000 ns 500 us
Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory
HDD seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps 10,000,000 ns 10,000 us 10 ms 40x memory, 10X SSD
Read 1 MB sequentially from HDD 30,000,000 ns 30,000 us 30 ms 120x memory, 30X SSD
Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
Handy metrics based on numbers above:
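For example, a rough estimate of how long it takes to read 100 thumbnails sequentially from memory, SSD, and HDD, assuming roughly 1 MB per image:

READ_1MB_SECONDS = {"memory": 250e-6, "ssd": 1e-3, "hdd": 30e-3}  # from the table above

for medium, seconds_per_mb in READ_1MB_SECONDS.items():
    print(medium, round(100 * seconds_per_mb, 3), "s")
# memory ~0.025 s, ssd ~0.1 s, hdd ~3 s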
Common system design interview questions, with links to resources on how to solve each.
Question | Reference(s) |
Design a file sync service like Dropbox | youtube.com |
Design a search engine like Google | queue.acm.org stackexchange.com ardendertat.com stanford.edu |
Design a scalable web crawler like Google | quora.com |
Design Google docs | code.google.com neil.fraser.name |
Design a key-value store like Redis | slideshare.net |
Design a cache system like Memcached | slideshare.net |
Design a recommendation system like Amazon's | hulu.com ijcai13.org |
Design a tinyurl system like Bitly | n00tc0d3r.blogspot.com |
Design a chat app like WhatsApp | highscalability.com |
Design a picture sharing system like Instagram | highscalability.com highscalability.com |
Design the Facebook news feed function | quora.com quora.com slideshare.net |
Design the Facebook timeline function | facebook.com highscalability.com |
Design the Facebook chat function | erlang-factory.com facebook.com |
Design a graph search function like Facebook's | facebook.com facebook.com facebook.com |
Design a content delivery network like CloudFlare | figshare.com |
Design a trending topic system like Twitter's | michael-noll.com snikolov.wordpress.com |
Design a random ID generation system | blog.twitter.com github.com |
Return the top k requests during a time interval | cs.ucsb.edu wpi.edu |
Design a system that serves data from multiple data centers | highscalability.com |
Design an online multiplayer card game | indieflashblog.com buildnewgames.com |
Design a garbage collection system | stuffwithstuff.com washington.edu |
Design an API rate limiter | https://stripe.com/blog/ |
Design a Stock Exchange (like NASDAQ or Binance) | Jane Street Golang Implementation Go Implementation |
Add a system design question | Contribute |
Articles on how real world systems are designed.
Source: Twitter timelines at scale
Don't focus on nitty gritty details for the following articles, instead:
Type | System | Reference(s) |
Data processing | MapReduce - Distributed data processing from Google | research.google.com |
Data processing | Spark - Distributed data processing from Databricks | slideshare.net |
Data processing | Storm - Distributed data processing from Twitter | slideshare.net |
Data store | Bigtable - Distributed column-oriented database from Google | harvard.edu |
Data store | HBase - Open source implementation of Bigtable | slideshare.net |
Data store | Cassandra - Distributed column-oriented database from Facebook | slideshare.net |
Data store | DynamoDB - Document-oriented database from Amazon | harvard.edu |
Data store | MongoDB - Document-oriented database | slideshare.net |
Data store | Spanner - Globally-distributed database from Google | research.google.com |
Data store | Memcached - Distributed memory caching system | slideshare.net |
Data store | Redis - Distributed memory caching system with persistence and value types | slideshare.net |
File system | Google File System (GFS) - Distributed file system | research.google.com |
File system | Hadoop File System (HDFS) - Open source implementation of GFS | apache.org |
Misc | Chubby - Lock service for loosely-coupled distributed systems from Google | research.google.com |
Misc | Dapper - Distributed systems tracing infrastructure | research.google.com |
Misc | Kafka - Pub/sub message queue from LinkedIn | slideshare.net |
Misc | Zookeeper - Centralized infrastructure and services enabling synchronization | slideshare.net |
Add an architecture | Contribute |
Architectures for companies you are interviewing with.
Questions you encounter might be from the same domain.
Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo:
Interested in adding a section or helping complete one in-progress? Contribute!
Credits and sources are provided throughout this repo.
Special thanks to:
Feel free to contact me to discuss any issues, questions, or comments.
My contact info can be found on my GitHub page.
I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).
Copyright 2017 Donne Martin
Creative Commons Attribution 4.0 International License (CC BY 4.0)
http://creativecommons.org/licenses/by/4.0/
a state-of-the-art open visual language model | multimodal pretrained model
🔥 News: CogVLM bilingual version is available online! Welcome to try it out!
🔥 News: We are currently preparing to open-source a more powerful model with rich chart and document understanding capabilities. It has achieved a score of 81 on DocVQA, so stay tuned for its release!
CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters.
CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flickr30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA and TDIUC, and ranks 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B. CogVLM can also chat with you about images.
CogVLM can accurately describe images in details with very few hallucinations.
The CogVLM model comprises four fundamental components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT), and a visual expert module. See the paper for more details.
We support two GUIs for model inference, web demo and CLI. If you want to use it in your python code, it is easy to modify the CLI scripts for your case.
First, we need to install the dependencies.
pip install -r requirements.txt
python -m spacy download en_core_web_sm
We also offer a local web demo based on Gradio. First, install Gradio by running pip install gradio. Then download and enter this repository and run web_demo.py. See the next section for detailed usage:
python web_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
python web_demo.py --from_pretrained cogvlm-grounding-generalist --version base --english --bf16
The GUI of the web demo looks like:
We open-source different checkpoints for different downstream tasks:
- cogvlm-chat: the model after SFT for alignment, which supports chat like GPT-4V.
- cogvlm-base-224: the original checkpoint after text-image pretraining.
- cogvlm-base-490: finetuned at 490px resolution from cogvlm-base-224. The finetuning data includes the training sets of VQA datasets.
- cogvlm-grounding-generalist: supports different visual grounding tasks, e.g. REC, grounding captioning, etc.

Run the CLI demo via:
python cli_demo.py --from_pretrained cogvlm-base-224 --version base --english --bf16 --no_prompt
python cli_demo.py --from_pretrained cogvlm-base-490 --version base --english --bf16 --no_prompt
python cli_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
python cli_demo.py --from_pretrained cogvlm-grounding-generalist --version base --english --bf16
The program will automatically download the sat model and interact in the command line. You can generate replies by entering instructions and pressing enter. Enter clear to clear the conversation history and stop to stop the program.
We also support model parallel inference, which splits the model across multiple (2/4/8) GPUs. --nproc-per-node=[n] in the following command controls the number of GPUs used.
torchrun --standalone --nnodes=1 --nproc-per-node=2 cli_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
Note:
- Add --local_tokenizer /path/to/vicuna-7b-v1.5 to load the tokenizer.
- Downloaded models are saved to ~/.sat_models by default. Change the default location by setting the environment variable SAT_HOME. For example, if you want to save the model to /path/to/my/models, you can run export SAT_HOME=/path/to/my/models before running the python command.

The program provides the following hyperparameters to control the generation process:
usage: cli_demo.py [-h] [--max_length MAX_LENGTH] [--top_p TOP_P] [--top_k TOP_K] [--temperature TEMPERATURE] [--english]
optional arguments:
-h, --help show this help message and exit
--max_length MAX_LENGTH
max length of the total sequence
--top_p TOP_P top p for nucleus sampling
--top_k TOP_K top k for top k sampling
--temperature TEMPERATURE
temperature for sampling
--english only output English
You may want to use CogVLM in your own task, which needs a different output style or domain knowledge. We here provide a finetuning example for Captcha Recognition.
Start by downloading the Captcha Images dataset. Once downloaded, extract the contents of the ZIP file.
To create a train/validation/test split in the ratio of 80/5/15, execute the following:
python scripts/split_dataset.py
Start the fine-tuning process with this command:
bash scripts/finetune_(224/490)_lora.sh
Merge the model to model_parallel_size=1 (replace the 4 below with your training MP_SIZE):
torchrun --standalone --nnodes=1 --nproc-per-node=4 merge_model.py --version base --bf16 --from_pretrained ./checkpoints/merged_lora_(224/490)
Evaluate the performance of your model.
bash scripts/evaluate_(224/490).sh
It is recommended to use the 490px version. However, if you have limited GPU resources (such as only one node with 8x RTX 3090), you can try the 224px version with model parallelism.
The anticipated result of this script is around 95% accuracy on the test set.
It is worth noting that the fine-tuning examples only tune limited parameters. (Expert only) If you want to get >98% accuracy, you need to increase the trainable parameters in finetune_demo.py.
The code in this repository is open source under the Apache-2.0 license, while the use of the CogVLM model weights must comply with the Model License.
If you find our work helpful, please consider citing the following paper:
@article{wang2023cogvlm,
title={CogVLM: Visual Expert for Pretrained Language Models},
author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang},
year={2023},
eprint={2311.03079},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
In the instruction fine-tuning phase of the CogVLM, there are some English image-text data from the MiniGPT-4, LLAVA, LRV-Instruction, LLaVAR and Shikra projects, as well as many classic cross-modal work datasets. We sincerely thank them for their contributions.
Teaching LLMs memory management for unbounded context 📚🦙
Join Discord and message the MemGPT bot (in the #memgpt channel). Then run the following commands (messaged to "MemGPT Bot"):
- /profile (to create your profile)
- /key (to enter your OpenAI key)
- /create (to create a MemGPT chatbot)

Make sure your privacy settings on this server are open so that MemGPT Bot can DM you:
MemGPT → Privacy Settings → Direct Messages set to ON

You can see the full list of available commands when you enter / into the message box.
Memory-GPT (or MemGPT in short) is a system that intelligently manages different memory tiers in LLMs in order to effectively provide extended context within the LLM's limited context window. For example, MemGPT knows when to push critical information to a vector database and when to retrieve it later in the chat, enabling perpetual conversations. Learn more about MemGPT in our paper.
Install MemGPT:
pip install pymemgpt
Add your OpenAI API key to your environment:
export OPENAI_API_KEY=YOUR_API_KEY # on Linux/Mac
set OPENAI_API_KEY=YOUR_API_KEY # on Windows
$Env:OPENAI_API_KEY = "YOUR_API_KEY" # on Windows (PowerShell)
Configure default setting for MemGPT by running:
memgpt configure
Now, you can run MemGPT with:
memgpt run
You can run the following commands in the MemGPT CLI prompt:
- /exit: Exit the CLI
- /attach: Attach a loaded data source to the agent
- /save: Save a checkpoint of the current agent/conversation state
- /dump: View the current message log (see the contents of main context)
- /dump <count>: View the last <count> messages (all if <count> is omitted)
- /memory: Print the current contents of agent memory
- /pop: Undo the last message in the conversation
- /pop <count>: Undo the last <count> messages in the conversation. It defaults to 3, which usually is one turn around in the conversation
- /retry: Pops the last answer and tries to get another one
- /rethink <text>: Will replace the inner dialog of the last assistant message with <text> to help shape the conversation
- /rewrite: Will replace the last assistant answer with the given text to correct or force the answer
- /heartbeat: Send a heartbeat system message to the agent
- /memorywarning: Send a memory warning system message to the agent

Once you exit the CLI with /exit, you can resume chatting with the same agent by specifying the agent name in memgpt run --agent <NAME>.
See full documentation at: https://memgpt.readthedocs.io/
To install MemGPT from source, start by cloning the repo:
git clone git@github.com:cpacker/MemGPT.git
Then navigate to the main MemGPT directory, and do:
pip install -e .
Now, you should be able to run memgpt from the command line using the downloaded source code.
If you are having dependency issues using pip install -e ., we recommend you install the package using Poetry (see below). Installing MemGPT from source using Poetry will ensure that you are using exact package versions that have been tested for the production build.
First, install Poetry using the official instructions here.
Then, you can install MemGPT from source with:
git clone git@github.com:cpacker/MemGPT.git
poetry shell
poetry install
For issues and feature requests, please open a GitHub issue or message us on our #support channel on Discord.
Datasets used in our paper can be downloaded at Hugging Face.
When you enter the world of Python, you will hear that many developers love Django ORM, and others love SQLAlchemy. Each of those groups will tell you to your face why you have to choose their loved library, and if we add the async part of the programming, they will brag about the capacity of […]
The post A Battle of Async Titans: Django ORM Async vs. SQLAlchemy Async appeared first on Distillery.
The first "pyutrecht" meetup in Amersfoort in the Netherlands. (Amersfoort is not the city of Utrecht, but it is in the similarly named province of Utrecht).
I gave a talk myself about treating your own laptop setup more like a proper programmer would: have a git repo with a README explaining which programs you installed, an install script or makefile for installing certain tools, "dotfiles" for storing your config in git, etc. I haven't made a summary of my own talk. Here are the other three:
William works at deliverect, the host of the meeting. Webscraping means extracting data from a website and parsing it into a more useful format. Like translating a list of restaurants on a
There's a difference with web crawling: that is following links and trying to download all the pages on a website.
Important: robots.txt. As a crawler or scraper you're supposed to read it as it tells you which user agents are allowed and which areas of the website are off-limits (or not useful).
Another useful file that is often available: /sitemap.xml. A list of URLs in the site that the site thinks are useful for scraping or crawling.
A handy trick: looking at the network tab when browsing the website. Are there any internal APIs that the javascript frontend uses to populate the page? Sometimes they are blocked from easy scraping or they're difficult to access due to creative headers or authentication or cookies or session IDs.
A tip: beautifulsoup, a python library for extracting neat, structured content from an otherwise messy html page.
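A minimal scraping sketch along those lines, using requests plus beautifulsoup4 (pip install requests beautifulsoup4); the URL and CSS selector are made up for illustration:

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/restaurants").text
soup = BeautifulSoup(html, "html.parser")

for item in soup.select("li.restaurant"):       # pull structured data out of messy HTML
    print(item.get_text(strip=True))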
Selenium is an alternative, as it behaves much more like a regular web browser. So you can "click" a "next" button a couple of times in order to get a full list of items. Because Selenium behaves like a real web browser, things like cookies and IDs in query parameters and headers just work. That makes it easier to work around many kinds of basic protection.
A microcontroller is a combination of cpu, memory and some interfaces to external ports. https://micropython.org is a version of python for such low-power devices.
He demoed Python's prompt running on a Raspberry Pi microcontroller connected via micro-USB. And of course the mandatory let's-blink-the-onboard-LED programs. And then some other demos with more LEDs and servos. Nice.
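For reference, the classic blink demo in MicroPython looks something like this (assuming a Raspberry Pi Pico with recent firmware, where the onboard LED is exposed as "LED"):

from machine import Pin
import time

led = Pin("LED", Pin.OUT)  # the Pico's onboard LED

while True:
    led.toggle()
    time.sleep(0.5)        # blink twice per second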
A big advantage of micropython is that it doesn't care what processor you have. With C/C++ you specifically have to compile for the right kind of processor. With micropython you can just run your code anywhere.
You can use micropython in three ways:
He showed a couple of possible target microcontrollers. A note to myself about the ESP8266: limited support, use .mpy. I think I have a few of those at home for should-test-it-at-some-time :-) Some examples: Pi RP2040, ESP32, Teensy 4.1.
A problem: RAM is scarce in such chips and python is hungry... You can do some tricks like on-demand loading. Watch out when using an LCD graphic display, that takes 150kb easily.
You have to watch out for the timing requirements of what you want to do. Steering a servo is fine, but "neopixel" leds for instance needs a higher frequency of signals than micropython is capable of on such a microcontroller. If you use a C library for it, it works (he showed a demo).
Erik works as maintainer on the Graphene and the strawberry-GraphQL projects.
GraphQL is a query language for APIs. It is an alternative to the well-known REST method. With REST you often have to do multiple requests to get all the data you need. And the answers will often give more information than you actually need.
With graphql, you always start with a graphql schema. You can compare it a bit to an openapi document. The graphql schema specifies what you can request ("a Meetup has a name, description, list of talks, etc").
An actual query specifies what you want to get back as response. You can omit fields from the schema that you don't need. If you don't need "description", you leave it out. If you want to dive deeper into certain objects, you specify their fields.
Strawberry is a graphql framework. It has integrations for django, sqlalchemy, pydantic and more. The schemas are defined with classes annotated with @strawberry.type and fields with python type hints. (It looked neat!)
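A minimal sketch of what that looks like with Strawberry (the Meetup and Talk types are made up to match the example above):

from typing import List

import strawberry

@strawberry.type
class Talk:
    title: str

@strawberry.type
class Meetup:
    name: str
    description: str
    talks: List[Talk]

@strawberry.type
class Query:
    @strawberry.field
    def meetup(self) -> Meetup:
        return Meetup(name="pyutrecht", description="First meetup in Amersfoort",
                      talks=[Talk(title="Web scraping")])

schema = strawberry.Schema(query=Query)

# The query omits "description": you only get the fields you ask for.
result = schema.execute_sync("{ meetup { name talks { title } } }")
print(result.data)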
He showed a live demo, including the browser-based query interface bundled with graphql.
Note: Strawberry is the more modern project (type hints and so on) and will later have all the functionality of Graphene. So if Strawberry's functionality is enough for you, you should use that one.
My software development career was highly influenced by developer communities. Participating in tech meet-ups and events, notably DjangoCon Africa, has not only expanded my technical skills but also shaped my approach to both personal and professional growth. This experience has motivated me to seek a position on the Django Software Foundation Board, especially after the talk from Anna Makarudze on Navigating the Open-Source World as a Minority, which highlighted the challenges of organising events that benefit African communities. As an advocate for African and minority communities within the tech ecosystem, I aspire to bring a unique and necessary perspective to the DSF Board. My commitment to volunteering and giving back to the community aligns perfectly with the ethos of the Django community. My experiences have taught me the value of dedicated community organizers who selflessly share resources and knowledge, fostering an environment where developers at all levels can thrive.
Joining the DSF Board would enable me to champion the interests of young and emerging developers globally, particularly from underrepresented regions. I aim to ensure that everyone, regardless of their background, has equitable access to the opportunities that Django, both as a community and a web development framework, can offer.
In my role with a Non-Governmental Organization aiding youth groups along the Kenyan Coast(Swahilipot Hub Foundation), I've garnered experience in community engagement and utilizing technology for social good. This experience has been instrumental in creating Django-based platforms that empower community self-management. My presence on the DSF Board would not only represent these communities but also allow me to serve as a mentor and technical advisor.
I am eager to contribute my insights and leadership to the DSF Board. With your support, I hope to make a meaningful impact, fostering an inclusive and dynamic environment where every developer can achieve their full potential.
A software developer for over 20 years, he fell in love with Django almost at the beginning of his journey, in 2007, at version 0.96. He loves Django and Python so much that he has been bringing developers to the community ever since, and ended up starting his consultancy firm around these technologies.
During DjangoCon Europe 2019 in Copenhagen he decided to take the next step in helping the community, proposing to organize DjangoCon Europe 2020 in Portugal. He got more than he bargained for, ending up co-organising the first virtual-only DjangoCon Europe, repeating in 2021, and finally a hybrid DjangoCon Europe in 2022. His effort, together with the team around him, was rewarded with success: the 2022 edition had record-breaking attendance, with 500+ in person and 200+ online. To keep things going, he is also co-organising DjangoCon Europe 2024 in Vigo, Spain, hoping to bring the Spanish community closer.
David also contributes to the Portuguese Python community, having started the very first PyCon Portugal in 2022. His drive is to bring the Portuguese community forward, with a different city every year to increase the reach of the conference. The first edition was in Porto, leveraging DjangoCon Europe 2022; this year it was in Coimbra, with participants from over 25 countries, and we are already preparing the next edition.
David is enthusiastic, committed and pragmatic. Throughout his personal and professional journey, he has always had a positive impact in every process he puts his mind on, influencing, building and empowering the people around him. He hopes to put his experience to good use in Django Software Foundation.
I was one of the original maintainers of Django, and was the original founder and first President of the DSF. I re-joined the DSF board and have served for the last year. Outside of Django, I'm a security consultant at Latacora, and have previously run engineering and security teams at 18F and Heroku.
When I ran for the board last year, I wrote:
> I'd be coming back to the DSF with a bunch of experience in executive leadership and more experience working with nonprofits. I think I can apply those skills, along with my general knowledge of the Django community, to push things forward. What that means, specifically, isn't entirely clear yet. I'd plan to spend the first months of my board term asking a bunch of questions and listening.
I did that asking-questions-and-listening, and what needs doing at the DSF became clear. I'd most succinctly articulate it as: "new blood".
The Django community is super-vibrant and new people are joining the community all the time, but it's very hard for people to "level up" and move to any sort of leadership position at the DSF or among the core team. We just don't have very many opportunities for people to have an impact, and we don't have good "onramps" to that work.
So, this term, I (with the rest of the board) started building some of these opportunities and onramps! The recently-announced working group and membership changes are the start of this, and if re-elected I'd want to continue working in this direction. It's now easier for people to join the DSF, and easier for them to spin up working groups to do impactful work. But now we need to start defining these groups, funding them, and continuing this growth.
The Django community often serves as a great example for many aspects of the broader Python community. Our community shines when many of us get involved. To make this happen, we need to encourage greater community involvement.
My goals for the next two years, if elected, are to increase the amount of information we share with the community while reducing the time it takes to disseminate that information to the community.
I intend to utilize the existing channels in the Django and the larger Python community. We will also establish new official communication channels for the foundation. These channels will be managed by a Communications Working Group.
The second effort is to extend our reach to a global and diverse audience. We understand that our impact can extend far beyond our current scope by expanding working groups. Therefore, I would work to create and support working groups that currently lack direct representation in the DSF. I would also advocate for decisions that directly impact these areas to be developed and executed by those individual groups with DSF support.
I hope that you will support me in this vision, which aims to increase the visibility and support of the DSF to the farthest reaches of the community.
I really like helping people and also helping this awesome community to grow. I don't have much to say 🙂.. But I really like volunteering work it helps me to make something that I could be proud of and also make some new friends!
I'm Ngazetungue Muheue, a dedicated software developer, community advocate, and a member of the Django Software Foundation (DSF). I'm also the founder of the Python and Django Community in Namibia. Despite facing unique challenges as a member of underprivileged communities and living with a disability, I've played a significant role in expanding Django by establishing and contributing to various Django and Python communities in Africa and Namibia.
Recognizing the importance of open-source communities and user-friendly technology, I've worked closely with students and underprivileged individuals to bridge the tech gap by involving them in Django user groups, teaching Django, and fostering their participation in the global tech community. As a visionary leader, I've cultivated a culture of collaboration, inclusivity, and continuous learning within the African tech ecosystem. My contributions include organizing the inaugural DjangoCon Africa in 2023 and actively participating in organizing and volunteering at DjangoCon Europe in 2023 and 2022, advancing the growth of the Django ecosystem. I've also spoken at various PyCon events worldwide, showcasing my commitment to fostering the global Django and Python community.
As a board member of the Django Software Foundation, my primary goal is to expand Django communities worldwide, connect underprivileged community members with the DSF, and enhance the inclusivity of the Django community. This involves translating Django documentation for non-English speakers, increasing project grants, integrating people with disabilities into the community, and creating internship opportunities for a more diverse and empowered Django community.
Joining the DSF board will enable me to inspire and support nations in engaging young and underprivileged individuals in tech-related activities while safeguarding the interests and mission of our community and the DSF. More links: https://twitter.com/muheuenga https://2023.djangocon.africa/team https://twitter.com/djangonamibia https://na.pycon.org/ https://pynam.org/django/
Ciao, I'm Paolo and I live in Italy.
I've been a contributor to the Django project for years, and a member of the DSF. I attended my first DjangoCon Europe in 2017 and have since presented many Django talks at conferences around the world. I've participated as a coach in DjangoGirls workshops several times, and I organized one in my hometown. I've always been a Python developer, I helped the PyCon Italia organization for a few years and I recently founded the Python Pescara meetup.
As a member of the DSF board of directors, I would like to bring a different point of view to the foundation, as a southern European citizen, inhabitant of the Mediterranean area, non-native English speaker, and a small company employee.
Some initiatives I would like to carry forward are:
- organize active user sprints to focus on specific Django features
- continue the work of renovating the Django project website
- create synergies with the Python community and its web sub-communities
- simplify Django documentation and help its translations
- support creators of Django content (e.g. books, articles, podcasts, videos, ...)
I'm a current DSF board member and acting Treasurer.
I've been a part of the Django community for over 15 years. I'm an open-source contributor, a regular speaker at DjangoCon US, and the co-author of High Performance Django. In 2007, I founded Lincoln Loop, a web agency that leverages Django extensively in its work. Lincoln Loop has financially sponsored the DSF and DjangoCon for many years, and I'm looking for other ways to give back to a community that has given us so much.
At Lincoln Loop, I have to wear many hats and deeply understand the financial ramifications of our decisions as a company. I believe the experience of running a business will be directly applicable to a position on the DSF board, and I look forward to applying that experience if elected.
I'm an active DSF member and I've been contributing to this amazing community via multiple ways:
- Django contributor and Accessibility Team member
- Maintainer of djangoproject.com
- Organizer of Djangonaut Space
- Organizer of Django Paris Meetup
- Organizer of DjangoCon Europe 2023
I have seen many aspects of the community through all those experiences. As a relatively new member, I can bring a fresh perspective to the community and help foster a renewed sense of togetherness. I have a strong connection with Djangonaut Space mentoring program and the community. I'm well positioned to serve as an intermediary, facilitating communication regarding initiatives and ideas between the board and the community.
I would like to increase fundraising by improving communication and making improvements to make each sponsor special by highlighting sponsors not only on the website but also on social networks. Relying on my experiences with various Django projects, I will push forward ideas to further develop our community, specifically helping existing and new contributors.
With the community's support, I will set up a working group for mentorship and push accessibility in the framework. I am passionate about these topics as they show that Django is a framework for everyone by everyone.
I see myself as a representative of Django's diversity and would like to emphasize and expand the richness of it even more. Being part of the board would inspire people to get involved and be part of the community. They could add their stone to the building of this wonderful community.
To me, Django feels like it's in maintenance mode, a decade behind in areas like front-end development and serverless. To stay relevant compared to projects with tens of millions in venture capital, we need a more vibrant, more diverse community. We can build one together by making the right programs happen, like Djangonaut Space and Outreachy.
The DSF also needs to evolve with the times. In the age of ChatGPT, copyright and trademarks are very dated concerns. We need a foundation that can help its community navigate modern societal challenges: social equity issues affecting our users; accessibility issues plaguing the Django web; climate change and Django's carbon footprint.
I can help. Let's grow Django's contributors 10x, and have the Django universe lead by example in community-driven open source.
I've been using Django since 2008. A lot has changed since then, but one constant has been my wish to see Django continuously improve.
I'm active in the community in many ways. I've been a regular code contributor since 2016. I founded the accessibility team, and also started the official Discord server. So I've dedicated quite some time to Django already, but I have room for more, with even more impact.
I would like to help grow the next generation of Django contributors, from more diverse backgrounds. From running DjangoCon sprint tables over the years, and getting involved with Djangonaut Space, it's clear to me that the new contributor experience has substantial room for improvement.
I also want to expand Django's fundraising efforts. It's becoming difficult to add important new features. We need more funding to hire more Fellows, and expand their remit to work on bigger features.
The new working groups are a much needed initiative, and I'd love to help develop all these ideas to their fullest potential.
As a passionate software developer and technical writer deeply rooted in the open-source community, I am honored to be running for the DSF board. My experience in contributing to open-source projects, coupled with my leadership background in the Open Source Community Africa Nairobi, has ignited my desire to enhance the participation and contributions of communities from diverse backgrounds. My involvement in open-source initiatives has made me appreciate the power of collaboration and the impact of collective efforts. I have witnessed firsthand how open-source communities foster innovation and inclusivity, enabling individuals from all over the world to share their knowledge and expertise.
Driven by my belief in the impact of open source, I aspire to elevate the DSF board's decision-making process by incorporating the unique perspectives and insights of communities from diverse backgrounds. My experience working with developer communities has equipped me with the skills and empathy necessary to understand and address the specific needs of these underrepresented groups. As a leader, I prioritize decision-making that aligns with the needs and aspirations of the community. I believe in fostering an environment where everyone feels empowered to participate, contribute, and lead. My commitment to inclusivity extends beyond the color of one's skin; I envision a DSF community that embraces and celebrates the diversity of thought, experience, and background.
My passion for Django and my role as an advocate for the framework extend beyond personal preference. I recognize the immense value of Django to the developer community and am eager to contribute further through the DSF board. I believe that my involvement will allow me to add value to the Django community, supporting its growth and ensuring that it remains a thriving hub for developers worldwide. My journey in the open-source community began with a fascination for the framework. However, over time, I have come to realize that the true beauty of open-source lies in the community that surrounds it. I am committed to giving back to this community, not just as a developer or technical writer, but also as a leader and advocate for diversity and inclusion.
I humbly ask for your vote to join the DSF board and contribute my skills, experience, and passion to the continued growth and success of the Django community. Together, we can create a more inclusive and vibrant open-source ecosystem that empowers individuals from all backgrounds to innovate, collaborate, and make a lasting impact on the world.
Here are some Django-related deals for this year’s Black Friday (24th Nov) and Cyber Monday (27th Nov), including my own.
I’ll keep updating this post as I learn about more deals. If you are also a creator, email me with details of your offer and I’ll add it here.
My three books have a 50% discount, for both individual and team licenses, until the end of Cyber Monday (27th Nov). This deal stacks with the purchasing power parity discount for those in lower-income countries.
Buy now:
Aidas Bendoraitis of djangotricks.com created this paid third-party app for Django. The package takes the pain out of setting up and customizing legally mandated GDPR Cookie Consent screens. Compared to commercial “one size fits all” solutions, it’s much simpler to use this third-party app to host and tweak your project’s cookie consent screen.
Use the discount code BLACKFRIDAY2023 for 20% off, from €150 to €120, until the end of November.
Michael Yin has written a book on using Hotwire with Django. This is the “HTML-over-the-wire” suite of tools used heavily in Ruby on Rails.
Use this link for about 30% off, from $39.99 to $27.80.
Corey Zue’s SaaS Pegasus is a configurable Django project template with many preset niceties, including teams, Stripe subscriptions, a JavaScript pipeline, and multiple CSS themes. It can massively accelerate setting up a SaaS in Django.
The “unlimited lifetime license” is discounted 50%, from $999 to $499.50. This deal is available through 29 November.
Will Vincent is the author of three fantastic Django books:
He’s offering a 50% discount on the three-book bundle, from $97 to $48.50.
Okay, there’s no discount here, but there is a good deal! You can fund the framework that you love to ensure it continues to grow.
If you can spend some money on Django-related products this Black Friday, please consider sponsoring Django itself. You can support it by donating to the charity that runs the framework, the Django Software Foundation.
Your money will go towards:
You can sponsor Django on:
If you’re working with Django professionally, please consider sponsoring a few dollars a month. Better yet, get your organization to sponsor, giving back for all the advantages that Django offers you.
At the time of writing, Django is 58% towards its 2022 funding goal:
Let’s fill up that heart!
The annual Python developers survey is out. Please take a moment to share your Python practices as the results do have a big impact on the organizations and maintainers in our community.
Andrew Godwin, the developer of Takahē, is looking for new maintainers who want to help out in exchange for mentorship.
Last week we had 14 pull requests merged into Django by 9 different contributors - including 2 first time contributors! Congratulations to chenow and Patrick Rauscher for having their first commit merged into Django - welcome on board!
Some interesting things from last week...
RowRange and ValueRange now accept an exclusion argument.

Do you speak Português or हिंदी? We will soon have a translation string freeze for the 5.0 release, so this is a good time to join a translation team! You can see the languages Django supports on Transifex as well as the ones missing translations. Perhaps you can help translate Django and make it more accessible to our global community!
Django Newsletter
Build your knowledge and skills in essential Django and Wagtail CMS practices. A two-part training programme, led by Senior Wagtail Developers that extends way beyond typical tutorials and documentation. Only 10 places available per course. Apply here: https://bit.ly/wagtail-developer-training-course
An introduction to database-generated columns using SQLite and the new GeneratedField added in Django 5.0.
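For readers who haven't tried it yet, here is a minimal sketch of what a generated column can look like with Django 5.0's GeneratedField (the model and field names are hypothetical):

from django.db import models
from django.db.models import F

class OrderLine(models.Model):
    price = models.DecimalField(max_digits=8, decimal_places=2)
    quantity = models.IntegerField()
    # The database itself computes and stores this column
    total = models.GeneratedField(
        expression=F("price") * F("quantity"),
        output_field=models.DecimalField(max_digits=10, decimal_places=2),
        db_persist=True,
    )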
How to attach volumes, run migrations, and get SQLite working for Django applications on Fly.io.
A guided deep dive into Django's source code to understand why your application is failing CSRF validation.
Sometimes, you lose track of your terminal window but still need to stop a Django development server. Here's how!
A pattern to speed up the GitHub Actions workflow of projects using Python, Pip, and pip-tools.
The PyLadiesCon 2023 Online conference schedule is live and spans three days with ~24 hours of back-to-back talks.
Django Boston is back on November 14th at 6pm EST with two talks on Celery and API client generation.
Three talks on November 16th on topics including no downtime migrations in Django.
Whether you attended DjangoCon US in person, virtually, or wished you could, this is a fantastic deep writeup of the event from one of its organizers.
30 minutes of Lightning talks from Django Day in Copenhagen recently.
A look at concurrency in the database through programming language concepts, trying to understand what happens behind the scenes of pessimistic and optimistic locking.
How and why to have a style guide for tests in your Django application.
Sarah is a British developer based in Germany who is a member of Django’s Review and Triage Team. She is also a co-organizer of Djangonaut Space, a new mentorship program to onboard and develop Django contributors.
Sarah also contributes to our Django News "Updates to Django" section.
A discussion of Git Mastery, Python tooling, the future of Django + Htmx / front-end development, and more.
We have two new jobs this week on Django News Jobs and several positions that are still open.
Senior Python Engineer at Loginsoft 🆕
Software Engineer - Ubuntu Systems Management at Canonical 🆕
Front End Web UI Django Developer (NC) at Ansys
Junior Web Developer at The Westervelt Company
Django Girls Communications Officer at Django Girls
Django Girls Awesomeness Ambassador at Django Girls
Django Newsletter
Use any icon (100,000+) from Iconify, for TailwindCSS.
Send SMS from a Django application using any SMS service provider by writing just a single line of code.
This RSS feed is published on https://django-news.com/. You can also subscribe via email.
(One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).
"Everybody" uses stackoverflow. Now lots of people use chatgpt (or chatgpt plus). Stackoverflow traffic has dropped by 50% in the last 1.5 year. So chatgpt can be your coding buddy.
He really likes it for quickly getting something working (MVP). Like writing something that talks to a magento API (a webshop system). It would take him ages to figure it all out. Or he could ask chatgpt.
He also thinks you don't need docstrings anymore: you can just ask chatgpt to explain a snippet of code for you. (Something I myself don't agree with, btw).
(He demoed some chatgpt code generation of a sample website). What he learned:
Some dangers:
pix2tex: Using a ViT to convert images of equations into LaTeX code.
The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code.
To run the model you need Python 3.7+. If you don't have PyTorch installed, follow their instructions here. Install the package pix2tex:

pip install "pix2tex[gui]"
Model checkpoints will be downloaded automatically.
There are three ways to get a prediction from an image.
You can use the command line tool by calling pix2tex. Here you can parse already existing images from the disk and images in your clipboard.
Thanks to @katie-lim, you can use a nice user interface as a quick way to get the model prediction. Just call the GUI with latexocr. From here you can take a screenshot and the predicted LaTeX code is rendered using MathJax and copied to your clipboard.
Under Linux, it is possible to use the GUI with gnome-screenshot (which comes with multiple monitor support) if gnome-screenshot was installed beforehand. For Wayland, grim and slurp will be used when they are both available. Note that gnome-screenshot is not compatible with wlroots-based Wayland compositors. Since gnome-screenshot will be preferred when available, you may have to set the environment variable SCREENSHOT_TOOL to grim in this case (other available values are gnome-screenshot and pil).
If the model is unsure about what's in the image it might output a different prediction every time you click "Retry". With the temperature parameter you can control this behavior (a low temperature will produce the same result).
You can use an API. This has additional dependencies. Install via pip install -U "pix2tex[api]" and run

python -m pix2tex.api.run

to start a Streamlit demo that connects to the API at port 8502. There is also a docker image available for the API: https://hub.docker.com/r/lukasblecher/pix2tex
docker pull lukasblecher/pix2tex:api
docker run --rm -p 8502:8502 lukasblecher/pix2tex:api
To also run the streamlit demo run
docker run --rm -it -p 8501:8501 --entrypoint python lukasblecher/pix2tex:api pix2tex/api/run.py
and navigate to http://localhost:8501/
Use from within Python
from PIL import Image
from pix2tex.cli import LatexOCR

# Load an image of a formula and run the OCR model on it
img = Image.open('path/to/image.png')
model = LatexOCR()
print(model(img))  # prints the predicted LaTeX code
The model works best with images of smaller resolution. That's why I added a preprocessing step where another neural network predicts the optimal resolution of the input image. This model will automatically resize the custom image to best resemble the training data and thus increase performance of images found in the wild. Still it's not perfect and might not be able to handle huge images optimally, so don't zoom in all the way before taking a picture.
Always double check the result carefully. You can try to redo the prediction with another resolution if the answer was wrong.
Want to use the package?
I'm working on putting together documentation right now.
Visit here: https://pix2tex.readthedocs.io/
Install a couple of dependencies: pip install "pix2tex[train]".
python -m pix2tex.dataset.dataset --equations path_to_textfile --images path_to_images --out dataset.pkl
To use your own tokenizer pass it via --tokenizer (see below).
You can find my generated training data on the Google Drive as well (formulae.zip - images, math.txt - labels). Repeat the step for the validation and test data. All use the same label text file.
Set the data (and valdata) entry in the config file to the newly generated .pkl file. Change other hyperparameters if you want to. See pix2tex/model/settings/config.yaml for a template. Then run:

python -m pix2tex.train --config path_to_config_file
If you want to use your own data you might be interested in creating your own tokenizer with
python -m pix2tex.dataset.dataset --equations path_to_textfile --vocab-size 8000 --out tokenizer.json
Don't forget to update the path to the tokenizer in the config file and set num_tokens
to your vocabulary size.
The model consists of a ViT [1] encoder with a ResNet backbone and a Transformer [2] decoder.
BLEU score | normed edit distance | token accuracy
0.88 | 0.10 | 0.60
We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the im2latex-100k [3] dataset. All of it can be found here
In order to render the math in many different fonts we use XeLaTeX, generate a PDF and finally convert it to a PNG. For the last step we need to use some third party tools, plus the Python dependencies specified in setup.py.

Fonts used: Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math
Contributions of any kind are welcome.
Code taken and modified from lucidrains, rwightman, im2markup, arxiv_leaks, pkra: Mathjax, harupy: snipping tool
[1] An Image is Worth 16x16 Words
[2] Attention Is All You Need
[3] Image-to-Markup Generation with Coarse-to-Fine Attention
Robust Speech Recognition via Large-Scale Weak Supervision
[Blog] [Paper] [Model card] [Colab example]
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably OpenAI's tiktoken for their fast tokenizer implementation. You can download and install (or update to) the latest release of Whisper with the following command:
pip install -U openai-whisper
Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:
pip install git+https://github.com/openai/whisper.git
To update the package to the latest version of this repository, please run:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
You may need rust installed as well, in case tiktoken does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting started page to install the Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:
pip install setuptools-rust
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x
base | 74 M | base.en | base | ~1 GB | ~16x
small | 244 M | small.en | small | ~2 GB | ~6x
medium | 769 M | medium.en | medium | ~5 GB | ~2x
large | 1550 M | N/A | large | ~10 GB | 1x
The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of large-v3 and large-v2 models by language, using WERs (word error rates) or CER (character error rates, shown in Italic) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.
The following command will transcribe speech in audio files, using the medium model:
whisper audio.flac audio.mp3 audio.wav --model medium
The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:
whisper japanese.wav --language Japanese
Adding --task translate will translate the speech into English:
whisper japanese.wav --language Japanese --task translate
Run the following to view all available options:
whisper --help
See tokenizer.py for the list of all available languages.
Transcription can also be performed within Python:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
Below is an example usage of whisper.detect_language() and whisper.decode(), which provide lower-level access to the model.
import whisper
model = whisper.load_model("base")
# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
# print the recognized text
print(result.text)
Please use the 🙌 Show and tell category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.
Whisper's code and model weights are released under the MIT License. See LICENSE for further details.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
English ∙ 日本語 ∙ 简体中文 ∙ 繁體中文 | العَرَبِيَّة ∙ বাংলা ∙ Português do Brasil ∙ Deutsch ∙ ελληνικά ∙ עברית ∙ Italiano ∙ 한국어 ∙ فارسی ∙ Polski ∙ русский язык ∙ Español ∙ ภาษาไทย ∙ Türkçe ∙ tiếng Việt ∙ Français | Add Translation
Help translate this guide!
Learn how to design large-scale systems.
Prep for the system design interview.
Learning how to design scalable systems will help you become a better engineer.
System design is a broad topic. There is a vast amount of resources scattered throughout the web on system design principles.
This repo is an organized collection of resources to help you learn how to build systems at scale.
This is a continually updated, open source project.
Contributions are welcome!
In addition to coding interviews, system design is a required component of the technical interview process at many tech companies.
Practice common system design interview questions and compare your results with sample solutions: discussions, code, and diagrams.
Additional topics for interview prep:
The provided Anki flashcard decks use spaced repetition to help you retain key system design concepts.
Great for use while on-the-go.
Looking for resources to help you prep for the Coding Interview?
Check out the sister repo Interactive Coding Challenges, which contains an additional Anki deck:
Learn from the community.
Feel free to submit pull requests to help:
Content that needs some polishing is placed under development.
Review the Contributing Guidelines.
Summaries of various system design topics, including pros and cons. Everything is a trade-off.
Each section contains links to more in-depth resources.
Suggested topics to review based on your interview timeline (short, medium, long).
Q: For interviews, do I need to know everything here?
A: No, you don't need to know everything here to prepare for the interview.
What you are asked in an interview depends on variables such as:
More experienced candidates are generally expected to know more about system design. Architects or team leads might be expected to know more than individual contributors. Top tech companies are likely to have one or more design interview rounds.
Start broad and go deeper in a few areas. It helps to know a little about various key system design topics. Adjust the following guide based on your timeline, experience, what positions you are interviewing for, and which companies you are interviewing with.
| Short | Medium | Long
Read through the System design topics to get a broad understanding of how systems work | 👍 | 👍 | 👍 |
Read through a few articles in the Company engineering blogs for the companies you are interviewing with | 👍 | 👍 | 👍 |
Read through a few Real world architectures | 👍 | 👍 | 👍 |
Review How to approach a system design interview question | 👍 | 👍 | 👍 |
Work through System design interview questions with solutions | Some | Many | Most |
Work through Object-oriented design interview questions with solutions | Some | Many | Most |
Review Additional system design interview questions | Some | Many | Most |
How to tackle a system design interview question.
The system design interview is an open-ended conversation. You are expected to lead it.
You can use the following steps to guide the discussion. To help solidify this process, work through the System design interview questions with solutions section using the following steps.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.
Outline a high level design with all important components.
Dive into details for each core component. For example, if you were asked to design a url shortening service, discuss:
Identify and address bottlenecks, given the constraints. For example, do you need the following to address scalability issues?
Discuss potential solutions and trade-offs. Everything is a trade-off. Address bottlenecks using principles of scalable system design.
You might be asked to do some estimates by hand. Refer to the Appendix for the following resources:
Check out the following links to get a better idea of what to expect:
Common system design interview questions with sample discussions, code, and diagrams.

Solutions linked to content in the solutions/ folder.

Question | Solution
Design Pastebin.com (or Bit.ly) | Solution |
Design the Twitter timeline and search (or Facebook feed and search) | Solution |
Design a web crawler | Solution |
Design Mint.com | Solution |
Design the data structures for a social network | Solution |
Design a key-value store for a search engine | Solution |
Design Amazon's sales ranking by category feature | Solution |
Design a system that scales to millions of users on AWS | Solution |
Add a system design question | Contribute |
Common object-oriented design interview questions with sample discussions, code, and diagrams.
Solutions linked to content in the solutions/ folder.

Note: This section is under development

Question | Solution
Design a hash map | Solution |
Design a least recently used cache | Solution |
Design a call center | Solution |
Design a deck of cards | Solution |
Design a parking lot | Solution |
Design a chat server | Solution |
Design a circular array | Contribute |
Add an object-oriented design question | Contribute |
New to system design?
First, you'll need a basic understanding of common principles, learning about what they are, how they are used, and their pros and cons.
Scalability Lecture at Harvard
Next, we'll look at high-level trade-offs:
Keep in mind that everything is a trade-off.
Then we'll dive into more specific topics such as DNS, CDNs, and load balancers.
A service is scalable if it results in increased performance in a manner proportional to resources added. Generally, increasing performance means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.1
Another way to look at performance vs scalability:
Latency is the time to perform some action or to produce some result.
Throughput is the number of such actions or results per unit of time.
Generally, you should aim for maximal throughput with acceptable latency.
In a distributed computer system, you can only support two of the following guarantees:
Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software tradeoff between consistency and availability.
Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.
Responses return the most readily available version of the data available on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.
AP is a good choice if the business needs to allow for eventual consistency or when the system needs to continue working despite external errors.
With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data. Recall the definition of consistency from the CAP theorem - Every read receives the most recent write or an error.
After a write, reads may or may not see it. A best effort approach is taken.
This approach is seen in systems such as memcached. Weak consistency works well in real time use cases such as VoIP, video chat, and realtime multiplayer games. For example, if you are on a phone call and lose reception for a few seconds, when you regain connection you do not hear what was spoken during connection loss.
After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.
This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.
After a write, reads will see it. Data is replicated synchronously.
This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
There are two complementary patterns to support high availability: fail-over and replication.
With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service.
The length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic.
Active-passive failover can also be referred to as master-slave failover.
In active-active, both servers are managing traffic, spreading the load between them.
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.
Active-active failover can also be referred to as master-master failover.
This topic is further discussed in the Database section:
Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in number of 9s--a service with 99.99% availability is described as having four 9s.
99.9% availability (three 9s):

Downtime per year | 8h 45min 57s |
Downtime per month | 43m 49.7s |
Downtime per week | 10m 4.8s |
Downtime per day | 1m 26.4s |

99.99% availability (four 9s):

Downtime per year | 52min 35.7s |
Downtime per month | 4m 23s |
Downtime per week | 1m 5s |
Downtime per day | 8.6s |
If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.
Overall availability decreases when two components with availability < 100% are in sequence:
Availability (Total) = Availability (Foo) * Availability (Bar)
If both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.
Overall availability increases when two components with availability < 100% are in parallel:
Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))
If both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.
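As a quick sanity check, here is a tiny Python sketch of the two formulas above (the 0.999 figures are just the example numbers):

# Availability math from the formulas above
def in_sequence(foo, bar):
    return foo * bar

def in_parallel(foo, bar):
    return 1 - (1 - foo) * (1 - bar)

print(in_sequence(0.999, 0.999))   # 0.998001 -> ~99.8%
print(in_parallel(0.999, 0.999))   # 0.999999 -> 99.9999%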
Source: DNS security presentation
A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address.
DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the time to live (TTL).
A CNAME record points a name to another name (example.com to www.example.com) or to an A record.

Services such as CloudFlare and Route 53 provide managed DNS services. Some DNS services can route traffic through various methods:
A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from CDN, although some CDNs such as Amazon's CloudFront support dynamic content. The site's DNS resolution will tell clients which server to contact.
Serving content from CDNs can significantly improve performance in two ways:
Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.
Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.
Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.
A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.
Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.
Source: Scalable system design patterns
Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:
Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.
Additional benefits include:
To protect against failures, it's common to set up multiple load balancers, either in active-passive or active-active mode.
Load balancers can route traffic based on various metrics, including:
Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source, destination IP addresses, and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing Network Address Translation (NAT).
Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
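As a rough illustration (not tied to any particular load balancer product), a layer 7 routing decision might look like the sketch below, with hypothetical host names and a naive round robin within each pool:

from itertools import count

VIDEO_POOL = ["video-1.internal", "video-2.internal"]
BILLING_POOL = ["billing-secure-1.internal", "billing-secure-2.internal"]
_counter = count()

def pick_upstream(path: str) -> str:
    # Inspect the application-layer request (here, just the URL path)
    pool = BILLING_POOL if path.startswith("/billing") else VIDEO_POOL
    return pool[next(_counter) % len(pool)]

print(pick_upstream("/billing/invoice/42"))  # routed to a hardened server
print(pick_upstream("/videos/cat.mp4"))      # routed to a video server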
At the cost of flexibility, layer 4 load balancing requires less time and computing resources than Layer 7, although the performance impact can be minimal on modern commodity hardware.
Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost efficient and results in higher availability than scaling up a single server on more expensive hardware, called Vertical Scaling. It is also easier to hire for talent working on commodity hardware than it is for specialized enterprise systems.
A reverse proxy is a web server that centralizes internal services and provides unified interfaces to the public. Requests from clients are forwarded to a server that can fulfill them before the reverse proxy returns the server's response to the client.
Additional benefits include:
Source: Intro to architecting systems for scale
Separating out the web layer from the application layer (also known as platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The single responsibility principle advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.
Workers in the application layer also help enable asynchronism.
Related to this discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. 1
Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.
Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built in key-value store that can be useful for storing config values and other shared data.
Source: Scaling up to your first 10 million users
A relational database like SQL is a collection of data items organized in tables.
ACID is a set of properties of relational database transactions.
There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.
The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
Source: Scalability, availability, stability, patterns
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
Source: Scalability, availability, stability, patterns
Source: Scaling up to your first 10 million users
Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
Source: Scalability, availability, stability, patterns
Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.
Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.
Common ways to shard a table of users are either through the user's last name initial or the user's geographic location.
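As a toy illustration of the first approach, a hypothetical router might map last-name initials to shards like this:

# Hypothetical shard routing by last-name initial
SHARDS = ["users_shard_0", "users_shard_1", "users_shard_2"]

def shard_for(last_name: str) -> str:
    initial = last_name[0].upper()
    if "A" <= initial <= "H":
        return SHARDS[0]
    if "I" <= initial <= "P":
        return SHARDS[1]
    return SHARDS[2]

print(shard_for("Lovelace"))  # users_shard_1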
Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMS such as PostgreSQL and Oracle support materialized views which handle the work of storing redundant information and keeping redundant copies consistent.
Once data becomes distributed with techniques such as federation and sharding, managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.
In most systems, reads can heavily outnumber writes 100:1 or even 1000:1. A read resulting in a complex database join can be very expensive, spending a significant amount of time on disk operations.
SQL tuning is a broad topic and many books have been written as reference.
It's important to benchmark and profile to simulate and uncover bottlenecks.
Benchmarking and profiling might point you to the following optimizations.
- Use CHAR instead of VARCHAR for fixed-length fields. CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving onto the next one.
- Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
- Use INT for larger numbers up to 2^32 or 4 billion.
- Use DECIMAL for currency to avoid floating point representation errors.
- Avoid storing large BLOBS, store the location of where to get the object instead.
- VARCHAR(255) is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS.
- Set the NOT NULL constraint where applicable to improve search performance.
- Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.

NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.
BASE is often used to describe the properties of NoSQL databases. In comparison with the CAP Theorem, BASE chooses availability over consistency.
In addition to choosing between SQL or NoSQL, it is helpful to understand which type of NoSQL database best fits your use case(s). We'll review key-value stores, document stores, wide column stores, and graph databases in the next section.
Abstraction: hash table
A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.
Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.
A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.
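A plain Python dict captures the abstraction, including the lexicographic key-range trick (everything here is illustrative):

store = {}
store["user:42:email"] = "a@example.com"   # O(1) write
store["user:42:name"] = "Alice"            # O(1) write
print(store.get("user:42:name"))           # O(1) read

# Keys kept in lexicographic order allow efficient range retrieval
print([k for k in sorted(store) if k.startswith("user:42:")])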
Abstraction: key-value store with documents stored as values
A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note, many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.
Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.
Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-values and documents.
Document stores provide high flexibility and are often used for working with occasionally changing data.
Source: SQL & NoSQL, a brief history
Abstraction: nested map
ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>
A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.
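The nested-map abstraction above can be sketched with plain Python dicts; the row key, column names, and timestamps below are made up:

# ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>
users = {
    "user:123": {
        "name": ("Alice", 1700000000),
        "email": ("alice@example.com", 1700000005),
    },
}
value, timestamp = users["user:123"]["name"]
print(value, timestamp)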
Google introduced Bigtable as the first wide column store, which influenced the open-source HBase often-used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as BigTable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.
Wide column stores offer high availability and high scalability. They are often used for very large data sets.
Abstraction: graph
In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.
Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely-used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.
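As a minimal sketch, a social-graph query over an adjacency map might look like this (the data is made up):

follows = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": set(),
}
# Who does alice follow that also follows carol?
print({u for u in follows["alice"] if "carol" in follows[u]})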
Source: Transitioning from RDBMS to NoSQL
Reasons for SQL:
Reasons for NoSQL:
Sample data well-suited for NoSQL:
Source: Scalable system design patterns
Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher will first look up whether the request has been made before and try to find the previous result to return, in order to save the actual execution.
Databases often benefit from a uniform distribution of reads and writes across its partitions. Popular items can skew the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes in traffic.
Caches can be located on the client side (OS or browser), server side, or in a distinct cache layer.
CDNs are considered a type of cache.
Reverse proxies and caches such as Varnish can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.
Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.
In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so cache invalidation algorithms such as least recently used (LRU) can help invalidate 'cold' entries and keep 'hot' data in RAM.
Redis has the following additional features:
There are multiple levels you can cache that fall into two general categories: database queries and objects:
Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.
Whenever you query the database, hash the query as a key and store the result to the cache. This approach suffers from expiration issues:
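For example, if a single row changes, it's hard to know which hashed query results to delete. A minimal sketch of the approach itself, assuming db and cache objects like those in the cache-aside example below (the ttl argument on cache.set is also an assumption):

import hashlib
import json

def cached_query(db, cache, sql, params, ttl=60):
    # Hash the query plus its parameters to build the cache key
    key = "q:" + hashlib.sha256((sql + json.dumps(params)).encode()).hexdigest()
    result = cache.get(key)
    if result is None:
        result = db.query(sql, params)
        cache.set(key, result, ttl)   # ttl supported by the assumed cache client
    return result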
See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or a data structure(s):
Suggestions of what to cache:
Since you can only store a limited amount of data in cache, you'll need to determine which cache update strategy works best for your use case.
Source: From cache to in-memory data grid
The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:
def get_user(self, user_id):
    # Check the cache first
    key = "user.{0}".format(user_id)
    user = cache.get(key)
    if user is None:
        # Cache miss: fall back to the database, then populate the cache
        user = db.query("SELECT * FROM users WHERE user_id = {0}", user_id)
        if user is not None:
            cache.set(key, json.dumps(user))
    return user
Memcached is generally used in this manner.
Subsequent reads of data added to cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.
Source: Scalability, availability, stability, patterns
The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:
Application code:
set_user(12345, {"foo":"bar"})
Cache code:
def set_user(user_id, values):
    user = db.query("UPDATE Users WHERE id = {0}", user_id, values)
    cache.set(user_id, user)
Write-through is a slow overall operation due to the write operation, but subsequent reads of just written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.
Source: Scalability, availability, stability, patterns
In write-behind, the application does the following:
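Typically: add or update the entry in the cache, then asynchronously write the entry to the data store, improving write performance. A minimal sketch, with a hypothetical cache and db, and an in-process queue standing in for a real broker:

import json
import queue

write_queue = queue.Queue()

def set_user(cache, user_id, values):
    cache.set("user.{0}".format(user_id), json.dumps(values))  # 1) update the cache now
    write_queue.put((user_id, values))                         # 2) persist later

def writer_worker(db):
    # Runs in a background thread/process, draining the queue into the database
    while True:
        user_id, values = write_queue.get()
        db.query("UPDATE users SET data = {0} WHERE user_id = {1}", json.dumps(values), user_id)
        write_queue.task_done()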
Source: From cache to in-memory data grid
You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.
Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.
Source: Intro to architecting systems for scale
Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.
Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:
The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all of your followers.
Redis is useful as a simple message broker but messages can be lost.
RabbitMQ is popular but requires you to adapt to the 'AMQP' protocol and manage your own nodes.
Amazon SQS is hosted but can have high latency and has the possibility of messages being delivered twice.
Tasks queues receive tasks and their related data, runs them, then delivers their results. They can support scheduling and can be used to run computationally-intensive jobs in the background.
Celery has support for scheduling and primarily has python support.
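A minimal Celery task sketch (the broker URL and task body are placeholders):

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def generate_report(user_id):
    # computationally intensive work runs on a background worker
    ...

# Callers enqueue work without blocking the request:
# generate_report.delay(42)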
If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. Back pressure can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with exponential backoff.
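Client-side, the retry-with-exponential-backoff idea can be sketched like this (the request callable and its status_code attribute are assumptions):

import random
import time

def call_with_backoff(request, max_attempts=5, base_delay=0.5):
    for attempt in range(max_attempts):
        response = request()
        if response.status_code != 503:          # not "server busy"
            return response
        # Exponential backoff with a little jitter
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("service still busy after retries")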
HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.
A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs:
Verb | Description | Idempotent* | Safe | Cacheable
GET | Reads a resource | Yes | Yes | Yes
POST | Creates a resource or triggers a process that handles data | No | No | Yes if response contains freshness info
PUT | Creates or replaces a resource | Yes | No | No
PATCH | Partially updates a resource | No | No | Yes if response contains freshness info
DELETE | Deletes a resource | Yes | No | No
*Can be called many times without different outcomes.
HTTP is an application layer protocol relying on lower-level protocols such as TCP and UDP.
Source: How to make a multiplayer game
TCP is a connection-oriented protocol over an IP network. Connection is established and terminated using a handshake. All packets sent are guaranteed to reach the destination in the original order and without corruption through:
If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements flow control and congestion control. These guarantees cause delays and generally result in less efficient transmission than UDP.
To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and say, a memcached server. Connection pooling can help in addition to switching to UDP where applicable.
TCP is useful for applications that require high reliability but are less time critical. Some examples include web servers, database info, SMTP, FTP, and SSH.
Use TCP over UDP when:
Source: How to make a multiplayer game
UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP supports, UDP is generally more efficient.
UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with DHCP because the client has not yet received an IP address, thus preventing a way for TCP to stream without the IP address.
UDP is less reliable but works well in real time use cases such as VoIP, video chat, streaming, and realtime multiplayer games.
Use UDP over TCP when:
Source: Crack the system design interview
In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include Protobuf, Thrift, and Avro.
RPC is a request-response protocol:
Sample RPC calls:
GET /someoperation?data=anId
POST /anotheroperation
{
"data":"anId";
"anotherdata": "another value"
}
RPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases.
Choose a native library (aka SDK) when:
HTTP APIs following REST tend to be used more often for public APIs.
REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.
There are four qualities of a RESTful interface:
Sample REST calls:
GET /someresources/anId
PUT /someresources/anId
{"anotherdata": "another value"}
REST is focused on exposing data. It minimizes the coupling between client/server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning.
Operation | RPC | REST
Signup | POST /signup | POST /persons
Resign | POST /resign { "personid": "1234" } | DELETE /persons/1234
Read a person | GET /readPerson?personid=1234 | GET /persons/1234
Read a person’s items list | GET /readUsersItemsList?personid=1234 | GET /persons/1234/items
Add an item to a person’s items | POST /addItemToUsersItemsList { "personid": "1234", "itemid": "456" } | POST /persons/1234/items { "itemid": "456" }
Update an item | POST /modifyItem { "itemid": "456", "key": "value" } | PUT /items/456 { "key": "value" }
Delete an item | POST /removeItem { "itemid": "456" } | DELETE /items/456
Source: Do you really know why you prefer REST over RPC
This section could use some updates. Consider contributing!
Security is a broad topic. Unless you have considerable experience, a security background, or are applying for a position that requires knowledge of security, you probably won't need to know more than the basics:
You'll sometimes be asked to do 'back-of-the-envelope' estimates. For example, you might need to determine how long it will take to generate 100 image thumbnails from disk or how much memory a data structure will take. The Powers of two table and Latency numbers every programmer should know are handy references.
Power Exact Value Approx Value Bytes
---------------------------------------------------------------
7 128
8 256
10 1024 1 thousand 1 KB
16 65,536 64 KB
20 1,048,576 1 million 1 MB
30 1,073,741,824 1 billion 1 GB
32 4,294,967,296 4 GB
40 1,099,511,627,776 1 trillion 1 TB
Latency Comparison Numbers
--------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 10,000 ns 10 us
Send 1 KB bytes over 1 Gbps network 10,000 ns 10 us
Read 4 KB randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
Read 1 MB sequentially from memory 250,000 ns 250 us
Round trip within same datacenter 500,000 ns 500 us
Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory
HDD seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps 10,000,000 ns 10,000 us 10 ms 40x memory, 10X SSD
Read 1 MB sequentially from HDD 30,000,000 ns 30,000 us 30 ms 120x memory, 30X SSD
Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
Handy metrics based on numbers above:
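For example, here is a rough back-of-the-envelope calculation in Python for the "100 image thumbnails from disk" question above (the ~30 KB thumbnail size is a made-up assumption; the timings come from the latency table):

# Rough estimate: how long to read 100 thumbnails of ~30 KB each from an HDD?
HDD_SEEK_S = 10e-3                      # 10 ms per seek
HDD_SEQ_READ_S_PER_MB = 30e-3           # 30 ms to read 1 MB sequentially

num_thumbnails = 100
thumbnail_mb = 30 / 1024                # ~30 KB each (assumed)

transfer_s = num_thumbnails * thumbnail_mb * HDD_SEQ_READ_S_PER_MB
seek_s = num_thumbnails * HDD_SEEK_S    # worst case: one seek per file

print(f"transfer: {transfer_s * 1000:.0f} ms, seeks: {seek_s * 1000:.0f} ms")
# Seeks dominate (~1 second total): the disk seeks, not the bytes, are the bottleneck.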
Common system design interview questions, with links to resources on how to solve each.
Question | Reference(s)
Design a file sync service like Dropbox | youtube.com |
Design a search engine like Google | queue.acm.org stackexchange.com ardendertat.com stanford.edu |
Design a scalable web crawler like Google | quora.com |
Design Google docs | code.google.com neil.fraser.name |
Design a key-value store like Redis | slideshare.net |
Design a cache system like Memcached | slideshare.net |
Design a recommendation system like Amazon's | hulu.com ijcai13.org |
Design a tinyurl system like Bitly | n00tc0d3r.blogspot.com |
Design a chat app like WhatsApp | highscalability.com |
Design a picture sharing system like Instagram | highscalability.com highscalability.com |
Design the Facebook news feed function | quora.com quora.com slideshare.net |
Design the Facebook timeline function | facebook.com highscalability.com |
Design the Facebook chat function | erlang-factory.com facebook.com |
Design a graph search function like Facebook's | facebook.com facebook.com facebook.com |
Design a content delivery network like CloudFlare | figshare.com |
Design a trending topic system like Twitter's | michael-noll.com snikolov.wordpress.com |
Design a random ID generation system | blog.twitter.com github.com |
Return the top k requests during a time interval | cs.ucsb.edu wpi.edu |
Design a system that serves data from multiple data centers | highscalability.com |
Design an online multiplayer card game | indieflashblog.com buildnewgames.com |
Design a garbage collection system | stuffwithstuff.com washington.edu |
Design an API rate limiter | https://stripe.com/blog/ |
Design a Stock Exchange (like NASDAQ or Binance) | Jane Street, Golang Implementation, Go Implementation |
Add a system design question | Contribute |
Articles on how real world systems are designed.
Source: Twitter timelines at scale
Don't focus on the nitty-gritty details of the following articles; instead:
Type | System | Reference(s)
Data processing | MapReduce - Distributed data processing from Google | research.google.com |
Data processing | Spark - Distributed data processing from Databricks | slideshare.net |
Data processing | Storm - Distributed data processing from Twitter | slideshare.net |
Data store | Bigtable - Distributed column-oriented database from Google | harvard.edu |
Data store | HBase - Open source implementation of Bigtable | slideshare.net |
Data store | Cassandra - Distributed column-oriented database from Facebook | slideshare.net |
Data store | DynamoDB - Document-oriented database from Amazon | harvard.edu |
Data store | MongoDB - Document-oriented database | slideshare.net |
Data store | Spanner - Globally-distributed database from Google | research.google.com |
Data store | Memcached - Distributed memory caching system | slideshare.net |
Data store | Redis - Distributed memory caching system with persistence and value types | slideshare.net |
File system | Google File System (GFS) - Distributed file system | research.google.com |
File system | Hadoop File System (HDFS) - Open source implementation of GFS | apache.org |
Misc | Chubby - Lock service for loosely-coupled distributed systems from Google | research.google.com |
Misc | Dapper - Distributed systems tracing infrastructure | research.google.com |
Misc | Kafka - Pub/sub message queue from LinkedIn | slideshare.net |
Misc | Zookeeper - Centralized infrastructure and services enabling synchronization | slideshare.net |
Add an architecture | Contribute |
Architectures for companies you are interviewing with.
Questions you encounter might be from the same domain.
Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo:
Interested in adding a section or helping complete one in-progress? Contribute!
Credits and sources are provided throughout this repo.
Special thanks to:
Feel free to contact me to discuss any issues, questions, or comments.
My contact info can be found on my GitHub page.
I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).
Copyright 2017 Donne Martin
Creative Commons Attribution 4.0 International License (CC BY 4.0)
http://creativecommons.org/licenses/by/4.0/
a state-of-the-art open visual language model | multimodal pretrained model
🔥 News: CogVLM bilingual version is available online! Welcome to try it out!
🔥 News: We are currently preparing to open-source a more powerful model with rich chart and document understanding capabilities. It has achieved a score of 81 on DocVQA, so stay tuned for its release!
CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters.
CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flickr30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA and TDIUC, and ranks 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B. CogVLM can also chat with you about images.
CogVLM can accurately describe images in detail with very few hallucinations.
Click for a comparison with LLaVA-1.5 and MiniGPT-4.
The CogVLM model comprises four fundamental components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT), and a visual expert module. See the Paper for more details.
We support two GUIs for model inference: a web demo and a CLI. If you want to use the model in your own Python code, it is easy to modify the CLI scripts for your use case.
First, we need to install the dependencies.
pip install -r requirements.txt
python -m spacy download en_core_web_sm
We also offer a local web demo based on Gradio. First, install Gradio by running pip install gradio. Then download and enter this repository and run web_demo.py. See the next section for detailed usage:
python web_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
python web_demo.py --from_pretrained cogvlm-grounding-generalist --version base --english --bf16
The GUI of the web demo looks like:
We open-source different checkpoints for different downstream tasks:
- cogvlm-chat: the model after SFT for alignment, which supports chat like GPT-4V.
- cogvlm-base-224: the original checkpoint after text-image pretraining.
- cogvlm-base-490: finetuned at 490px resolution from cogvlm-base-224. The finetuning data includes the training sets of VQA datasets.
- cogvlm-grounding-generalist: supports different visual grounding tasks, e.g. REC, grounding captioning, etc.

Run the CLI demo via:
python cli_demo.py --from_pretrained cogvlm-base-224 --version base --english --bf16 --no_prompt
python cli_demo.py --from_pretrained cogvlm-base-490 --version base --english --bf16 --no_prompt
python cli_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
python cli_demo.py --from_pretrained cogvlm-grounding-generalist --version base --english --bf16
The program will automatically download the sat model and interact in the command line. You can generate replies by entering instructions and pressing Enter. Enter clear to clear the conversation history and stop to stop the program.
We also support model-parallel inference, which splits the model across multiple (2/4/8) GPUs. --nproc-per-node=[n] in the following command controls the number of GPUs used.
torchrun --standalone --nnodes=1 --nproc-per-node=2 cli_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
Note:
- You can pass --local_tokenizer /path/to/vicuna-7b-v1.5 to load the tokenizer.
- Downloaded models are saved to the default location ~/.sat_models. Change the default location by setting the environment variable SAT_HOME. For example, if you want to save the model to /path/to/my/models, you can run export SAT_HOME=/path/to/my/models before running the python command.

The program provides the following hyperparameters to control the generation process:
usage: cli_demo.py [-h] [--max_length MAX_LENGTH] [--top_p TOP_P] [--top_k TOP_K] [--temperature TEMPERATURE] [--english]
optional arguments:
-h, --help show this help message and exit
--max_length MAX_LENGTH
max length of the total sequence
--top_p TOP_P top p for nucleus sampling
--top_k TOP_K top k for top k sampling
--temperature TEMPERATURE
temperature for sampling
--english only output English
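As a rough illustration of what these sampling knobs do (a generic sketch, not CogVLM's actual decoding code): temperature rescales the logits, top-k keeps only the k most likely tokens, and top-p keeps the smallest set of tokens whose cumulative probability reaches p.

import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Toy temperature / top-k / top-p (nucleus) sampling over a logits vector."""
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]              # token ids, most likely first
    sorted_probs = probs[order]

    keep = np.ones_like(sorted_probs, dtype=bool)
    if top_k > 0:
        keep[top_k:] = False                     # drop everything outside the top k
    if top_p < 1.0:
        cumulative = np.cumsum(sorted_probs)
        # keep tokens until the cumulative probability reaches top_p
        keep &= (cumulative - sorted_probs) < top_p

    kept_probs = np.where(keep, sorted_probs, 0.0)
    kept_probs /= kept_probs.sum()
    return int(np.random.choice(order, p=kept_probs))

print(sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.8, top_k=3, top_p=0.9))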
You may want to use CogVLM for your own task, which may need a different output style or domain knowledge. Here we provide a finetuning example for Captcha Recognition.
Start by downloading the Captcha Images dataset. Once downloaded, extract the contents of the ZIP file.
To create a train/validation/test split in the ratio of 80/5/15, execute the following:
python scripts/split_dataset.py
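For reference, an 80/5/15 split like this can be sketched as follows (an illustrative stand-in, not the repository's actual scripts/split_dataset.py; the "archive/samples" image directory is an assumed path):

import random
import shutil
from pathlib import Path

# Shuffle the extracted captcha images and copy them into train/valid/test folders.
random.seed(0)
images = sorted(Path("archive/samples").glob("*.png"))  # assumed extraction path
random.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.80 * n)],
    "valid": images[int(0.80 * n): int(0.85 * n)],
    "test": images[int(0.85 * n):],
}

for split, files in splits.items():
    out_dir = Path("archive_split") / split
    out_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy(f, out_dir / f.name)
    print(split, len(files))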
Start the fine-tuning process with this command:
bash scripts/finetune_(224/490)_lora.sh
Merge the model to model_parallel_size=1 (replace the 4 below with your training MP_SIZE):
torchrun --standalone --nnodes=1 --nproc-per-node=4 merge_model.py --version base --bf16 --from_pretrained ./checkpoints/merged_lora_(224/490)
Evaluate the performance of your model.
bash scripts/evaluate_(224/490).sh
It is recommended to use the 490px version. However, if you have limited GPU resources (such as only one node with 8x RTX 3090), you can try the 224px version with model parallelism.
The anticipated result of this script is around 95% accuracy on the test set.
It is worth noting that the fine-tuning examples only tune limited parameters. (Expert only) If you want to get >98% accuracy, you need to increase the trainable parameters in finetune_demo.py.
The code in this repository is open source under the Apache-2.0 license, while the use of the CogVLM model weights must comply with the Model License.
If you find our work helpful, please consider citing the following papers
@article{wang2023cogvlm,
title={CogVLM: Visual Expert for Pretrained Language Models},
author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang},
year={2023},
eprint={2311.03079},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
In the instruction fine-tuning phase of CogVLM, we used some English image-text data from the MiniGPT-4, LLaVA, LRV-Instruction, LLaVAR and Shikra projects, as well as datasets from many classic cross-modal works. We sincerely thank them for their contributions.
Teaching LLMs memory management for unbounded context 📚🦙
Join Discord and message the MemGPT bot (in the #memgpt channel). Then run the following commands (messaged to "MemGPT Bot"):
- /profile (to create your profile)
- /key (to enter your OpenAI key)
- /create (to create a MemGPT chatbot)

Make sure your privacy settings on this server are open so that MemGPT Bot can DM you:
MemGPT → Privacy Settings → Direct Messages set to ON

You can see the full list of available commands when you enter / into the message box.
Memory-GPT (or MemGPT in short) is a system that intelligently manages different memory tiers in LLMs in order to effectively provide extended context within the LLM's limited context window. For example, MemGPT knows when to push critical information to a vector database and when to retrieve it later in the chat, enabling perpetual conversations. Learn more about MemGPT in our paper.
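The idea can be illustrated with a toy two-tier memory (a simplified sketch of the concept, not MemGPT's actual implementation; it uses naive keyword matching where a real system would use a vector database and embeddings):

class ToyTieredMemory:
    """Toy tiered memory: a small in-context buffer plus an unbounded archive
    that is searched on demand."""

    def __init__(self, context_limit=5):
        self.context = []            # what would fit in the LLM's context window
        self.archive = []            # "external" storage with no size limit
        self.context_limit = context_limit

    def add(self, message):
        self.context.append(message)
        if len(self.context) > self.context_limit:
            # Context is full: evict the oldest message into the archive.
            self.archive.append(self.context.pop(0))

    def recall(self, query):
        # Pull archived messages that mention the query back into view.
        return [m for m in self.archive if query.lower() in m.lower()]

memory = ToyTieredMemory(context_limit=2)
for msg in ["My dog is named Rex", "I live in Lisbon", "I work on compilers"]:
    memory.add(msg)
print(memory.context)          # only the most recent messages stay in context
print(memory.recall("dog"))    # older facts can still be retrieved on demand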
Install MemGPT:
pip install pymemgpt
Add your OpenAI API key to your environment:
export OPENAI_API_KEY=YOUR_API_KEY # on Linux/Mac
set OPENAI_API_KEY=YOUR_API_KEY # on Windows
$Env:OPENAI_API_KEY = "YOUR_API_KEY" # on Windows (PowerShell)
Configure the default settings for MemGPT by running:
memgpt configure
Now, you can run MemGPT with:
memgpt run
You can run the following commands in the MemGPT CLI prompt:
- /exit: Exit the CLI
- /attach: Attach a loaded data source to the agent
- /save: Save a checkpoint of the current agent/conversation state
- /dump: View the current message log (see the contents of main context)
- /dump <count>: View the last <count> messages (all if <count> is omitted)
- /memory: Print the current contents of agent memory
- /pop: Undo the last message in the conversation
- /pop <count>: Undo the last <count> messages in the conversation (defaults to 3, which is usually one turn of the conversation)
- /retry: Pop the last answer and try to get another one
- /rethink <text>: Replace the inner dialog of the last assistant message with <text> to help shape the conversation
- /rewrite <text>: Replace the last assistant answer with the given text to correct or force the answer
- /heartbeat: Send a heartbeat system message to the agent
- /memorywarning: Send a memory warning system message to the agent

Once you exit the CLI with /exit, you can resume chatting with the same agent by specifying the agent name in memgpt run --agent <NAME>.
See full documentation at: https://memgpt.readthedocs.io/
To install MemGPT from source, start by cloning the repo:
git clone git@github.com:cpacker/MemGPT.git
Then navigate to the main MemGPT
directory, and do:
pip install -e .
Now, you should be able to run memgpt
from the command-line using the downloaded source code.
If you are having dependency issues using pip install -e .
, we recommend you install the package using Poetry (see below). Installing MemGPT from source using Poetry will ensure that you are using exact package versions that have been tested for the production build.
First, install Poetry using the official instructions here.
Then, you can install MemGPT from source with:
git clone git@github.com:cpacker/MemGPT.git
poetry shell
poetry install
For issues and feature requests, please open a GitHub issue or message us on our #support channel on Discord.
Datasets used in our paper can be downloaded at Hugging Face.
🚀🧠💬 Supercharged Custom Instructions for ChatGPT (non-coding) and ChatGPT Advanced Data Analysis (coding).
by Dustin Miller • Reddit • Substack
License: Attribution-NonCommercial-ShareAlike 4.0 International
Elevating Conversational AI to Expert Level
I've created a set of "Custom GPTs" with updated versions of these prompts:
Want to support these free prompts? My Substack offers paid subscriptions; that's the best way to show your appreciation.
ChatGPT AutoExpert is a shockingly effective set of custom instructions aimed at enhancing the capabilities of GPT-4 and GPT-3.5-Turbo conversational models. These instructions maximize the depth and nuance in responses while minimizing general disclaimers and hand-holding. The ultimate objective is to provide users with accurate, context-rich information and an improved learning experience.
To get started with ChatGPT AutoExpert, choose which set of custom instructions you want to use:
[!IMPORTANT] This requires a ChatGPT professional subscription, as it needs both GPT-4 and Advanced Data Analysis!
- The /memory slash command will download all your files, and a history of everything that's been done during your session. Simply upload it (along with the companion script) in a new session, and pick up where you left off.
- x86_64 architecture (as of this writing).
- With /slash commands, AutoExpert (Developer Edition) will save all your code snippets, dehydrate its memory of your requirements and the work it's done, and even back up the code cells themselves. Then it zips it all up, and you can quickly download your coding conversation history.

ChatGPT AutoExpert (both standard and "Developer Edition") by Dustin Miller is licensed under Attribution-NonCommercial-ShareAlike 4.0 International
Article URL: https://github.com/KillianLucas/open-interpreter
Comments URL: https://news.ycombinator.com/item?id=38242343
Points: 100
# Comments: 49
SQLModel, SQL databases in Python, designed for simplicity, compatibility, and robustness.
Documentation: https://sqlmodel.tiangolo.com
Source Code: https://github.com/tiangolo/sqlmodel
SQLModel is a library for interacting with SQL databases from Python code, with Python objects. It is designed to be intuitive, easy to use, highly compatible, and robust.
SQLModel is based on Python type annotations, and powered by Pydantic and SQLAlchemy.
The key features are:
SQLModel is designed to simplify interacting with SQL databases in FastAPI applications; it was created by the same author. 😁
It combines SQLAlchemy and Pydantic and tries to simplify the code you write as much as possible, allowing you to reduce the code duplication to a minimum, but while getting the best developer experience possible.
SQLModel is, in fact, a thin layer on top of Pydantic and SQLAlchemy, carefully designed to be compatible with both.
A recent and currently supported version of Python.
As SQLModel is based on Pydantic and SQLAlchemy, it requires them. They will be automatically installed when you install SQLModel.
$ pip install sqlmodel
---> 100%
Successfully installed sqlmodel
For an introduction to databases, SQL, and everything else, see the SQLModel documentation.
Here's a quick example. ✨
Imagine you have a SQL table called hero with:
id
name
secret_name
age
And you want it to have this data:
id | name | secret_name | age
1 | Deadpond | Dive Wilson | null
2 | Spider-Boy | Pedro Parqueador | null
3 | Rusty-Man | Tommy Sharp | 48
Then you could create a SQLModel model like this:
from typing import Optional
from sqlmodel import Field, SQLModel
class Hero(SQLModel, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
name: str
secret_name: str
age: Optional[int] = None
That class Hero is a SQLModel model, the equivalent of a SQL table in Python code.
And each of those class attributes is equivalent to each table column.
Then you could create each row of the table as an instance of the model:
hero_1 = Hero(name="Deadpond", secret_name="Dive Wilson")
hero_2 = Hero(name="Spider-Boy", secret_name="Pedro Parqueador")
hero_3 = Hero(name="Rusty-Man", secret_name="Tommy Sharp", age=48)
This way, you can use conventional Python code with classes and instances that represent tables and rows, and that way communicate with the SQL database.
Everything is designed for you to get the best developer experience possible, with the best editor support.
Including autocompletion:
And inline errors:
You can learn a lot more about SQLModel by quickly following the tutorial, but if you need a taste right now of how to put all that together and save to the database, you can do this:
from typing import Optional
from sqlmodel import Field, Session, SQLModel, create_engine
class Hero(SQLModel, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
name: str
secret_name: str
age: Optional[int] = None
hero_1 = Hero(name="Deadpond", secret_name="Dive Wilson")
hero_2 = Hero(name="Spider-Boy", secret_name="Pedro Parqueador")
hero_3 = Hero(name="Rusty-Man", secret_name="Tommy Sharp", age=48)
engine = create_engine("sqlite:///database.db")
SQLModel.metadata.create_all(engine)
with Session(engine) as session:
session.add(hero_1)
session.add(hero_2)
session.add(hero_3)
session.commit()
That will save a SQLite database with the 3 heroes.
Then you could write queries to select from that same database, for example with:
from typing import Optional
from sqlmodel import Field, Session, SQLModel, create_engine, select
class Hero(SQLModel, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
name: str
secret_name: str
age: Optional[int] = None
engine = create_engine("sqlite:///database.db")
with Session(engine) as session:
statement = select(Hero).where(Hero.name == "Spider-Boy")
hero = session.exec(statement).first()
print(hero)
SQLModel was carefully designed to give you the best developer experience and editor support, even after selecting data from the database:
That class Hero is a SQLModel model.
But at the same time, ✨ it is a SQLAlchemy model ✨. So, you can combine it and use it with other SQLAlchemy models, or you could easily migrate applications with SQLAlchemy to SQLModel.
And at the same time, ✨ it is also a Pydantic model ✨. You can use inheritance with it to define all your data models while avoiding code duplication. That makes it very easy to use with FastAPI.
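For instance, because Hero is also a Pydantic model, it can be returned directly from a FastAPI endpoint (a minimal sketch assuming FastAPI is installed; the route path is illustrative):

from typing import List, Optional

from fastapi import FastAPI
from sqlmodel import Field, Session, SQLModel, create_engine, select

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str
    secret_name: str
    age: Optional[int] = None

engine = create_engine("sqlite:///database.db")
app = FastAPI()

@app.get("/heroes", response_model=List[Hero])
def read_heroes():
    # The same class serves as the table definition, the query target,
    # and the response schema that FastAPI validates and serializes.
    with Session(engine) as session:
        return session.exec(select(Hero)).all()

You could then serve this with an ASGI server such as Uvicorn, pointing it at whatever module you saved the snippet in.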
This project is licensed under the terms of the MIT license.
Odoo. Open Source Apps To Grow Your Business.
Odoo is a suite of web based open source business apps.
The main Odoo Apps include an Open Source CRM, Website Builder, eCommerce, Warehouse Management, Project Management, Billing & Accounting, Point of Sale, Human Resources, Marketing, Manufacturing, ...
Odoo Apps can be used as stand-alone applications, but they also integrate seamlessly so you get a full-featured Open Source ERP when you install several Apps.
For a standard installation please follow the Setup instructions from the documentation.
To learn the software, we recommend the Odoo eLearning, or Scale-up, the business game. Developers can start with the developer tutorials.
PyTorch Tutorial for Deep Learning Researchers
This repository provides tutorial code for deep learning researchers to learn PyTorch. In the tutorial, most of the models were implemented with less than 30 lines of code. Before starting this tutorial, it is recommended to finish the Official PyTorch Tutorial.
$ git clone https://github.com/yunjey/pytorch-tutorial.git
$ cd pytorch-tutorial/tutorials/PATH_TO_PROJECT
$ python main.py
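To give a taste of the style (a self-contained sketch in the spirit of the tutorials, not code copied from the repository), here is a linear regression model trained in well under 30 lines:

import torch
import torch.nn as nn

# Toy data: y = 2x + 1 with a little noise.
x = torch.linspace(0, 1, 100).unsqueeze(1)
y = 2 * x + 1 + 0.05 * torch.randn_like(x)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())  # should be close to 2 and 1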