New Show HN: A Dalle-3 and GPT4-Vision feedback loop
1 day, 18 hours ago

I used to enjoy Translation Party, and over the weekend I realized that we can build the same feedback loop with DALLE-3 and GPT4-Vision. Start with a text prompt, let DALLE-3 generate an image, then GPT-4 Vision turns that image back into a text prompt, DALLE-3 creates another image, and so on.

You need to bring your own OpenAI API key (costs about $0.10/run)

Some prompts are very stable, others go wild. If you bias GPT4's prompting by telling it to "make it weird", you can get crazy results.
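
For anyone curious what the loop looks like in code, here is a rough sketch using the openai Python SDK (v1+); the model names, prompts, and loop length are illustrative, not the app's exact implementation:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = "a gnome village at dusk"  # any starting prompt works

for step in range(4):
    # DALLE-3 turns the current prompt into an image
    image = client.images.generate(model="dall-e-3", prompt=prompt, n=1, size="1024x1024")
    url = image.data[0].url
    # GPT-4 Vision turns the image back into a prompt (optionally biased, e.g. "make it weird")
    vision = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image as a DALL-E prompt."},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }],
    )
    prompt = vision.choices[0].message.content
    print(step, url, prompt)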

Here are a few of my favorites:

- Gnomes: https://dalle.party/?party=k4eeMQ6I

- Start with a sailboat but bias GPT4V to "replace everything with cats": https://dalle.party/?party=0uKfJjQn

- A more stable one (but everyone is always an actor): https://dalle.party/?party=oxpeZKh5


Comments URL: https://news.ycombinator.com/item?id=38432486

Points: 129

# Comments: 40

New Show HN: Codebasechat.com – Make a GPT chatbot for any GitHub Repo in 30 seconds
1 day, 18 hours ago

Hi HN - I’m excited to share a fun side project we built recently

CodebaseChat.com is a tool for building a GPT chatbot for any GitHub repo in 30 seconds

It can be helpful when onboarding to new codebases, understanding system design, or getting less technical explanations of functionality

We’ve been heads down building Context.ai, the analytics platform for LLM products. When OpenAI released GPTs earlier this month, we built one to answer questions about our growing codebase. It worked so well that we decided to open source the utility for other dev teams

How it works:

- Submit a GitHub repo URL at CodebaseChat.com

- We’ll give you a repo.md file

- Upload that to OpenAI’s GPT builder

- Voilà! You can ask your GPT about your codebase
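
For a rough sense of what a repo.md could contain, here is a hedged sketch of the general idea (not CodebaseChat's actual pipeline): flatten a repo's source files into one markdown document that a GPT can use as its knowledge file.

from pathlib import Path

EXTS = {".py", ".js", ".ts", ".go", ".rs", ".md"}  # illustrative allow-list

def repo_to_markdown(repo_root: str, out_file: str = "repo.md") -> None:
    chunks = []
    for path in sorted(Path(repo_root).rglob("*")):
        if path.is_file() and path.suffix in EXTS:
            body = path.read_text(errors="ignore")
            chunks.append(f"## {path.relative_to(repo_root)}\n\n{body}\n")
    Path(out_file).write_text("\n".join(chunks))

repo_to_markdown(".")  # then upload repo.md to the GPT builder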

Don’t have ChatGPT Plus? No worries, you can use OpenAI’s Assistants Playground completely for free.

Have an idea to improve the project? Submit a PR at https://github.com/contextco/codebasechat


Comments URL: https://news.ycombinator.com/item?id=38432195

Points: 10

# Comments: 1

Show HN: I built a guided Build your own DNS Server challenge
2 days, 1 hour ago

Hey everyone. It's Sherub here, author of the Build your own DNS Server challenge on CodeCrafters. Currently it’s available in Rust, Go, and Python and is free while in beta.

https://codecrafters.io/dns-server

I've kept the challenge accessible but still challenging for an intermediate developer. This challenge, like others from CodeCrafters, is self-paced. You can use any tools you prefer (terminal, editor, etc.) to build the project.

At the end of the challenge, you will have created a DNS forwarding server. The server can create and read DNS packets and respond to DNS queries. As you go, you'll learn about the DNS protocol, its format, servers, and A records. All while getting to hone your language skills.
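
As a flavor of what the early stages involve, here is a minimal sketch (not the challenge's reference solution) of packing the 12-byte DNS header that stage 2 deals with: an ID, a 16-bit flags field, and four record counts, all big-endian.

import struct

def build_header(packet_id: int, qr: int = 1, opcode: int = 0, rcode: int = 0,
                 qdcount: int = 1, ancount: int = 0, nscount: int = 0, arcount: int = 0) -> bytes:
    # Flags layout: QR(1) OPCODE(4) AA TC RD RA Z(3) RCODE(4); the other bits are left at 0 for brevity
    flags = (qr << 15) | (opcode << 11) | rcode
    return struct.pack("!HHHHHH", packet_id, flags, qdcount, ancount, nscount, arcount)

header = build_header(packet_id=1234)
assert len(header) == 12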

Some of the challenges and choices I had to make:

* To make the stages easier, I had to break them up, such that each step builds on the previous one. This was especially challenging for the 2nd stage, where we write a DNS packet's header contents. Even though I’d have liked it to be easier, breaking it up further would have been weird.

* Instead of implementing a recursive resolver, I've restricted the scope to a forwarding server. We made this decision so that most developers can still use it. More complexity can be added later via a challenge extension (noted below).

* Deciding how much instruction and context the stages should provide. I’ve kept most stages as thorough as possible; developers can read the full details or just skim them.

I would love your feedback and questions on the challenge. You can try it out for free here: https://codecrafters.io/dns-server (no CC required).

I also have challenge extensions planned. You can find them at https://app.codecrafters.io/vote/challenge-extension-ideas?c.... I'm also keen to hear what you think about the extension ideas.


Comments URL: https://news.ycombinator.com/item?id=38428957

Points: 19

# Comments: 4

Show HN: I built a domain name marketplace for folks (like me) who hoard domains
2 days, 10 hours ago

Problem: I (accidentally) hoard domains :')

1. I get excited about a new project

2. I buy a domain

3. I get busy, and the domain collects a thick layer of dust

I know I'm not alone in this, either

So, I had the idea of creating a simple, casual marketplace with a nice community feel, where folks like me can list their domains at a fair price and free up these caged domains

It felt like a great project for me to pick up some new skills, so I got to it

All up, it took me about a month, and I built the whole thing live on Twitch

I've always sat on the design, marketing and front-end side of the fence, so this was my first attempt at making a 'full' web app

Here's the stack I used:

- SvelteKit (https://kit.svelte.dev/)

- Supabase (https://supabase.com/)

- Resend (https://resend.com/)

- ShadCN Svelte (https://www.shadcn-svelte.com/)

It was super fun to build, and as a beginner, I learnt so much

I leaned on AI quite heavily to help advance my speed of grokking certain concepts within both SvelteKit & Supabase, and I blogged about the experience and my learnings here: https://aroreretini.dev/projects/dwarf/

Any feedback/criticism very much welcome, I've got a lot to learn :)


Comments URL: https://news.ycombinator.com/item?id=38425677

Points: 11

# Comments: 6

Software you are thankful for
5 days, 19 hours ago

I asked this question at the orange site and wanted to hear from you guys too :-)

Amidst all the software enshittification that we are seeing every day, what software are you thankful for? That makes your life better? I’ll start

  • Linux kernel (duh!)

  • Void Linux (cured my distro hopping!)

  • Kicad (the only software my wife has seen me use and say “that looks expensive!”)

  • Inkscape

  • Sublime Text

  • gcc and clang

  • fish (the shell)

  • KDE

  • Things 3

  • Miniflux

  • iTerm2

  • brew.sh

  • GoLand by Jetbrains

How about you guys?

There are no strings on me
6 days, 7 hours ago
Build Conway's Game of Life With Python
6 days, 17 hours ago
In this step-by-step project, you'll implement Conway's Game of Life in Python. To make the game usable, you'll create a user-friendly command-line interface (CLI) with several options that will allow you to run the game using different life patterns and configurations.
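
The core update rule is compact enough to sketch here (a toy version, not the tutorial's code): a live cell survives with two or three live neighbours, and a dead cell comes alive with exactly three.

from collections import Counter

def step(live_cells: set[tuple[int, int]]) -> set[tuple[int, int]]:
    # Count how many live neighbours each candidate cell has
    neighbours = Counter(
        (x + dx, y + dy)
        for x, y in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {cell for cell, n in neighbours.items() if n == 3 or (n == 2 and cell in live_cells)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
print(step(glider))
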
Python Basics Exercises: Modules and Packages
1 week ago
In this video course, you'll practice separating your code into modules, using the import statement to access another module's namespace, and creating Python packages.
How to Render Markdown in a Django Application
1 week, 1 day ago
In this tutorial, you'll learn how to create and render web content in Django with the simplicity and flexibility of the Markdown text formatting language.
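
One common approach to this, sketched here with the third-party markdown package and a custom template filter (the filter name is illustrative, not necessarily the tutorial's):

import markdown
from django import template
from django.utils.safestring import mark_safe

register = template.Library()

@register.filter
def render_markdown(text):
    # mark_safe is only appropriate for trusted input; sanitize user-submitted Markdown first
    return mark_safe(markdown.markdown(text, extensions=["fenced_code"]))

In a template this would then be used as {{ post.body|render_markdown }} after loading the filter library.
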
The Real Python Podcast – Episode #181: Computational Thinking & Learning Python During an AI Revolution
1 week, 4 days ago
Has the current growth of artificial intelligence (AI) systems made you wonder what the future holds for Python developers? What are the hidden benefits of learning to program in Python and practicing computational thinking? This week on the show, we speak with author Lawrence Gray about his upcoming book "Mastering Python: A Problem Solving Approach."
A Very Subtle Bug
1 week, 6 days ago
What's new in TensorFlow 2.15
1 week, 4 days ago
Posted by the TensorFlow team

TensorFlow 2.15 has been released! Highlights of this release (and 2.14) include a much simpler installation method for NVIDIA CUDA libraries for Linux, oneDNN CPU performance optimizations for Windows x64 and x86, full availability of tf.function types, an upgrade to Clang 17.0.1, and much more! For the full release notes, please check here.

Note: Release updates on the new multi-backend Keras will be published on keras.io starting with Keras 3.0. For more information, please check here.

TensorFlow Core

NVIDIA CUDA libraries for Linux

The tensorflow pip package has a new, optional installation method for Linux that installs necessary NVIDIA CUDA libraries through pip. As long as the NVIDIA driver is already installed on the system, you may now run pip install tensorflow[and-cuda] to install TensorFlow's NVIDIA CUDA library dependencies in the Python environment. Aside from the NVIDIA driver, no other pre-existing NVIDIA CUDA packages are necessary. In TensorFlow 2.15, CUDA has been upgraded to version 12.2.
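
A quick way to confirm the install picked up the CUDA libraries (a minimal check, assuming an NVIDIA driver is present):

import tensorflow as tf

print(tf.__version__)                          # expect 2.15.x
print(tf.config.list_physical_devices("GPU"))  # non-empty list if the CUDA libraries were found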

oneDNN CPU performance optimizations

For Windows x64 & x86 packages, oneDNN optimizations are now enabled by default on X86 CPUs. These optimizations can be enabled or disabled by setting the environment variable TF_ENABLE_ONEDNN_OPTS to 1 (enable) or 0 (disable) before running TensorFlow. To fall back to default settings, simply unset the environment variable.
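
For example, to opt out of the oneDNN optimizations from Python, the variable has to be set before TensorFlow is imported (exporting it in the shell before launching works equally well):

import os

os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"  # "1" enables, "0" disables; unset to restore defaults

import tensorflow as tf  # import only after the variable is set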

tf.function

tf.function types are now fully available.

  • tf.types.experimental.TraceType now allows custom tf.function inputs to declare Tensor decomposition and type casting support. 
  • Introducing tf.types.experimental.FunctionType as the comprehensive representation of the signature of tf.function callables. It can be accessed through the function_type property of tf.function’s and ConcreteFunctions. See the tf.types.experimental.FunctionType documentation for more details. 
  • Introducing tf.types.experimental.AtomicFunction as the fastest way to perform TF computations in Python. This capability can be accessed through the inference_fn property of ConcreteFunctions. (Does not support gradients.) See the tf.types.experimental.AtomicFunction documentation for how to call and use it.
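
A small sketch based only on the properties named above (treat the exact attribute access as indicative rather than authoritative):

import tensorflow as tf

@tf.function
def add_one(x):
    return x + 1

concrete = add_one.get_concrete_function(tf.TensorSpec(shape=[], dtype=tf.float32))
print(concrete.function_type)    # tf.types.experimental.FunctionType describing the signature
fast_fn = concrete.inference_fn  # tf.types.experimental.AtomicFunction (no gradient support)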

Upgrade to Clang 17.0.1 and CUDA 12.2

TensorFlow PIP packages are now being built with Clang 17 and CUDA 12.2 to improve performance for NVIDIA Hopper-based GPUs. Moving forward, Clang 17 will be the default C++ compiler for TensorFlow. We recommend upgrading your compiler to Clang 17 when building TensorFlow from source.

Join us at the third Women in ML Symposium!
1 week, 4 days ago
Posted by Sharbani Roy – Senior Director, Product Management, Google


We're back with the third annual Women in Machine Learning Symposium on December 7, 2023! Join us virtually from 9:30 am to 1:00 pm PT for an immersive and insightful set of deep dives for every level of Machine Learning experience.

The Women in ML Symposium is an inclusive event for anyone passionate about the transformative fields of Machine Learning (ML) and Artificial Intelligence (AI). Dive into the latest advancements in generative AI, explore the intricacies of privacy-preserving AI, dig into the underlying accelerators and ML frameworks that power models, and uncover practical applications of ML across multiple industries.

Our event offers sessions for all expertise levels, from beginners to advanced practitioners. Hear about what’s new in ML and building with Google AI from our keynote speakers, gain insights from seasoned industry leaders across Google Health, Nvidia, Adobe, and more – and discover a wealth of knowledge on topics ranging from foundational AI concepts to open source tools, techniques, and beyond.

RSVP today to secure your spot and explore our exciting agenda. We can't wait to see you there!

Oops! We Automated Bullshit
1 week, 4 days ago
Tiny LLMs
1 week, 4 days ago
Django 5.0 string freeze is in effect, translations needed!
1 week ago

Hello everyone!

Django 5.0 RC1 was released yesterday, establishing the string freeze for the 5.0 release. This means that strings marked for translations will not change between now and the 5.0 final release, scheduled for December 4th.

It would be extremely helpful if you could ensure that the Django translations for the languages you collaborate on are complete on Transifex. I’ll be fetching the available translations on Friday, December 1st, in preparation for the release the following Monday.

For more information about Django translations, refer to this link.

Thank you very much for your help!
Cheers, Natalia.

1 post - 1 participant

Read full topic

Challenges Encountered in Implementing Conditional WHERE Feature in `bulk_create` for Django Ticket #34277
1 week, 1 day ago

Hello everyone,

We are currently working on adding a feature for Django ticket #34277. The goal is to introduce a conditional WHERE clause in the bulk_create method to enable conditional updates when using the bulk_create function with update_conflicts=True.

What we have accomplished so far:

  1. Modifying bulk_create: We added additional parameters to bulk_create, including update_conflicts, update_fields, unique_fields, and where_clause.
  2. Passing Parameters: We have successfully passed these parameters through various functions, including _insert, _batched_insert, and others relevant to the process.
  3. Handling SQL: We started working on converting the WHERE clause (represented by a Q object) into valid SQL, using a SQL compiler obtained via get_compiler().

Encountered Issue:
When attempting to compile the Q object (representing the WHERE clause) into SQL, we are facing an error: 'Q' object has no attribute 'as_sql'. We tried to bypass this issue by using compiler.compile(where_clause), but this did not lead to a viable solution.

Questions and Need for Clarifications:

  1. Converting Q to WhereNode: How can we properly convert a Q object into a WhereNode object so it can be compiled into SQL? Is there a standard approach for this within Django’s framework, or should we consider a custom method?
  2. Best Practices: Are there any best practices or existing examples in Django’s codebase that we could refer to for this task?
  3. Handling Specific Backends: How can we effectively manage differences between SQL backends (like SQLite, PostgreSQL, etc.) when generating SQL for the conditional WHERE clause?
  4. General Advice: Are there any tips or cautions you could give us to successfully complete this task, considering its complexity and impact on a widely used feature like bulk_create?

We would greatly appreciate any feedback, advice, or code examples that could help us progress in this complex task. We aim to make a significant contribution to Django while ensuring the robustness and compatibility of this new feature.

Thank you in advance for your time and expertise.

Best regards,
Barhamou

# myapp/views.py
from django.http import HttpResponse
from .models import Item
from django.utils import timezone
from datetime import timedelta
from django.db.models import F, Q

def test_bulk_create_view(request):
    new_items = [
        Item(id=1, name="Item 1 encore une modification", last_updated=timezone.now() - timedelta(days=1)),
        Item(id=2, name="Item 2 Updated", last_updated=timezone.now() - timedelta(days=1)),
        Item(id=3, name="Item 3 New")
    ]

    Item.objects.bulk_create(
        new_items,
        update_conflicts=True,
        update_fields=['name', 'last_updated'],  # Fields to update
        unique_fields=['id'],  # Unique field that can trigger the upsert
        where_clause=Q(last_updated__lt=F('EXCLUDED__last_updated'))
    )


    #Item.objects.bulk_create(new_items, where_clause='A where clause')
    #Item.objects.bulk_create(new_items, update_conflicts=True)
    return HttpResponse("bulk_create tested successfully")
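
Regarding question 1, one possible direction (a hedged sketch, not a reviewed solution): Django's own CheckConstraint turns a Q object into a WhereNode via Query.build_where(), and the resulting node can then be compiled with a compiler obtained from the same query. Backend differences (question 3) would still need to be handled via the connection passed to as_sql().

from django.db import connection
from django.db.models import Q
from django.db.models.sql import Query

from .models import Item  # the Item model from the example view above

q = Q(name__startswith="Item")         # stand-in for the proposed where_clause
query = Query(model=Item)
where_node = query.build_where(q)      # Q -> WhereNode
compiler = query.get_compiler(connection=connection)
sql, params = where_node.as_sql(compiler, connection)
print(sql, params)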

1 post - 1 participant

Read full topic

Feedback needed: accessibility guidelines for contributors
1 week, 3 days ago

Django’s accessibility team is looking for feedback on proposed guidelines for contributors: Accessibility guidelines for all contributors #17338 (read the guidelines as HTML).

Django’s accessibility currently leaves a lot to be desired. It’s crucial for us to introduce those guidelines, and crucial for the guidelines to be strict enough to allow us to improve, while not being an unfair burden for contributors working on UI changes, which is challenging enough as it is. We need feedback from people with different perspectives to make sure this will work for everyone.

How you can help

Two ways!

  1. Read the guidelines on the preview page or on pull request #17338. Provide your feedback here or in GitHub – both “general sentiment” comments and specific issues / possible improvements are very welcome.
  2. Share this forum thread or my tweet about this with your colleagues, friends, enemies so they provide feedback as well. Have them sign up to the forum to reply here, or provide their feedback on the PR.

We’re looking for feedback from all contributors, in particular new contributors and people who could see themselves contributing in the future. We also want feedback from @steering_council members, who have a formal mandate to oversee the quality of Django.

Why we need those guidelines

Lots of reasons. Formally, the introduction of those guidelines is as per DEP-11. Excerpt from the team’s responsibilities:

  • Deciding on any relevant accessibility guidelines to follow, such as WCAG, and at which conformance level. […]
  • Coordinating […] the improvement of the accessibility in general in Django and associated projects. […]
  • Writing and maintaining documentation relating to accessibility, such as a statement of commitment to accessibility issues, and contribution guidelines.

1 post - 1 participant

Read full topic

Add ability to define custom column names in the table created by the ManyToManyField
1 week, 4 days ago

If your company has strict database design guidelines, it is an extra burden to adhere to them where the ManyToManyField() is concerned because you must define a through model in order to have custom column names in the through table.

You can define a custom table name with the db_table argument. However, there is no way to define what the column names will be in that table unless you go to the trouble of defining a custom through model, which adds extra code and somewhat defeats the purpose of the ManyToManyField(), especially when you don’t need additional fields in your through table.

This also breaks the convenient .set(), .add(), and .create() methods on the ManyToManyField() instance, which shouldn’t be necessary when you haven’t added extra fields to the through table.

In the following example, Django’s default name for the foreign key in the through table would be shoppingcart_id; however, this doesn’t work if your company’s style guide wants the name to be shopping_cart_id.

One solution would be to support custom column names with arguments like from_column_name and to_column_name in the following example.

class Item(models.Model):
    name = models.CharField(max_length=255)

    class Meta:
        db_table = "item"


class ShoppingCart(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    items = models.ManyToManyField(Item, from_column_name="shopping_cart", to_column_name="item")

    class Meta:
        db_table = "shopping_cart"

Another idea would be to use the existing ManyToManyField.through_fields argument for defining custom column names. For example:

items = models.ManyToManyField(Item, through_fields=("shopping_cart", "item"))

Another idea would be to have a project-level setting that defaults to breaking up words with underscores in the through table column names; however, this wouldn’t allow for as much flexibility and wouldn’t work for an existing project.

Instead, what you have to do now is create a through model like the following:

class Item(models.Model):
    name = models.CharField(max_length=255)

    class Meta:
        db_table = "item"


class ShoppingCart(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    items = models.ManyToManyField(Item, through="ShoppingCartItem")

    class Meta:
        db_table = "shopping_cart"


class ShoppingCartItem(models.Model):
    shopping_cart = models.ForeignKey(ShoppingCart, on_delete=models.CASCADE)
    item = models.ForeignKey(Item, on_delete=models.CASCADE)

    class Meta:
        db_table = "shopping_cart_item"

I would be happy to take a stab at creating a PR if others familiar with the ORM would point me in the right direction and if we could get a design decision on what method to go with.

1 post - 1 participant

Read full topic

Handling test client `data` / `query_params`
1 week, 4 days ago

See Fixed #14611 -- Added query_params to TestClient and RequestFactory. by knyghty · Pull Request #17447 · django/django · GitHub

While adding a query_params parameter to test client and request factory methods, I hit the issue that for GET and HEAD requests we can already pass in a data parameter to pass query parameters. The data parameter is also used for form data in other request methods. Everyone (so far) agrees we shouldn’t have both for these methods.

I suggested deprecating the data parameter for GET and HEAD requests. @felixxm instead prefers not adding query_params to them and sticking with data.

I am going to assume that the reason for this is for backwards compatibility, but I’ll let him tell us if there are some other issues.

But the reasons I prefer the deprecation:

  • All methods will have the same API, I think this is less confusing for beginners (and everyone, honestly).
  • It’s also easier to use them programmatically: if you need to make lots of different types of requests with query parameters to a URL, you can iterate over the method names and call them all with the same parameters (see the sketch after this list).
  • I think we can be a bit more lenient on deprecating things in tests, because we shouldn’t be breaking anyone’s site with this change, only their tests.
  • This kind of change could be done automatically with django-upgrade (though I’m not sure exactly how it works so do correct me if I’m wrong).
  • It feels to me like the code would be easier to read and maintain.
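
The sketch referenced above, showing the kind of uniform, programmatic usage a single query_params API would enable (this assumes query_params is accepted by every method, which is exactly what is being discussed):

from django.test import Client

client = Client()
for method in ("get", "head", "post", "put", "delete"):
    # Same call shape for every HTTP method: the query string goes through query_params
    response = getattr(client, method)("/search/", query_params={"q": "django"})
    print(method, response.status_code)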

5 posts - 5 participants

Read full topic

Wish to join Luganda Team
1 week, 4 days ago

Hello,

This is Jamil from Uganda. I would like someone responsible for the Luganda translation to let me in; I have sent my request.
Alternatively, whoever is responsible could walk me through the process of becoming the coordinator, because the language is only at about 3% translated.
Or is there a way for me to contact the coordinator?

2 posts - 2 participants

Read full topic

Ticket 25504 - multiple databases with no default
1 week, 6 days ago

I made a comment on that issue, but there may be more responses here. Basically, the Django documentation seems to say that the “default” database is both required and not required, depending on where you look. It seems like at least one of those places in the docs should be changed.

Any thoughts or suggestions?

2 posts - 2 participants

Read full topic

Article: "Beginners should use Django, not Flask"
1 week, 6 days ago
bitecode.dev

Beginners should use Django, not Flask

Except you, because you are special

1 post - 1 participant

Read full topic

Ticket 24306 - postgresql unlogged tables
2 weeks ago

I posted a comment on ticket 24306 about possibly using unlogged tables for the database cache. Would there be any interest in adding that functionality to the createcachetable command?

5 posts - 3 participants

Read full topic

Persistent connections don't work properly after migrating to Django 4.2
2 weeks ago

Hi,
after migrating from Django 3.2 to 4.2, I’ve noticed that persistent connections don’t work the same in the new version.
I’m using the CONN_MAX_AGE=20 setting, roughly as in the sketch below.
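
# settings.py (a sketch; connection details elided)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.oracle",
        "NAME": "...",              # connection details elided
        "CONN_MAX_AGE": 20,         # keep connections open for up to 20 seconds
        # Django 4.1+ also adds CONN_HEALTH_CHECKS; worth checking whether it changes reuse behaviour here
    }
}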

With Django 3.2 I can see that Oracle doesn’t create new sessions in the v$session table on each request.
But with Django 4.2 a new session is created for each request, even though we haven’t changed anything in the database connection settings.

STACK:

  • python 3.9.16
  • django 4.2.5
  • oracle 19.16.0

Any ideas how to recreate the old behaviour on the new version of Django?

2 posts - 2 participants

Read full topic

atomicals/atomicals-js
1 week, 3 days ago

Atomicals CLI and Javascript Library


Atomicals Javascript Library

atomicals.xyz Documentation: https://docs.atomicals.xyz

Install, Build and Run Tests

Install

Download the github repo and then run:

npm install
npm run build

See all commands at:

npm run cli --help

Quick Start - Command Line (CLI)

First install packages and build, then follow the steps here to create your first Atomical and query the status. Use yarn cli to get a list of all commands available.

0. Environment File (.env)

The environment file comes with defaults (.env.example), but it is highly recommended to install and operate your own ElectrumX server. Web browser communication is possible through the wss (secure websockets) interface of ElectrumX.

ELECTRUMX_WSS=wss://electrumx.atomicals.xyz:50012

// Optional (defaults to wallet.json)
WALLET_PATH=path-to-wallet.json

ELECTRUMX_WSS: URL of the ElectrumX with Atomicals support. Note that only wss endpoints are accessible from web browsers.

1. Wallet Setup

The purpose of the wallet is to create p2tr (pay-to-taproot) spend scripts and to receive change from the transactions made for the various operations. Do not put more funds than you can afford to lose, as this is still beta!

To initialize a new wallet.json file that will store your address for receiving change use the wallet-init command. Alternatively, you may populate the wallet.json manually, ensuring that the address at m/44'/0'/0'/0/0 is equal to the address and the derivePath is set correctly.

Configure the path in the environment .env file to point to your wallet file. Defaults to ./wallet.json.

Default:

WALLET_PATH=.
WALLET_FILE=wallet.json

Update to wallets/ directory:

WALLET_PATH=./wallets
WALLET_FILE=wallet.json

Create the wallet:

yarn cli wallet-init

>>>

Wallet created at wallet.json
phrase: maple maple maple maple maple maple maple maple maple maple maple maple
Legacy address (for change): 1FXL2CJ9nAC...u3e9Evdsa2pKrPhkag
Derive Path: m/44'/0'/0'/0/0
WIF: L5Sa65gNR6QsBjqK.....r6o4YzcqNRnJ1p4a6GPxqQQ
------------------------------------------------------

2. Explore the CLI

yarn cli --help

3. Quick Commands

Get all of the commands available:

npm run cli --help

Read the documentation at https://docs.atomicals.xyz

ElectrumX Server RPC Interface

See updated ElectrumX (https://github.com/atomicals/electrumx-atomicals)

Any questions or ideas?

https://atomicals.xyz

https://x.com/atomicalsxyz (X - Formerly Twitter)

Donate to Atomicals Development

We greatly appreciate any donation to help support Atomicals Protocol development. We work out of passion and kindness for the world; we believe this technology must exist and be free for all to use. Bitcoin is our one hope for freedom and digital sovereignty, and we intend to do our best to make it a reality.

BTC: bc1pljy9g0ugrgumpd5y6v9tv23rvz5y8dhaq980r9qfgyhd4dmgkwmqpdpr5q

OpenBMB/XAgent
1 week, 4 days ago

An Autonomous LLM Agent for Complex Task Solving


XAgent

English · 中文 · 日本語

Tutorial · Demo · Blog · Documentation · Citation

📖 Introduction

XAgent is an open-source experimental Large Language Model (LLM) driven autonomous agent that can automatically solve various tasks. It is designed to be a general-purpose agent that can be applied to a wide range of tasks. XAgent is still in its early stages, and we are working hard to improve it.

🏆 Our goal is to create a super-intelligent agent that can solve any given task!

We welcome diverse forms of collaborations, including full-time and part-time roles and more. If you are interested in the frontiers of agents and want to join us in realizing true autonomous agents, please contact us at xagentteam@gmail.com.


Overview of XAgent.

XAgent

XAgent is designed with the following features:

  • Autonomy: XAgent can automatically solve various tasks without human participation.
  • Safety: XAgent is designed to run safely. All actions are constrained inside a docker container. Run it anyway!
  • Extensibility: XAgent is designed to be extensible. You can easily add new tools to enhance agent's abilities and even new agents!
  • GUI: XAgent provides a friendly GUI for users to interact with the agent. You can also use the command line interface to interact with the agent.
  • Cooperation with Human: XAgent can collaborate with you to tackle tasks. It not only has the capability to follow your guidance in solving complex tasks on the go but it can also seek your assistance when it encounters challenges.

XAgent is composed of three parts:

  • 🤖 Dispatcher is responsible for dynamically instantiating and dispatching tasks to different agents. It allows us to add new agents and improve the agents' abilities.
  • 🧐 Planner is responsible for generating and rectifying plans for tasks. It divides tasks into subtasks and generates milestones for them, allowing agents to solve tasks step by step.
  • 🦾 Actor is responsible for conducting actions to achieve goals and finish subtasks. The actor utilizes various tools to solve subtasks, and it can also collaborate with humans to solve tasks.

Workflow of XAgent.

🧰 ToolServer

ToolServer is the server that provides XAgent with powerful and safe tools to solve tasks. It is a docker container that provides a safe environment for XAgent to run. Currently, ToolServer provides the following tools:

  • 📝 File Editor provides a text editing tool to write, read, and modify files.
  • 📘 Python Notebook provides an interactive Python notebook that can run Python code to validate ideas, draw figures, etc.
  • 🌏 Web Browser provides a web browser to search and visit webpages.
  • 🖥️ Shell provides a bash shell tool that can execute any shell commands, even install programs and host services.
  • 🧩 Rapid API provides a tool to retrieve APIs from Rapid API and call them, which offers a wide range of APIs for XAgent to use. See ToolBench to get more information about the Rapid API collections. You can also easily add new tools to ToolServer to enhance XAgent's abilities.

✨ Quickstart

🛠️ Build and Setup ToolServer

ToolServer is where XAgent's action takes place. It is a docker container that provides a safe environment for XAgent to run. So you should install docker and docker-compose first. After that, you should build the docker image for ToolServer and start the docker container.

docker-compose up --build

This will build the image for the ToolServer and start the ToolServer's container. If you want to run the container in the background, please use docker-compose up -d --build. Refer here for detailed information about our ToolServer.

If the ToolServer is updated, you have to rebuild the images:

docker compose build

🎮 Setup and Run XAgent

After setting up ToolServer, you can start to run XAgent.

  • Install requirements (Require Python >= 3.10)
pip install -r requirements.txt
  • Configure XAgent
  1. You should configure XAgent in assets/config.yml before running it.
  2. At least one OpenAI API key must be provided in assets/config.yml; it is used to access the OpenAI API. We highly recommend using gpt-4-32k to run XAgent; gpt-4 is also OK for most simple tasks. In any case, at least one gpt-3.5-turbo-16k API key should be provided as a backup model. We do not test or recommend using gpt-3.5-turbo to run XAgent due to its minimal context length; you should not try to run XAgent on it.
  3. If you want to change the config_file path for XAgentServer, you should modify the CONFIG_FILE value in .env file and restart the docker container.
  • Run XAgent
python run.py --task "put your task here" --model "gpt-4" --config_file "assets/config.yml"
  1. You can use the argument --upload_files to select the initial files you want to submit to XAgent.

  2. The local workspace for your XAgent is in local_workspace, where you can find all the files generated by XAgent throughout the running process.

  3. After execution, the entire workspace in ToolServerNode will be copied to running_records for your convenience.

  4. Besides, in running_records, you can find all the intermediate steps information, e.g., task statuses, LLM's input-output pairs, used tools, etc.

  5. You can load from a record to reproduce a former run, just by setting record_dir in the config (defaults to Null). The record is a system-level recording tied to the code version of XAgent. All running configs, queries, code execution statuses (including errors), and server behavior will be documented.

  6. We have removed all sensitive information (including API keys) from the record so you can safely share it with others. In the near future, we will introduce more granular sharing options highlighting the contributions of humans during execution.

  • Run XAgent with GUI
## We ran the web ui docker when building the ToolServer network
## run nginx in docker
docker exec XAgent-Server systemctl start nginx

Build the docker image for XAgent-Server and start the docker container. You will see the XAgent Server listening on port 8090. You can visit http://localhost:5173 to interact with XAgent through the web UI. Refer here for detailed information about our GUI Demo.

🎬 Demo

Here, we also show some cases of solving tasks by XAgent: You can check our live demo on XAgent Official Website. We also provide a video demo and showcases of using XAgent here:

Case 1. Data Analysis: Demonstrating the Effectiveness of Dual-Loop Mechanism

We start with a case of aiding users in intricate data analysis. Here, our user submitted an iris.zip file to XAgent, seeking assistance in data analysis. XAgent swiftly broke down the task into four sub-tasks: (1) data inspection and comprehension, (2) verification of the system's Python environment for relevant data analysis libraries, (3) crafting data analysis code for data processing and analysis, and (4) compiling an analytical report based on the Python code's execution results. Here is a figure drawn by XAgent.

Case 2. Recommendation: A New Paradigm of Human-Agent Interaction

Empowered with the unique capability to actively seek human assistance and collaborate in problem-solving, XAgent continues to redefine the boundaries of human-agent cooperation. As depicted in the screenshot below, a user sought XAgent's aid in recommending some great restaurants for a friendly gathering yet failed to provide specific details. Recognizing the insufficiency of the provided information, XAgent employed the AskForHumanHelp tool, prompting human intervention to elicit the user's preferred location, budget constraints, culinary preferences, and dietary restrictions. Armed with this valuable feedback, XAgent seamlessly generated tailored restaurant recommendations, ensuring a personalized and satisfying experience for the user and their friends.

Case 3. Training Model: A Sophisticated Tool User

XAgent not only tackles mundane tasks but also serves as an invaluable aid in complex tasks such as model training. Here, we show a scenario where a user desires to analyze movie reviews and evaluate the public sentiment surrounding particular films. In response, XAgent promptly initiates the process by downloading the IMDB dataset to train a cutting-edge BERT model (see screenshot below), harnessing the power of deep learning. Armed with this trained BERT model, XAgent seamlessly navigates the intricate nuances of movie reviews, offering insightful predictions regarding the public's perception of various films.

📊 Evaluation

We conduct a human preference evaluation to assess XAgent's performance. We prepared over 50 real-world complex tasks for assessment, which can be categorized into 5 classes: Search and Report, Coding and Developing, Data Analysis, Math, and Life Assistant. We compare the results of XAgent with AutoGPT, which shows a total win of XAgent over AutoGPT. All running records can be found here.

We report a significant improvement of XAgent over AutoGPT in terms of human preference.

We also evaluate XAgent on the following benchmarks:

🖌️ Blog

Our blog is available here!

🌟 Our Contributors

A heartfelt thank you to all our contributors. Your efforts make this project grow and thrive. Every contribution, big or small, is invaluable.

🌟 Star History

Citation

If you find our repo useful, please kindly consider citing:

@misc{xagent2023,
      title={XAgent: An Autonomous Agent for Complex Task Solving}, 
      author={XAgent Team},
      year={2023},
}
lobehub/lobe-chat
1 week, 4 days ago

🤖 Lobe Chat - an open-source, vision supported, extensible, high-performance chat client. It supports one-click free deployment of your private ChatGPT/LLM web application.


Lobe Chat

LobeChat is an open-source, extensible (Function Calling) high-performance chatbot framework.
It supports one-click free deployment of your private ChatGPT/LLM web application.

English · 简体中文 · Changelog · Wiki · Report Bug · Request Feature



Share LobeChat Repository

Pioneering the new age of thinking and creating. Built for you, the Super Individual.



👋🏻 Getting Started & Join Our Community

Please be aware that LobeChat is currently under active development, and feedback is welcome for any issues encountered.

Join our Discord community! This is where you can connect with developers and other enthusiastic users of LobeHub.

[!IMPORTANT]

Star us, and you will receive all release notifications from GitHub without any delay ~ ⭐️

Star History

✨ Features

  • 💎 Exquisite UI Design: With a carefully designed interface, it offers an elegant appearance and smooth interaction. It supports light and dark themes and is mobile-friendly. PWA support provides a more native-like experience.
  • 🗣️ Smooth Conversation Experience: Fluid responses ensure a smooth conversation experience. It fully supports Markdown rendering, including code highlighting, LaTeX formulas, Mermaid flowcharts, and more.
  • 🤖 Customizable Agent Roles: Users can create, share, and debug personalized dialogue agent roles according to their needs, providing more flexible and customized dialogue functions.
  • 🧩 Plugin Support & Custom Plugin Development: Conversations are extendable with plugins. Users can install and use various plugins, such as search engines, web extraction, etc. It also supports the development of custom plugins to meet custom needs.
  • 🏬 Agent Market: An Agent Market is provided where users can select their preferred dialogue agent roles, enriching the content and style of the dialogue.
  • 👁️ Visual Recognition: With the integration of visual recognition capabilities, your agent can now analyze and understand images provided during the conversation. This allows for more interactive and context-aware conversations, enabling the dialogue agent to provide relevant and accurate responses based on visual content.
  • (WIP) 📢 Text-to-Speech (TTS) Conversation: LobeChat is adding support for Text-to-Speech technology, allowing users to have voice-based conversations with the dialogue agent. This feature enhances the user experience by providing a more natural and immersive conversation environment. Users can choose from a variety of voices and adjust the speech rate to suit their preferences.

[!NOTE]

You can find our upcoming Roadmap plans in the Projects section.


Besides these features, LobeChat is also built on a much stronger technical foundation:

  • 💨 Quick Deployment: Using the Vercel platform or docker image, you can deploy with just one click and complete the process within 1 minute without any complex configuration.
  • 🌐 Custom Domain: If users have their own domain, they can bind it to the platform for quick access to the dialogue agent from anywhere.
  • 🔒 Privacy Protection: All data is stored locally in the user's browser, ensuring user privacy.

📸 Snapshot

1 Function Calling Plugin System

By establishing a versatile plugin system, ChatGPT becomes capable of delivering real-time news updates and enhancing your ability to interact with documents and e-commerce data more effectively. This extended functionality positions ChatGPT as a valuable resource across diverse domains. If you have an interest in creating plugins, we offer comprehensive component development documentation, software development kits (SDKs), and pre-made templates in the 🧩 Plugin System section below. Join us in our collective efforts to empower ChatGPT, making it both more potent and user-friendly.


2 Prompt Agent Market

In our Agent Market, we have accumulated a large number of practical prompt agents that have been used in daily work and study. You can also share your agents here, and iterate and optimize your prompt agents with more people. You can submit your agents through 🤖/🏪 Submit Agents, and our automated i18n workflow will automatically translate your agents into multiple languages, allowing users worldwide to enjoy your wisdom.

Recent Submits:

  • Expert Agent Mentor (by tcmonster, 2023-11-16): Call on expert agents perfectly suited for the task to support your goals. Tags: task-guidance, execution-plan, communication, support

  • Full-stack Developer (by cloverfield11, 2023-11-15): Full-stack web developer with experience in HTML, CSS, JavaScript, Python, Java, Ruby, and frameworks such as React, Angular, Vue.js, Express, Django, Next.js, Flask, or Ruby on Rails. Experience in databases, application architecture, security, and testing. Tags: web-development, front-end, back-end, programming, databases

  • Graphic Creative Master (by yingxirz, 2023-11-15): Specializes in graphic creative design and graphic creativity. Tags: graphic creative design, graphic-design

  • Expert Agent Mentor (by tcmonster, 2023-11-15): Call on expert agents perfectly suited for the task to support your goals. Tags: task-guidance, execution-plan, communication, support

📊 Total agents: 48


3 Progressive Web App

Utilize the Progressive Web Application (PWA) technology to achieve a seamless LobeChat experience on your computer or mobile device.

[!NOTE]

If you are unfamiliar with the installation process of PWA, you can add LobeChat as your desktop application (also applicable to mobile devices) by following these steps:

  • Launch the Chrome or Edge browser on your computer.
  • Visit the LobeChat webpage.
  • In the upper right corner of the address bar, click on the Install icon.
  • Follow the instructions on the screen to complete the PWA Installation.

4 Theme Mode Selection

LobeChat offers two unique theme modes - Light Mode and Dark Mode, as well as rich color customization options to meet your personalized needs. By default, our themes will intelligently switch based on your system settings, but if you prefer manual control, you can easily switch in the settings.

5 Mobile Device Adaptation

We have carried out a series of optimization designs for mobile devices to enhance the user's mobile experience. Currently, we are iterating on the mobile user experience to achieve smoother and more intuitive interactions. If you have any suggestions or ideas, we welcome you to provide feedback through GitHub Issues or Pull Requests.

🚧 Additional snapshots and demonstrations are being progressively added...

⚡️ Performance

[!NOTE]

The complete list of reports can be found in the 📘 Lighthouse Reports

Desktop: 📑 Lighthouse Report · Mobile: 📑 Lighthouse Report

🛳 Self Hosting

LobeChat provides a self-hosted version deployable with Vercel or a Docker image. This allows you to deploy your own chatbot within a few minutes without any prior knowledge.

A Deploying with Vercel

If you want to deploy this service yourself on Vercel, you can follow these steps:

  • Prepare your OpenAI API Key.
  • Click the button below to start deployment: Deploy with Vercel. Log in directly with your GitHub account, and remember to fill in the OPENAI_API_KEY(required) and ACCESS_CODE (recommended) on the environment variable section.
  • After deployment, you can start using it.
  • Bind a custom domain (optional): The DNS of the domain assigned by Vercel is polluted in some areas; binding a custom domain can connect directly.

Keep Updated

If you have deployed your own project following the one-click deployment steps in the README, you might encounter constant prompts indicating "updates available." This is because Vercel defaults to creating a new project instead of forking this one, resulting in an inability to detect updates accurately.

[!TIP]

We suggest you redeploy using the following steps, 📘 Maintaining Updates with LobeChat Self-Deployment.


B Deploying with Docker

We provide a Docker image for deploying the LobeChat service on your own private device. Use the following command to start the LobeChat service:

$ docker run -d -p 3210:3210 \
  -e OPENAI_API_KEY=sk-xxxx \
  -e ACCESS_CODE=lobe66 \
  lobehub/lobe-chat

[!TIP]

If you need to use the OpenAI service through a proxy, you can configure the proxy address using the OPENAI_PROXY_URL environment variable:

$ docker run -d -p 3210:3210 \
  -e OPENAI_API_KEY=sk-xxxx \
  -e OPENAI_PROXY_URL=https://api-proxy.com/v1 \
  -e ACCESS_CODE=lobe66 \
  lobehub/lobe-chat

[!NOTE]

For detailed instructions on deploying with Docker, please refer to the 📘 Docker Deployment Guide


Environment Variable

This project provides some additional configuration items set with environment variables:

  • OPENAI_API_KEY (required): This is the API key you apply for on the OpenAI account page. Example: sk-xxxxxx...xxxxxx

  • OPENAI_PROXY_URL (optional): If you manually configure the OpenAI interface proxy, you can use this configuration item to override the default OpenAI API request base URL. The default value is https://api.openai.com/v1. Example: https://api.chatanywhere.cn/v1

  • ACCESS_CODE (optional): Add a password to access this service; the password should be a 6-digit number or letters. Example: awCT74 or e3@09!

[!NOTE]

The complete list of environment variables can be found in the 📘 Environment Variables

📦 Ecosystem

  • @lobehub/ui (lobehub/lobe-ui): Lobe UI is an open-source UI component library dedicated to building AIGC web applications.

  • @lobehub/lint (lobehub/lobe-lint): LobeLint provides configurations for ESlint, Stylelint, Commitlint, Prettier, Remark, and Semantic Release for LobeHub.

  • @lobehub/assets (lobehub/assets): Logo assets, favicons, and webfonts for LobeHub.

🧩 Plugins

Plugins provide a means to extend the Function Calling capabilities of LobeChat. They can be used to introduce new function calls and even new ways to render message results. If you are interested in plugin development, please refer to our 📘 Plugin Development Guide in the Wiki.

  • lobe-chat-plugins: This is the plugin index for LobeChat. It accesses index.json from this repository to display a list of available plugins for LobeChat to the user.
  • chat-plugin-template: This is the plugin template for LobeChat plugin development.
  • @lobehub/chat-plugin-sdk: The LobeChat Plugin SDK assists you in creating exceptional chat plugins for Lobe Chat.
  • @lobehub/chat-plugins-gateway: The LobeChat Plugins Gateway is a backend service that provides a gateway for LobeChat plugins. We deploy this service using Vercel. The primary API POST /api/v1/runner is deployed as an Edge Function.

[!NOTE]

The plugin system is currently undergoing major development. You can learn more in the following issues:

  • [x] Plugin Phase 1: Implement separation of the plugin from the main body, split the plugin into an independent repository for maintenance, and realize dynamic loading of the plugin.
  • [x] Plugin Phase 2: The security and stability of the plugin's use, more accurately presenting abnormal states, the maintainability of the plugin architecture, and developer-friendly.
  • [ ] Plugin Phase 3: Higher-level and more comprehensive customization capabilities, support for plugin authentication, and examples.

Official plugins:

  • Clock Time (lobehub/chat-plugin-clock-time, by LobeHub, 2023-11-01): Display a clock to show the current time. Tags: clock, time

  • Website Crawler (lobehub/chat-plugin-web-crawler, by LobeHub, 2023-08-17): Extract content from web links. Tags: web, content-crawler

  • Search Engine (lobehub/chat-plugin-search-engine, by LobeHub, 2023-08-15): Query a search engine to get information. Tags: web, search

  • Realtime Weather (lobehub/chat-plugin-realtime-weather, by LobeHub, 2023-08-12): Get realtime weather information. Tags: weather, realtime

📊 Total plugins: 4

⌨️ Local Development

You can use GitHub Codespaces for online development:

Or clone it for local development:

$ git clone https://github.com/lobehub/lobe-chat.git
$ cd lobe-chat
$ bun install
$ bun dev

🤝 Contributing

Contributions of all types are more than welcome; if you are interested in contributing code, feel free to check out our GitHub Issues and Projects to get stuck in to show us what you’re made of.

🔗 More Products

  • 🤯 Lobe Theme: The modern theme for Stable Diffusion WebUI, exquisite interface design, highly customizable UI, and efficiency-boosting features.
  • 🌏 Lobe i18n : Lobe i18n is an automation tool for the i18n (internationalization) translation process, powered by ChatGPT. It supports features such as automatic splitting of large files, incremental updates, and customization options for the OpenAI model, API proxy, and temperature.
  • 💌 Lobe Commit: Lobe Commit is a CLI tool that leverages Langchain/ChatGPT to generate Gitmoji-based commit messages.


📝 License

Copyright © 2023 LobeHub.
This project is MIT licensed.

lukas-blecher/LaTeX-OCR
1 week, 5 days ago

pix2tex: Using a ViT to convert images of equations into LaTeX code.


pix2tex - LaTeX OCR

The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code.

Using the model

To run the model you need Python 3.7+

If you don't have PyTorch installed, follow their instructions here.

Install the package pix2tex:

pip install "pix2tex[gui]"

Model checkpoints will be downloaded automatically.

There are three ways to get a prediction from an image.

  1. You can use the command line tool by calling pix2tex. Here you can parse already existing images from the disk and images in your clipboard.

  2. Thanks to @katie-lim, you can use a nice user interface as a quick way to get the model prediction. Just call the GUI with latexocr. From here you can take a screenshot and the predicted latex code is rendered using MathJax and copied to your clipboard.

    Under linux, it is possible to use the GUI with gnome-screenshot (which comes with multiple monitor support) if gnome-screenshot was installed beforehand. For Wayland, grim and slurp will be used when they are both available. Note that gnome-screenshot is not compatible with wlroots-based Wayland compositors. Since gnome-screenshot will be preferred when available, you may have to set the environment variable SCREENSHOT_TOOL to grim in this case (other available values are gnome-screenshot and pil).

    If the model is unsure about what's in the image, it might output a different prediction every time you click "Retry". With the temperature parameter you can control this behavior (a low temperature will produce the same result).

  3. You can use an API. This has additional dependencies. Install via pip install -U "pix2tex[api]" and run

    python -m pix2tex.api.run
    

    to start a Streamlit demo that connects to the API at port 8502. There is also a docker image available for the API: https://hub.docker.com/r/lukasblecher/pix2tex

    docker pull lukasblecher/pix2tex:api
    docker run --rm -p 8502:8502 lukasblecher/pix2tex:api
    

    To also run the streamlit demo run

    docker run --rm -it -p 8501:8501 --entrypoint python lukasblecher/pix2tex:api pix2tex/api/run.py
    

    and navigate to http://localhost:8501/

  4. Use from within Python

    from PIL import Image
    from pix2tex.cli import LatexOCR
    
    img = Image.open('path/to/image.png')
    model = LatexOCR()
    print(model(img))
    

The model works best with images of smaller resolution. That's why I added a preprocessing step where another neural network predicts the optimal resolution of the input image. This model will automatically resize the custom image to best resemble the training data and thus increase the performance on images found in the wild. Still, it's not perfect and might not be able to handle huge images optimally, so don't zoom in all the way before taking a picture.

Always double-check the result carefully. You can try to redo the prediction with another resolution if the answer was wrong.

Want to use the package?

I'm working on compiling the documentation right now.

Visit here: https://pix2tex.readthedocs.io/

Training the model

Install a couple of dependencies: pip install "pix2tex[train]".

  1. First we need to combine the images with their ground truth labels. I wrote a dataset class (which needs further improving) that saves the relative paths to the images with the LaTeX code they were rendered with. To generate the dataset pickle file run
python -m pix2tex.dataset.dataset --equations path_to_textfile --images path_to_images --out dataset.pkl

To use your own tokenizer pass it via --tokenizer (See below).

You can find my generated training data on the Google Drive as well (formulae.zip - images, math.txt - labels). Repeat the step for the validation and test data. All use the same label text file.

  2. Edit the data (and valdata) entry in the config file to the newly generated .pkl file. Change other hyperparameters if you want to. See pix2tex/model/settings/config.yaml for a template.
  3. Now for the actual training run
python -m pix2tex.train --config path_to_config_file

If you want to use your own data you might be interested in creating your own tokenizer with

python -m pix2tex.dataset.dataset --equations path_to_textfile --vocab-size 8000 --out tokenizer.json

Don't forget to update the path to the tokenizer in the config file and set num_tokens to your vocabulary size.

Model

The model consists of a ViT [1] encoder with a ResNet backbone and a Transformer [2] decoder.

Performance

BLEU score: 0.88 | normed edit distance: 0.10 | token accuracy: 0.60

Data

We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the im2latex-100k [3] dataset. All of it can be found here

Dataset Requirements

In order to render the math in many different fonts we use XeLaTeX, generate a PDF and finally convert it to a PNG. For the last step we need to use some third party tools:

Fonts

Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math

TODO

  • add more evaluation metrics
  • create a GUI
  • add beam search
  • support handwritten formulae (kinda done, see training colab notebook)
  • reduce model size (distillation)
  • find optimal hyperparameters
  • tweak model structure
  • fix data scraping and scrape more data
  • trace the model (#2)

Contribution

Contributions of any kind are welcome.

Acknowledgment

Code taken and modified from lucidrains, rwightman, im2markup, arxiv_leaks, pkra: Mathjax, harupy: snipping tool

References

[1] An Image is Worth 16x16 Words

[2] Attention Is All You Need

[3] Image-to-Markup Generation with Coarse-to-Fine Attention

udlbook/udlbook
1 week, 5 days ago

Understanding Deep Learning - Simon J.D. Prince


ByteByteGoHq/system-design-101
1 week, 6 days ago

Explain complex systems using visuals and simple terms. Help you prepare for system design interviews.


👨🏻‍💻 YouTube | 📮 Newsletter

System Design 101

Explain complex systems using visuals and simple terms.

Whether you're preparing for a System Design Interview or you simply want to understand how systems work beneath the surface, we hope this repository will help you achieve that.

Communication protocols

Architecture styles define how different components of an application programming interface (API) interact with one another. As a result, they ensure efficiency, reliability, and ease of integration with other systems by providing a standard approach to designing and building APIs. Here are the most used styles:

  • SOAP: 

    Mature, comprehensive, XML-based

    Best for enterprise applications 

  • RESTful: 

    Popular, easy-to-implement, HTTP methods 

    Ideal for web services 

  • GraphQL: 

    Query language, request specific data 

    Reduces network overhead, faster responses 

  • gRPC: 

    Modern, high-performance, Protocol Buffers 

    Suitable for microservices architectures 

  • WebSocket: 

    Real-time, bidirectional, persistent connections 

    Perfect for low-latency data exchange 

  • Webhook: 

    Event-driven, HTTP callbacks, asynchronous 

    Notifies systems when events occur

REST API vs. GraphQL

When it comes to API design, REST and GraphQL each have their own strengths and weaknesses.

The diagram below shows a quick comparison between REST and GraphQL.

REST

  • Uses standard HTTP methods like GET, POST, PUT, DELETE for CRUD operations.
  • Works well when you need simple, uniform interfaces between separate services/applications.
  • Caching strategies are straightforward to implement.
  • The downside is it may require multiple roundtrips to assemble related data from separate endpoints.

GraphQL

  • Provides a single endpoint for clients to query for precisely the data they need.
  • Clients specify the exact fields required in nested queries, and the server returns optimized payloads containing just those fields.
  • Supports Mutations for modifying data and Subscriptions for real-time notifications.
  • Great for aggregating data from multiple sources and works well with rapidly evolving frontend requirements.
  • However, it shifts complexity to the client side and can allow abusive queries if not properly safeguarded.
  • Caching strategies can be more complicated than REST.

The best choice between REST and GraphQL depends on the specific requirements of the application and development team. GraphQL is a good fit for complex or frequently changing frontend needs, while REST suits applications where simple and consistent contracts are preferred.

Neither API approach is a silver bullet. Carefully evaluating requirements and tradeoffs is important to pick the right style. Both REST and GraphQL are valid options for exposing data and powering modern applications.
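As a small, hypothetical illustration of the difference (endpoint URLs and field names are made up; uses the requests library), fetching an order and its items looks like this in each style:

import requests

# REST: multiple roundtrips to separate, resource-shaped endpoints.
order = requests.get("https://api.example.com/orders/42").json()
items = requests.get("https://api.example.com/orders/42/items").json()

# GraphQL: one POST to a single endpoint, asking for exactly the fields needed.
query = """
query {
  order(id: 42) {
    id
    status
    items { sku quantity }
  }
}
"""
result = requests.post("https://api.example.com/graphql", json={"query": query}).json()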

How does gRPC work?

RPC (Remote Procedure Call) is called “remote” because it enables communications between remote services when services are deployed to different servers under microservice architecture. From the user’s point of view, it acts like a local function call.

The diagram below illustrates the overall data flow for gRPC.

Step 1: A REST call is made from the client. The request body is usually in JSON format.

Steps 2 - 4: The order service (gRPC client) receives the REST call, transforms it, and makes an RPC call to the payment service. The client stub encodes the request into a binary format (Protocol Buffers) and passes it to the low-level transport layer.

Step 5: gRPC sends the packets over the network via HTTP2. Because of binary encoding and network optimizations, gRPC is said to be 5X faster than JSON.

Steps 6 - 8: The payment service (gRPC server) receives the packets from the network, decodes them, and invokes the server application.

Steps 9 - 11: The result is returned from the server application, and gets encoded and sent to the transport layer.

Steps 12 - 14: The order service receives the packets, decodes them, and sends the result to the client application.
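For a rough idea of what the client side of that flow looks like in Python, here is a hedged sketch: payment_pb2 and payment_pb2_grpc stand in for modules that protoc would generate from a hypothetical payment.proto, and the service and method names are assumptions.

import grpc

import payment_pb2        # hypothetical module generated by protoc
import payment_pb2_grpc   # hypothetical module generated by protoc

# The channel carries Protocol Buffers-encoded messages over HTTP/2.
channel = grpc.insecure_channel("payment-service:50051")
stub = payment_pb2_grpc.PaymentServiceStub(channel)

# Looks like a local function call; encoding and transport are handled by gRPC.
response = stub.Charge(payment_pb2.ChargeRequest(order_id="o-123", amount_cents=1999))
print(response.status)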

What is a webhook?

The diagram below shows a comparison between polling and Webhook. 

Assume we run an eCommerce website. The clients send orders to the order service via the API gateway, which goes to the payment service for payment transactions. The payment service then talks to an external payment service provider (PSP) to complete the transactions. 

There are two ways to handle communications with the external PSP. 

1. Short polling 

After sending the payment request to the PSP, the payment service keeps asking the PSP about the payment status. After several rounds, the PSP finally returns with the status. 

Short polling has two drawbacks: 

  • Constant polling of the status requires resources from the payment service. 
  • The external service communicates directly with the payment service, creating security vulnerabilities. 

2. Webhook 

We can register a webhook with the external service. It means: call me back at a certain URL when you have updates on the request. When the PSP has completed the processing, it will invoke the HTTP request to update the payment status.

In this way, the programming paradigm is changed, and the payment service doesn’t need to waste resources to poll the payment status anymore.

What if the PSP never calls back? We can set up a housekeeping job to check payment status every hour.

Webhooks are often referred to as reverse APIs or push APIs because the server sends HTTP requests to the client. We need to pay attention to 3 things when using a webhook (a minimal receiver sketch follows the list):

  1. We need to design a proper API for the external service to call.
  2. We need to set up proper rules in the API gateway for security reasons.
  3. We need to register the correct URL at the external service.
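Here is that minimal receiver sketch, using Flask (the path, payload fields, and port are assumptions; a real integration should also verify the PSP's signature):

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/psp/payment-status", methods=["POST"])  # the URL registered with the PSP
def payment_status():
    event = request.get_json(force=True)
    # Hypothetical fields; a real PSP documents its own payload and signature scheme.
    payment_id = event.get("payment_id")
    status = event.get("status")
    print(f"payment {payment_id} -> {status}")  # update the payment record here
    return "", 204  # acknowledge quickly; do any heavy work asynchronously

if __name__ == "__main__":
    app.run(port=8080)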

How to improve API performance?

The diagram below shows 5 common tricks to improve API performance.

Pagination

This is a common optimization when the size of the result is large. The results are streamed back to the client in pages to improve service responsiveness.

Asynchronous Logging

Synchronous logging deals with the disk for every call and can slow down the system. Asynchronous logging sends logs to a lock-free buffer first and immediately returns. The logs will be flushed to the disk periodically. This significantly reduces the I/O overhead.
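A minimal sketch of this idea with Python's standard library: a QueueHandler makes the logging call return immediately, while a QueueListener flushes to disk in the background.

import logging
import queue
from logging.handlers import QueueHandler, QueueListener

log_queue = queue.Queue(-1)                    # in-memory buffer between callers and disk
file_handler = logging.FileHandler("app.log")  # the slow, disk-touching handler

logger = logging.getLogger("api")
logger.setLevel(logging.INFO)
logger.addHandler(QueueHandler(log_queue))     # the request path only enqueues the record

listener = QueueListener(log_queue, file_handler)
listener.start()                               # a background thread writes to disk

logger.info("request handled in 12 ms")        # returns without waiting on disk I/O
listener.stop()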

Caching

We can cache frequently accessed data into a cache. The client can query the cache first instead of visiting the database directly. If there is a cache miss, the client can query from the database. Caches like Redis store data in memory, so the data access is much faster than the database.
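A minimal cache-aside sketch with redis-py (the key scheme, TTL, and load_user_from_db are assumptions):

import json

import redis

r = redis.Redis(host="localhost", port=6379)

def get_user(user_id, load_user_from_db):
    key = f"user:{user_id}"               # hypothetical key scheme
    cached = r.get(key)
    if cached is not None:                # cache hit: skip the database
        return json.loads(cached)
    user = load_user_from_db(user_id)     # cache miss: query the database
    r.set(key, json.dumps(user), ex=300)  # keep it in Redis for 5 minutes
    return user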

Payload Compression

The requests and responses can be compressed (e.g., with gzip) so that the transmitted data size is much smaller. This speeds up the upload and download.

Connection Pool

When accessing resources, we often need to load data from the database. Opening and closing db connections adds significant overhead, so we should connect to the db via a pool of open connections. The connection pool is responsible for managing the connection lifecycle.

HTTP 1.0 -> HTTP 1.1 -> HTTP 2.0 -> HTTP 3.0 (QUIC)

What problem does each generation of HTTP solve?

The diagram below illustrates the key features.

  • HTTP 1.0 was finalized and fully documented in 1996. Every request to the same server requires a separate TCP connection.

  • HTTP 1.1 was published in 1997. A TCP connection can be left open for reuse (persistent connection), but it doesn’t solve the HOL (head-of-line) blocking issue.

    HOL blocking - when the number of allowed parallel requests in the browser is used up, subsequent requests need to wait for the former ones to complete.

  • HTTP 2.0 was published in 2015. It addresses the HOL issue through request multiplexing, which eliminates HOL blocking at the application layer, but HOL blocking still exists at the transport (TCP) layer.

    As you can see in the diagram, HTTP 2.0 introduced the concept of HTTP “streams”: an abstraction that allows multiplexing different HTTP exchanges onto the same TCP connection. Each stream doesn’t need to be sent in order.

  • HTTP 3.0 first draft was published in 2020. It is the proposed successor to HTTP 2.0. It uses QUIC instead of TCP for the underlying transport protocol, thus removing HOL blocking in the transport layer.

QUIC is based on UDP. It introduces streams as first-class citizens at the transport layer. QUIC streams share the same QUIC connection, so no additional handshakes and slow starts are required to create new ones, but QUIC streams are delivered independently such that in most cases packet loss affecting one stream doesn't affect others.

SOAP vs REST vs GraphQL vs RPC

The diagram below illustrates the API timeline and API styles comparison.

Over time, different API architectural styles have been released. Each of them has its own patterns for standardizing data exchange.

You can check out the use cases of each style in the diagram.

Code First vs. API First

The diagram below shows the differences between code-first development and API-first development. Why do we want to consider API first design?

  • Microservices increase system complexity and we have separate services to serve different functions of the system. While this kind of architecture facilitates decoupling and segregation of duty, we need to handle the various communications among services.

It is better to think through the system's complexity before writing the code and carefully defining the boundaries of the services.

  • Separate functional teams need to speak the same language and the dedicated functional teams are only responsible for their own components and services. It is recommended that the organization speak the same language via API design.

We can mock requests and responses to validate the API design before writing code.

  • Improve software quality and developer productivity. Since we have ironed out most of the uncertainties when the project starts, the overall development process is smoother, and the software quality is greatly improved.

Developers are happy about the process as well because they can focus on functional development instead of negotiating sudden changes.

The possibility of having surprises toward the end of the project lifecycle is reduced.

Because we have designed the API first, the tests can be designed while the code is being developed. In a way, we also get TDD (Test-Driven Development) when using API-first development.

HTTP status codes

The response codes for HTTP are divided into five categories:

  • Informational (100-199)
  • Success (200-299)
  • Redirection (300-399)
  • Client Error (400-499)
  • Server Error (500-599)

What does API gateway do?

The diagram below shows the details.

Step 1 - The client sends an HTTP request to the API gateway.

Step 2 - The API gateway parses and validates the attributes in the HTTP request.

Step 3 - The API gateway performs allow-list/deny-list checks.

Step 4 - The API gateway talks to an identity provider for authentication and authorization.

Step 5 - The rate limiting rules are applied to the request. If it is over the limit, the request is rejected.

Steps 6 and 7 - Now that the request has passed basic checks, the API gateway finds the relevant service to route to by path matching.

Step 8 - The API gateway transforms the request into the appropriate protocol and sends it to backend microservices.

Steps 9-12: The API gateway can handle errors properly and applies a circuit breaker when an error takes a long time to recover. It can also leverage the ELK (Elastic-Logstash-Kibana) stack for logging and monitoring. We sometimes cache data in the API gateway.

How do we design effective and safe APIs?

The diagram below shows typical API designs with a shopping cart example.

Note that API design is not just URL path design. Most of the time, we need to choose the proper resource names, identifiers, and path patterns. It is equally important to design proper HTTP header fields or to design effective rate-limiting rules within the API gateway.

TCP/IP encapsulation

How is data sent over the network? Why do we need so many layers in the OSI model?

The diagram below shows how data is encapsulated and de-encapsulated when transmitting over the network.

Step 1: When Device A sends data to Device B over the network via the HTTP protocol, an HTTP header is first added at the application layer.

Step 2: Then a TCP or a UDP header is added to the data. It is encapsulated into TCP segments at the transport layer. The header contains the source port, destination port, and sequence number.

Step 3: The segments are then encapsulated with an IP header at the network layer. The IP header contains the source/destination IP addresses.

Step 4: A MAC header is added to the IP datagram at the data link layer, with source/destination MAC addresses.

Step 5: The encapsulated frames are passed to the physical layer and sent over the network as binary bits.

Steps 6-10: When Device B receives the bits from the network, it performs the de-encapsulation process, which is a reverse processing of the encapsulation process. The headers are removed layer by layer, and eventually, Device B can read the data.

We need layers in the network model because each layer focuses on its own responsibilities. Each layer can rely on the headers for processing instructions and does not need to know the meaning of the data from the last layer.
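A toy sketch of encapsulation with the struct module (the headers are grossly simplified stand-ins, not real TCP/IP layouts):

import struct

payload = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"  # application layer (HTTP)

# Toy transport header: source port, destination port, sequence number.
segment = struct.pack("!HHI", 52100, 80, 1) + payload

# Toy network header: source and destination IPv4 addresses as raw bytes.
packet = bytes([192, 168, 0, 2]) + bytes([93, 184, 216, 34]) + segment

# Toy data link header: destination and source MAC addresses.
frame = bytes.fromhex("aabbccddeeff") + bytes.fromhex("112233445566") + packet

# De-encapsulation strips the headers again, layer by layer, in reverse order.
assert frame[12 + 8 + 8:] == payload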

Why is Nginx called a “reverse” proxy?

The diagram below shows the differences between a forward proxy and a reverse proxy.

A forward proxy is a server that sits between user devices and the internet.

A forward proxy is commonly used for:

  1. Protecting clients
  2. Circumventing browsing restrictions
  3. Blocking access to certain content

A reverse proxy is a server that accepts a request from the client, forwards the request to web servers, and returns the results to the client as if the proxy server had processed the request.

A reverse proxy is good for:

  1. Protecting servers
  2. Load balancing
  3. Caching static contents
  4. Encrypting and decrypting SSL communications

What are the common load-balancing algorithms?

The diagram below shows 6 common algorithms; a small Python sketch of round robin and hash routing follows the list.

  • Static Algorithms
  1. Round robin

    The client requests are sent to different service instances in sequential order. The services are usually required to be stateless.

  2. Sticky round-robin

    This is an improvement of the round-robin algorithm. If Alice’s first request goes to service A, the following requests go to service A as well.

  3. Weighted round-robin

    The admin can specify the weight for each service. The ones with a higher weight handle more requests than others.

  4. Hash

    This algorithm applies a hash function on the incoming requests’ IP or URL. The requests are routed to relevant instances based on the hash function result.

  • Dynamic Algorithms
  1. Least connections

    A new request is sent to the service instance with the least concurrent connections.

  2. Least response time

    A new request is sent to the service instance with the fastest response time.
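Here is that sketch for two of the static algorithms, round robin and hash routing, assuming a fixed list of backend instances:

import hashlib
from itertools import cycle

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical service instances

# Round robin: requests go to the instances in sequential order.
rr = cycle(backends)
def round_robin():
    return next(rr)

# Hash: the same client IP (or URL) always maps to the same instance.
def hash_route(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

print([round_robin() for _ in range(4)])  # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1
print(hash_route("203.0.113.7"))          # always the same backend for this client IP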

URL, URI, URN - Do you know the differences?

The diagram below shows a comparison of URL, URI, and URN.

  • URI

URI stands for Uniform Resource Identifier. It identifies a logical or physical resource on the web. URL and URN are subtypes of URI. URL locates a resource, while URN names a resource.

A URI is composed of the following parts: scheme:[//authority]path[?query][#fragment]

  • URL

URL stands for Uniform Resource Locator, the key concept of HTTP. It is the address of a unique resource on the web. It can be used with other protocols like FTP and JDBC.

  • URN

URN stands for Uniform Resource Name. It uses the urn scheme. URNs cannot be used to locate a resource. A simple example given in the diagram is composed of a namespace and a namespace-specific string.

If you would like to learn more about the subject, I would recommend W3C’s clarification.
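A quick way to see those parts is Python's urllib.parse (the example URI is made up):

from urllib.parse import urlparse

parts = urlparse("https://example.com:8080/products/42?color=red#reviews")

print(parts.scheme)    # 'https'
print(parts.netloc)    # 'example.com:8080'  (the authority)
print(parts.path)      # '/products/42'
print(parts.query)     # 'color=red'
print(parts.fragment)  # 'reviews'

# A URN uses the urn scheme; it names a resource rather than locating it.
print(urlparse("urn:isbn:0451450523").scheme)  # 'urn'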

CI/CD

CI/CD Pipeline Explained in Simple Terms

Section 1 - SDLC with CI/CD

The software development life cycle (SDLC) consists of several key stages: development, testing, deployment, and maintenance. CI/CD automates and integrates these stages to enable faster and more reliable releases.

When code is pushed to a git repository, it triggers an automated build and test process. End-to-end (e2e) test cases are run to validate the code. If tests pass, the code can be automatically deployed to staging/production. If issues are found, the code is sent back to development for bug fixing. This automation provides fast feedback to developers and reduces the risk of bugs in production.

Section 2 - Difference between CI and CD

Continuous Integration (CI) automates the build, test, and merge process. It runs tests whenever code is committed to detect integration issues early. This encourages frequent code commits and rapid feedback.

Continuous Delivery (CD) automates release processes like infrastructure changes and deployment. It ensures software can be released reliably at any time through automated workflows. CD may also automate the manual testing and approval steps required before production deployment.

Section 3 - CI/CD Pipeline

A typical CI/CD pipeline has several connected stages:

  • The developer commits code changes to the source control
  • CI server detects changes and triggers the build
  • Code is compiled and tested (unit, integration tests)
  • Test results reported to the developer
  • On success, artifacts are deployed to staging environments
  • Further testing may be done on staging before release
  • CD system deploys approved changes to production

Netflix Tech Stack (CI/CD Pipeline)

Planning: Netflix Engineering uses JIRA for planning and Confluence for documentation.

Coding: Java is the primary programming language for the backend service, while other languages are used for different use cases.

Build: Gradle is mainly used for building, and Gradle plugins are built to support various use cases.

Packaging: Package and dependencies are packed into an Amazon Machine Image (AMI) for release.

Testing: Testing reflects Netflix's production-first culture, with an emphasis on building chaos engineering tools.

Deployment: Netflix uses its self-built Spinnaker for canary rollout deployment.

Monitoring: The monitoring metrics are centralized in Atlas, and Kayenta is used to detect anomalies.

Incident report: Incidents are dispatched according to priority, and PagerDuty is used for incident handling.

Architecture patterns

MVC, MVP, MVVM, MVVM-C, and VIPER

These architecture patterns are among the most commonly used in app development, whether on iOS or Android platforms. Developers have introduced them to overcome the limitations of earlier patterns. So, how do they differ?

  • MVC, the oldest pattern, dates back almost 50 years
  • Every pattern has a "view" (V) responsible for displaying content and receiving user input
  • Most patterns include a "model" (M) to manage business data
  • "Controller," "presenter," and "view-model" are translators that mediate between the view and the model ("entity" in the VIPER pattern)

18 Key Design Patterns Every Developer Should Know

Patterns are reusable solutions to common design problems, resulting in a smoother, more efficient development process. They serve as blueprints for building better software structures. These are some of the most popular patterns (a short Observer sketch follows the list):

  • Abstract Factory: Family Creator - Makes groups of related items.
  • Builder: Lego Master - Builds objects step by step, keeping creation and appearance separate.
  • Prototype: Clone Maker - Creates copies of fully prepared examples.
  • Singleton: One and Only - A special class with just one instance.
  • Adapter: Universal Plug - Connects things with different interfaces.
  • Bridge: Function Connector - Links how an object works to what it does.
  • Composite: Tree Builder - Forms tree-like structures of simple and complex parts.
  • Decorator: Customizer - Adds features to objects without changing their core.
  • Facade: One-Stop-Shop - Represents a whole system with a single, simplified interface.
  • Flyweight: Space Saver - Shares small, reusable items efficiently.
  • Proxy: Stand-In Actor - Represents another object, controlling access or actions.
  • Chain of Responsibility: Request Relay - Passes a request through a chain of objects until handled.
  • Command: Task Wrapper - Turns a request into an object, ready for action.
  • Iterator: Collection Explorer - Accesses elements in a collection one by one.
  • Mediator: Communication Hub - Simplifies interactions between different classes.
  • Memento: Time Capsule - Captures and restores an object's state.
  • Observer: News Broadcaster - Notifies classes about changes in other objects.
  • Visitor: Skillful Guest - Adds new operations to a class without altering it.
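Here is that short sketch of the Observer ("News Broadcaster") pattern in Python:

class Newsletter:
    """Subject: keeps a list of observers and notifies them about changes."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, article):
        for notify in self._subscribers:
            notify(article)

news = Newsletter()
news.subscribe(lambda a: print(f"email: new article '{a}'"))
news.subscribe(lambda a: print(f"push:  new article '{a}'"))
news.publish("System Design 101")  # both observers are notified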

Database

A nice cheat sheet of different databases in cloud services

Choosing the right database for your project is a complex task. Many database options, each suited to distinct use cases, can quickly lead to decision fatigue.

We hope this cheat sheet provides high-level direction to pinpoint the right service that aligns with your project's needs and avoid potential pitfalls.

Note: Google has limited documentation for their database use cases. Even though we did our best to look at what was available and arrive at the best option, some of the entries may not be entirely accurate.

8 Data Structures That Power Your Databases

The answer will vary depending on your use case. Data can be indexed in memory or on disk. Similarly, data formats vary, such as numbers, strings, geographic coordinates, etc. The system might be write-heavy or read-heavy. All of these factors affect your choice of database index format.

The following are some of the most popular data structures used for indexing data:

  • Skiplist: a common in-memory index type. Used in Redis
  • Hash index: a very common implementation of the “Map” data structure (or “Collection”)
  • SSTable: immutable on-disk “Map” implementation
  • LSM tree: Skiplist + SSTable. High write throughput
  • B-tree: disk-based solution. Consistent read/write performance
  • Inverted index: used for document indexing. Used in Lucene
  • Suffix tree: for string pattern search
  • R-tree: multi-dimension search, such as finding the nearest neighbor

How is an SQL statement executed in the database?

The diagram below shows the process. Note that the architectures of different databases differ; the diagram demonstrates some common designs.

Step 1 - A SQL statement is sent to the database via a transport layer protocol (e.g., TCP).

Step 2 - The SQL statement is sent to the command parser, where it goes through syntactic and semantic analysis, and a query tree is generated afterward.

Step 3 - The query tree is sent to the optimizer. The optimizer creates an execution plan.

Step 4 - The execution plan is sent to the executor. The executor carries out the plan and retrieves the data.

Step 5 - Access methods provide the data fetching logic required for execution, retrieving data from the storage engine.

Step 6 - Access methods decide whether the SQL statement is read-only. If the query is read-only (SELECT statement), it is passed to the buffer manager for further processing. The buffer manager looks for the data in the cache or data files.

Step 7 - If the statement is an UPDATE or INSERT, it is passed to the transaction manager for further processing.

Step 8 - During a transaction, the data is in lock mode. This is guaranteed by the lock manager. It also ensures the transaction’s ACID properties.
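You can watch part of this pipeline from Python with SQLite, whose internals differ from the generic architecture above but follow the same parse/optimize/execute idea; EXPLAIN QUERY PLAN shows the optimizer's chosen plan without running the query:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users(email)")

# Ask the optimizer how it would execute the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?", ("a@example.com",)
).fetchall()
print(plan)  # mentions a search using idx_users_email

conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
conn.commit()  # the write path goes through transaction handling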

CAP theorem

The CAP theorem is one of the most famous terms in computer science, but I bet different developers have different understandings. Let’s examine what it is and why it can be confusing.

CAP theorem states that a distributed system can't provide more than two of these three guarantees simultaneously.

Consistency: consistency means all clients see the same data at the same time no matter which node they connect to.

Availability: availability means any client that requests data gets a response even if some of the nodes are down.

Partition Tolerance: a partition indicates a communication break between two nodes. Partition tolerance means the system continues to operate despite network partitions.

The “2 of 3” formulation can be useful, but this simplification could be misleading.

  1. Picking a database is not easy. Justifying our choice purely based on the CAP theorem is not enough. For example, companies don't choose Cassandra for chat applications simply because it is an AP system. There is a list of good characteristics that make Cassandra a desirable option for storing chat messages. We need to dig deeper.

  2. “CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare”. Quoted from the paper: CAP Twelve Years Later: How the “Rules” Have Changed.

  3. The theorem is about 100% availability and consistency. A more realistic discussion would be the trade-offs between latency and consistency when there is no network partition. See PACELC theorem for more details.

Is the CAP theorem actually useful?

I think it is still useful as it opens our minds to a set of tradeoff discussions, but it is only part of the story. We need to dig deeper when picking the right database.

Types of Memory and Storage

Visualizing a SQL query

SQL statements are executed by the database system in several steps, including:

  • Parsing the SQL statement and checking its validity
  • Transforming the SQL into an internal representation, such as relational algebra
  • Optimizing the internal representation and creating an execution plan that utilizes index information
  • Executing the plan and returning the results

The execution of SQL is highly complex and involves many considerations, such as:

  • The use of indexes and caches
  • The order of table joins
  • Concurrency control
  • Transaction management

SQL language

In 1986, SQL (Structured Query Language) became a standard. Over the next 40 years, it became the dominant language for relational database management systems. Reading the latest standard (ANSI SQL 2016) can be time-consuming. How can I learn it?

There are 5 components of the SQL language:

  • DDL: data definition language, such as CREATE, ALTER, DROP
  • DQL: data query language, such as SELECT
  • DML: data manipulation language, such as INSERT, UPDATE, DELETE
  • DCL: data control language, such as GRANT, REVOKE
  • TCL: transaction control language, such as COMMIT, ROLLBACK

For a backend engineer, you may need to know most of it. As a data analyst, you may need to have a good understanding of DQL. Select the topics that are most relevant to you.

Cache

Data is cached everywhere

This diagram illustrates where we cache data in a typical architecture.

There are multiple layers along the flow.

  1. Client apps: HTTP responses can be cached by the browser. We request data over HTTP for the first time, and it is returned with an expiry policy in the HTTP header; we request data again, and the client app tries to retrieve the data from the browser cache first.
  2. CDN: CDN caches static web resources. The clients can retrieve data from a CDN node nearby.
  3. Load Balancer: The load balancer can cache resources as well.
  4. Messaging infra: Message brokers store messages on disk first, and then consumers retrieve them at their own pace. Depending on the retention policy, the data is cached in Kafka clusters for a period of time.
  5. Services: There are multiple layers of cache in a service. If the data is not cached in the CPU cache, the service will try to retrieve the data from memory. Sometimes the service has a second-level cache to store data on disk.
  6. Distributed Cache: Distributed cache like Redis holds key-value pairs for multiple services in memory. It provides much better read/write performance than the database.
  7. Full-text Search: we sometimes need to use full-text search engines like Elasticsearch for document search or log search. A copy of the data is indexed in the search engine as well.
  8. Database: Even in the database, we have different levels of caches:
  • WAL (Write-ahead Log): data is written to the WAL first before building the B-tree index
  • Bufferpool: A memory area allocated to cache query results
  • Materialized View: Pre-compute query results and store them in the database tables for better query performance
  • Transaction log: record all the transactions and database updates
  • Replication Log: used to record the replication state in a database cluster

Why is Redis so fast?

There are 3 main reasons as shown in the diagram below.

  1. Redis is a RAM-based data store. RAM access is at least 1000 times faster than random disk access.
  2. Redis leverages IO multiplexing and single-threaded execution loop for execution efficiency.
  3. Redis leverages several efficient lower-level data structures.

Question: Another popular in-memory store is Memcached. Do you know the differences between Redis and Memcached?

You might have noticed the style of this diagram is different from my previous posts. Please let me know which one you prefer.

How can Redis be used?

There is more to Redis than just caching.

Redis can be used in a variety of scenarios, as shown in the diagram; a small sketch of the rate limiter and distributed lock cases follows the list.

  • Session

    We can use Redis to share user session data among different services.

  • Cache

    We can use Redis to cache objects or pages, especially for hotspot data.

  • Distributed lock

    We can use a Redis string to acquire locks among distributed services.

  • Counter

    We can count how many likes or how many reads for articles.

  • Rate limiter

    We can apply a rate limiter for certain user IPs.

  • Global ID generator

    We can use a Redis integer counter (INCR) to generate global IDs.

  • Shopping cart

    We can use Redis Hash to represent key-value pairs in a shopping cart.

  • Calculate user retention

    We can use Bitmap to represent the user login daily and calculate user retention.

  • Message queue

    We can use List for a message queue.

  • Ranking

    We can use ZSet to sort the articles.
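Here is that sketch with redis-py (key names, limits, and TTLs are assumptions): a fixed-window rate limiter built on INCR/EXPIRE, and a lock built on SET with NX and EX.

import redis

r = redis.Redis()

def allow_request(client_ip, limit=100, window_seconds=60):
    """Fixed-window rate limiter: at most `limit` requests per window per IP."""
    key = f"rate:{client_ip}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)  # start the window on the first request
    return count <= limit

def try_lock(resource, holder_id, ttl_seconds=10):
    """Simple distributed lock: only one holder can set the key at a time."""
    return bool(r.set(f"lock:{resource}", holder_id, nx=True, ex=ttl_seconds))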

Top caching strategies

Designing large-scale systems usually requires careful consideration of caching. Below are five caching strategies that are frequently utilized.

Microservice architecture

What does a typical microservice architecture look like?

The diagram below shows a typical microservice architecture.

  • Load Balancer: This distributes incoming traffic across multiple backend services.
  • CDN (Content Delivery Network): CDN is a group of geographically distributed servers that hold static content for faster delivery. The clients look for content in CDN first, then progress to backend services.
  • API Gateway: This handles incoming requests and routes them to the relevant services. It talks to the identity provider and service discovery.
  • Identity Provider: This handles authentication and authorization for users.
  • Service Registry & Discovery: Microservice registration and discovery happen in this component, and the API gateway looks for relevant services in this component to talk to.
  • Management: This component is responsible for monitoring the services.
  • Microservices: Microservices are designed and deployed in different domains. Each domain has its own database. The API gateway talks to the microservices via REST API or other protocols, and the microservices within the same domain talk to each other using RPC (Remote Procedure Call).

Benefits of microservices:

  • They can be quickly designed, deployed, and horizontally scaled.
  • Each domain can be independently maintained by a dedicated team.
  • Business requirements can be customized in each domain and better supported, as a result.

Microservice Best Practices

A picture is worth a thousand words: 9 best practices for developing microservices.

When we develop microservices, we need to follow the following best practices:

  1. Use separate data storage for each microservice
  2. Keep code at a similar level of maturity
  3. Separate build for each microservice
  4. Assign each microservice with a single responsibility
  5. Deploy into containers
  6. Design stateless services
  7. Adopt domain-driven design
  8. Design micro frontend
  9. Orchestrate microservices

What tech stack is commonly used for microservices?

Below you will find a diagram showing the microservice tech stack, both for the development phase and for production.

▶️ Pre-Production

  • Define API - This establishes a contract between frontend and backend. We can use Postman or OpenAPI for this.
  • Development - Node.js or React is popular for frontend development, and Java/Python/Go for backend development. We also need to change the configurations in the API gateway according to the API definitions.
  • Continuous Integration - JUnit and Jenkins for automated testing. The code is packaged into a Docker image and deployed as microservices.

▶️ Production

  • Load Balancer & CDN - Nginx is a common choice for load balancing. Cloudflare provides the CDN (Content Delivery Network).
  • API Gateway - We can use Spring Boot for the gateway, and Eureka/Zookeeper for service discovery.
  • The microservices are deployed on clouds. We have options among AWS, Microsoft Azure, and Google GCP.
  • Cache and Full-text Search - Redis is a common choice for caching key-value pairs. Elasticsearch is used for full-text search.
  • Communications - For services to talk to each other, we can use messaging infra Kafka or RPC.
  • Persistence - We can use MySQL or PostgreSQL for a relational database, and Amazon S3 for object store. We can also use Cassandra for the wide-column store if necessary.
  • Management & Monitoring - To manage so many microservices, the common Ops tools include Prometheus, Elastic Stack, and Kubernetes.

Why is Kafka fast

There are many design decisions that contributed to Kafka’s performance. In this post, we’ll focus on two. We think these two carried the most weight.

  1. The first one is Kafka’s reliance on Sequential I/O.
  2. The second design choice that gives Kafka its performance advantage is its focus on efficiency: zero copy principle.

The diagram illustrates how the data is transmitted between producer and consumer, and what zero-copy means.

  • Step 1.1 - 1.3: Producer writes data to the disk
  • Step 2: Consumer reads data without zero-copy

2.1 The data is loaded from disk to OS cache

2.2 The data is copied from OS cache to Kafka application

2.3 Kafka application copies the data into the socket buffer

2.4 The data is copied from socket buffer to network card

2.5 The network card sends data out to the consumer

  • Step 3: Consumer reads data with zero-copy

3.1: The data is loaded from disk to the OS cache.

3.2: The OS cache directly copies the data to the network card via the sendfile() command.

3.3: The network card sends data out to the consumer.

Zero copy is a shortcut to save the multiple data copies between application context and kernel context.
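The same kernel facility is reachable from Python; a rough sketch contrasting the copy path with os.sendfile (Linux) over an already-connected socket:

import os
import socket

def send_with_copies(sock: socket.socket, path: str):
    # Data bounces through user space: disk -> OS cache -> Python buffer -> socket buffer -> NIC.
    with open(path, "rb") as f:
        sock.sendall(f.read())

def send_zero_copy(sock: socket.socket, path: str):
    # os.sendfile asks the kernel to move bytes from the file to the socket
    # directly (sendfile(2)), skipping the copies through user space.
    with open(path, "rb") as f:
        offset, remaining = 0, os.fstat(f.fileno()).st_size
        while remaining > 0:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, remaining)
            offset += sent
            remaining -= sent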

Payment systems

How to learn payment systems?

Why is the credit card called “the most profitable product in banks”? How does VISA/Mastercard make money?

The diagram below shows the economics of the credit card payment flow.

1.  The cardholder pays a merchant $100 to buy a product.

2. The merchant benefits from the use of the credit card with higher sales volume and needs to compensate the issuer and the card network for providing the payment service. The acquiring bank sets a fee with the merchant, called the “merchant discount fee.”

3 - 4. The acquiring bank keeps $0.25 as the acquiring markup, and $1.75 is paid to the issuing bank as the interchange fee. The merchant discount fee should cover the interchange fee.

The interchange fee is set by the card network because it is less efficient for each issuing bank to negotiate fees with each merchant.

5.  The card network sets up the network assessments and fees with each bank, which pays the card network for its services every month. For example, VISA charges a 0.11% assessment, plus a $0.0195 usage fee, for every swipe.

6.  The cardholder pays the issuing bank for its services.
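Putting rough numbers on the $100 example (the 2% merchant discount fee is an assumption consistent with the $0.25 markup plus $1.75 interchange above; the network rates are the ones quoted for VISA):

purchase = 100.00

merchant_discount_fee = 0.02 * purchase                    # $2.00 paid by the merchant (assumed 2%)
interchange_fee = 1.75                                     # goes to the issuing bank
acquirer_markup = merchant_discount_fee - interchange_fee  # $0.25 kept by the acquiring bank

network_assessment = 0.0011 * purchase + 0.0195            # ~$0.13 paid to the card network per swipe
merchant_receives = purchase - merchant_discount_fee       # $98.00

print(acquirer_markup, network_assessment, merchant_receives)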

Why should the issuing bank be compensated?

  • The issuer pays the merchant even if the cardholder fails to pay the issuer.
  • The issuer pays the merchant before the cardholder pays the issuer.
  • The issuer has other operating costs, including managing customer accounts, providing statements, fraud detection, risk management, clearing & settlement, etc.

How does VISA work when we swipe a credit card at a merchant’s shop?

VISA, Mastercard, and American Express act as card networks for the clearing and settling of funds. The card acquiring bank and the card issuing bank can be – and often are – different. If banks were to settle transactions one by one without an intermediary, each bank would have to settle the transactions with all the other banks. This is quite inefficient.

The diagram below shows VISA’s role in the credit card payment process. There are two flows involved. Authorization flow happens when the customer swipes the credit card. Capture and settlement flow happens when the merchant wants to get the money at the end of the day.

  • Authorization Flow

Step 0: The card issuing bank issues credit cards to its customers.

Step 1: The cardholder wants to buy a product and swipes the credit card at the Point of Sale (POS) terminal in the merchant’s shop.

Step 2: The POS terminal sends the transaction to the acquiring bank, which has provided the POS terminal.

Steps 3 and 4: The acquiring bank sends the transaction to the card network, also called the card scheme. The card network sends the transaction to the issuing bank for approval.

Steps 4.1, 4.2 and 4.3: The issuing bank freezes the money if the transaction is approved. The approval or rejection is sent back to the acquirer, as well as the POS terminal.

  • Capture and Settlement Flow

Steps 1 and 2: The merchant wants to collect the money at the end of the day, so they hit "capture" on the POS terminal. The transactions are sent to the acquirer in batch. The acquirer sends the batch file with transactions to the card network.

Step 3: The card network performs clearing for the transactions collected from different acquirers, and sends the clearing files to different issuing banks.

Step 4: The issuing banks confirm the correctness of the clearing files, and transfer money to the relevant acquiring banks.

Step 5: The acquiring bank then transfers money to the merchant’s bank.

A note on clearing (Step 3): the card network clears the transactions collected from different acquiring banks. Clearing is a process in which mutually offsetting transactions are netted, so the number of total transactions is reduced.

In the process, the card network takes on the burden of talking to each bank and receives service fees in return.

Payment Systems Around The World Series (Part 1): Unified Payments Interface (UPI) in India

What’s UPI? UPI is an instant real-time payment system developed by the National Payments Corporation of India.

It accounts for 60% of digital retail transactions in India today.

UPI = payment markup language + standard for interoperable payments

DevOps

DevOps vs. SRE vs. Platform Engineering. What is the difference?

The concepts of DevOps, SRE, and Platform Engineering have emerged at different times and have been developed by various individuals and organizations.

DevOps as a concept was introduced in 2009 by Patrick Debois and Andrew Shafer at the Agile conference. They sought to bridge the gap between software development and operations by promoting a collaborative culture and shared responsibility for the entire software development lifecycle.

SRE, or Site Reliability Engineering, was pioneered by Google in the early 2000s to address operational challenges in managing large-scale, complex systems. Google developed SRE practices and tools, such as the Borg cluster management system and the Monarch monitoring system, to improve the reliability and efficiency of their services.

Platform Engineering is a more recent concept, building on the foundation of SRE engineering. The precise origins of Platform Engineering are less clear, but it is generally understood to be an extension of the DevOps and SRE practices, with a focus on delivering a comprehensive platform for product development that supports the entire business perspective.

It's worth noting that while these concepts emerged at different times, they are all related to the broader trend of improving collaboration, automation, and efficiency in software development and operations.

What is k8s (Kubernetes)?

K8s is a container orchestration system used for container deployment and management. Its design was greatly influenced by Google’s internal system Borg.

A k8s cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node.

The worker node(s) host the Pods that are the components of the application workload. The control plane manages the worker nodes and the Pods in the cluster. In production environments, the control plane usually runs across multiple computers, and a cluster usually runs multiple nodes, providing fault tolerance and high availability.

  • Control Plane Components
  1. API Server

    The API server talks to all the components in the k8s cluster. All the operations on pods are executed by talking to the API server.

  2. Scheduler

    The scheduler watches for newly created pods that have no assigned node and selects a node for them to run on.

  3. Controller Manager

    The controller manager runs the controllers, including Node Controller, Job Controller, EndpointSlice Controller, and ServiceAccount Controller.

  4. Etcd

    etcd is a key-value store used as Kubernetes' backing store for all cluster data.

  • Nodes
  1. Pods

    A pod is a group of containers and is the smallest unit that k8s administers. A pod has a single IP address that is shared by every container within the pod.

  2. Kubelet

    An agent that runs on each node in the cluster. It ensures containers are running in a Pod.

  3. Kube Proxy

    Kube-proxy is a network proxy that runs on each node in your cluster. It routes traffic coming into a node from the service. It forwards requests for work to the correct containers.

Docker vs. Kubernetes. Which one should we use?

What is Docker?

Docker is an open-source platform that allows you to package, distribute, and run applications in isolated containers. It focuses on containerization, providing lightweight environments that encapsulate applications and their dependencies.

What is Kubernetes?

Kubernetes, often referred to as K8s, is an open-source container orchestration platform. It provides a framework for automating the deployment, scaling, and management of containerized applications across a cluster of nodes.

How are both different from each other?

Docker: Docker operates at the individual container level on a single operating system host.

You must manually manage each host, and setting up networks, security policies, and storage for multiple related containers can be complex.

Kubernetes: Kubernetes operates at the cluster level. It manages multiple containerized applications across multiple hosts, providing automation for tasks like load balancing, scaling, and ensuring the desired state of applications.

In short, Docker focuses on containerization and running containers on individual hosts, while Kubernetes specializes in managing and orchestrating containers at scale across a cluster of hosts.

How does Docker work?

The diagram below shows the architecture of Docker and how it works when we run “docker build”, “docker pull” and “docker run”.

There are 3 components in Docker architecture:

  • Docker client

    The docker client talks to the Docker daemon.

  • Docker host

    The Docker daemon listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes.

  • Docker registry

    A Docker registry stores Docker images. Docker Hub is a public registry that anyone can use.

Let’s take the “docker run” command as an example.

  1. Docker pulls the image from the registry.
  2. Docker creates a new container.
  3. Docker allocates a read-write filesystem to the container.
  4. Docker creates a network interface to connect the container to the default network.
  5. Docker starts the container.

GIT

How Git Commands work

To begin with, it's essential to identify where our code is stored. The common assumption is that there are only two locations - one on a remote server like Github and the other on our local machine. However, this isn't entirely accurate. Git maintains three local storages on our machine, which means that our code can be found in four places:

  • Working directory: where we edit files
  • Staging area: a temporary location where files are kept for the next commit
  • Local repository: contains the code that has been committed
  • Remote repository: the remote server that stores the code

Most Git commands primarily move files between these four locations.

How does Git Work?

The diagram below shows the Git workflow.

Git is a distributed version control system.

Every developer maintains a local copy of the main repository and edits and commits to the local copy.

The commit is very fast because the operation doesn’t interact with the remote repository.

If the remote repository crashes, the files can be recovered from the local repositories.

Git merge vs. Git rebase

What are the differences?

When we merge changes from one Git branch to another, we can use ‘git merge’ or ‘git rebase’. The diagram below shows how the two commands work.

Git merge

This creates a new commit G’ in the main branch. G’ ties the histories of both main and feature branches.

Git merge is non-destructive. Neither the main nor the feature branch is changed.

Git rebase

Git rebase moves the feature branch histories to the head of the main branch. It creates new commits E’, F’, and G’ for each commit in the feature branch.

The benefit of rebase is that it has a linear commit history.

Rebase can be dangerous if “the golden rule of git rebase” is not followed.

The Golden Rule of Git Rebase

Never use it on public branches!

Cloud Services

A nice cheat sheet of different cloud services (2023 edition)

What is cloud native?

Below is a diagram showing the evolution of architecture and processes since the 1980s.

Organizations can build and run scalable applications on public, private, and hybrid clouds using cloud native technologies.

This means the applications are designed to leverage cloud features, so they are resilient to load and easy to scale.

Cloud native includes 4 aspects:

  1. Development process

    This has progressed from waterfall to agile to DevOps.

  2. Application Architecture

    The architecture has gone from monolithic to microservices. Each service is designed to be small, adaptive to the limited resources in cloud containers.

  3. Deployment & packaging

    The applications used to be deployed on physical servers. Then around 2000, the applications that were not sensitive to latency were usually deployed on virtual servers. With cloud native applications, they are packaged into docker images and deployed in containers.

  4. Application infrastructure

    The applications are massively deployed on cloud infrastructure instead of self-hosted servers.

Developer productivity tools

Visualize JSON files

Nested JSON files are hard to read.

JsonCrack generates graph diagrams from JSON files and makes them easy to read.

Additionally, the generated diagrams can be downloaded as images.

Automatically turn code into architecture diagrams

What does it do?

  • Draw the cloud system architecture in Python code.
  • Diagrams can also be rendered directly inside the Jupyter Notebooks.
  • No design tools are needed.
  • Supports the following providers: AWS, Azure, GCP, Kubernetes, Alibaba Cloud, Oracle Cloud, etc.

Github repo

Linux

Linux file system explained

The Linux file system used to resemble an unorganized town where individuals constructed their houses wherever they pleased. However, in 1994, the Filesystem Hierarchy Standard (FHS) was introduced to bring order to the Linux file system.

By implementing a standard like the FHS, software can ensure a consistent layout across various Linux distributions. Nonetheless, not all Linux distributions strictly adhere to this standard. They often incorporate their own unique elements or cater to specific requirements. To become proficient in this standard, you can begin by exploring it yourself. Use commands such as "cd" for navigation and "ls" for listing directory contents. Imagine the file system as a tree, starting from the root (/). With time, it will become second nature to you, transforming you into a skilled Linux administrator.

18 Most-used Linux Commands You Should Know

Linux commands are instructions for interacting with the operating system. They help manage files, directories, system processes, and many other aspects of the system. You need to become familiar with these commands in order to navigate and maintain Linux-based systems efficiently and effectively.

This diagram below shows popular Linux commands:

  • ls - List files and directories
  • cd - Change the current directory
  • mkdir - Create a new directory
  • rm - Remove files or directories
  • cp - Copy files or directories
  • mv - Move or rename files or directories
  • chmod - Change file or directory permissions
  • grep - Search for a pattern in files
  • find - Search for files and directories
  • tar - Manipulate tarball archive files
  • vi - Edit files with a text editor
  • cat - Display the content of files
  • top - Display processes and resource usage
  • ps - Display processes information
  • kill - Terminate a process by sending a signal
  • du - Estimate file space usage
  • ifconfig - Configure network interfaces
  • ping - Test network connectivity between hosts

Security

How does HTTPS work?

Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). HTTPS transmits encrypted data using Transport Layer Security (TLS). If the data is hijacked online, all the hijacker gets is binary code.

How is the data encrypted and decrypted?

Step 1 - The client (browser) and the server establish a TCP connection.

Step 2 - The client sends a “client hello” to the server. The message contains a set of necessary encryption algorithms (cipher suites) and the latest TLS version it can support. The server responds with a “server hello” so the browser knows whether it can support the algorithms and TLS version.

The server then sends the SSL certificate to the client. The certificate contains the public key, host name, expiry dates, etc. The client validates the certificate.

Step 3 - After validating the SSL certificate, the client generates a session key and encrypts it using the public key. The server receives the encrypted session key and decrypts it with the private key.

Step 4 - Now that both the client and the server hold the same session key (symmetric encryption), the encrypted data is transmitted in a secure bi-directional channel.

Why does HTTPS switch to symmetric encryption during data transmission? There are two main reasons:

  1. Security: The asymmetric encryption goes only one way. This means that if the server tries to send the encrypted data back to the client, anyone can decrypt the data using the public key.

  2. Server resources: The asymmetric encryption adds quite a lot of mathematical overhead. It is not suitable for data transmissions in long sessions.
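You can inspect the negotiated TLS version and the symmetric cipher suite with the standard library's ssl module; a quick sketch against a public host:

import socket
import ssl

hostname = "example.com"
context = ssl.create_default_context()  # validates the server certificate

with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        print(tls.version())      # e.g. 'TLSv1.3'
        print(tls.cipher())       # the symmetric cipher suite agreed in the handshake
        cert = tls.getpeercert()  # the certificate the client validated
        print(cert.get("subject"))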

OAuth 2.0 Explained With Simple Terms

OAuth 2.0 is a powerful and secure framework that allows different applications to securely interact with each other on behalf of users without sharing sensitive credentials.

The entities involved in OAuth are the User, the Server, and the Identity Provider (IDP).

What Can an OAuth Token Do?

When you use OAuth, you get an OAuth token that represents your identity and permissions. This token can do a few important things:

Single Sign-On (SSO): With an OAuth token, you can log into multiple services or apps using just one login, making life easier and safer.

Authorization Across Systems: The OAuth token allows you to share your authorization or access rights across various systems, so you don't have to log in separately everywhere.

Accessing User Profile: Apps with an OAuth token can access certain parts of your user profile that you allow, but they won't see everything.

Remember, OAuth 2.0 is all about keeping you and your data safe while making your online experiences seamless and hassle-free across different applications and services.

Top 4 Forms of Authentication Mechanisms

  1. SSH Keys:

    Cryptographic keys are used to access remote systems and servers securely

  2. OAuth Tokens:

    Tokens that provide limited access to user data on third-party applications

  3. SSL Certificates:

    Digital certificates ensure secure and encrypted communication between servers and clients

  4. Credentials:

    User authentication information is used to verify and grant access to various systems and services

Session, cookie, JWT, token, SSO, and OAuth 2.0 - what are they?

These terms are all related to user identity management. When you log into a website, you declare who you are (identification). Your identity is verified (authentication), and you are granted the necessary permissions (authorization). Many solutions have been proposed in the past, and the list keeps growing.

From simple to complex, here is my understanding of user identity management:

  • WWW-Authenticate is the most basic method. You are asked for the username and password by the browser. As a result of the inability to control the login life cycle, it is seldom used today.

  • A finer control over the login life cycle is session-cookie. The server maintains session storage, and the browser keeps the ID of the session. A cookie usually only works with browsers and is not mobile app friendly.

  • To address the compatibility issue, the token can be used. The client sends the token to the server, and the server validates the token. The downside is that the token needs to be encrypted and decrypted, which may be time-consuming.

  • JWT is a standard way of representing tokens. This information can be verified and trusted because it is digitally signed. Since JWT contains the signature, there is no need to save session information on the server side.

  • By using SSO (single sign-on), you can sign on only once and log in to multiple websites. It uses CAS (central authentication service) to maintain cross-site information.

  • By using OAuth 2.0, you can authorize one website to access your information on another website.

How to store passwords safely in the database and how to validate a password?

Things NOT to do

  • Storing passwords in plain text is not a good idea because anyone with internal access can see them.

  • Storing password hashes directly is not sufficient because it is prone to precomputation attacks, such as rainbow tables.

  • To mitigate precomputation attacks, we salt the passwords.

What is salt?

According to OWASP guidelines, “a salt is a unique, randomly generated string that is added to each password as part of the hashing process”.

How to store a password and salt?

  1. Because the salt is unique and randomly generated for each password, the hash result is unique to each password.
  2. The password can be stored in the database using the following format: hash(password + salt).

How to validate a password?

To validate a password, it can go through the following process (a minimal sketch follows the list):

  1. A client enters the password.
  2. The system fetches the corresponding salt from the database.
  3. The system appends the salt to the password and hashes it. Let’s call the hashed value H1.
  4. The system compares H1 and H2, where H2 is the hash stored in the database. If they are the same, the password is valid.
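Here is that minimal sketch with the standard library (PBKDF2-HMAC-SHA256; the iteration count and storage format are illustrative):

import hashlib
import hmac
import os

def hash_password(password, salt=None):
    salt = salt or os.urandom(16)  # unique, randomly generated salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest            # store both values in the database

def verify_password(password, salt, stored_digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(candidate, stored_digest)  # constant-time comparison

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True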

Explaining JSON Web Token (JWT) to a 10-Year-Old Kid

Imagine you have a special box called a JWT. Inside this box, there are three parts: a header, a payload, and a signature.

The header is like the label on the outside of the box. It tells us what type of box it is and how it's secured. It's usually written in a format called JSON, which is just a way to organize information using curly braces { } and colons : .

The payload is like the actual message or information you want to send. It could be your name, age, or any other data you want to share. It's also written in JSON format, so it's easy to understand and work with.

Now, the signature is what makes the JWT secure. It's like a special seal that only the sender knows how to create. The signature is created using a secret code, kind of like a password. This signature ensures that nobody can tamper with the contents of the JWT without the sender knowing about it.

When you want to send the JWT to a server, you put the header, payload, and signature inside the box. Then you send it over to the server. The server can easily read the header and payload to understand who you are and what you want to do.
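A minimal sketch of building and checking those three parts by hand with HMAC-SHA256 (for real applications, prefer a maintained library such as PyJWT):

import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

secret = b"only-the-sender-knows-this"
header = {"alg": "HS256", "typ": "JWT"}
payload = {"name": "Alice", "role": "reader"}

signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
signature = b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())
token = signing_input + "." + signature

# The receiver recomputes the signature; any tampering with header or payload breaks the match.
head_and_body, received_sig = token.rsplit(".", 1)
expected = b64url(hmac.new(secret, head_and_body.encode(), hashlib.sha256).digest())
print(hmac.compare_digest(received_sig, expected))  # True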

How does Google Authenticator (or other types of 2-factor authenticators) work?

Google Authenticator is commonly used for logging into our accounts when 2-factor authentication is enabled. How does it guarantee security?

Google Authenticator is a software-based authenticator that implements a two-step verification service. The diagram below provides detail.

There are two stages involved:

  • Stage 1 - The user enables Google two-step verification.
  • Stage 2 - The user uses the authenticator for logging in, etc.

Let’s look at these stages.

Stage 1

Steps 1 and 2: Bob opens the web page to enable two-step verification. The front end requests a secret key. The authentication service generates the secret key for Bob and stores it in the database.

Step 3: The authentication service returns a URI to the front end. The URI is composed of a key issuer, username, and secret key. The URI is displayed in the form of a QR code on the web page.

Step 4: Bob then uses Google Authenticator to scan the generated QR code. The secret key is stored in the authenticator.

Stage 2

Steps 1 and 2: Bob wants to log into a website with Google two-step verification. For this, he needs the password. Every 30 seconds, Google Authenticator generates a 6-digit password using the TOTP (Time-Based One-Time Password) algorithm. Bob uses the password to enter the website.

Steps 3 and 4: The frontend sends the password Bob enters to the backend for authentication. The authentication service reads the secret key from the database and generates a 6-digit password using the same TOTP algorithm as the client.

Step 5: The authentication service compares the two passwords generated by the client and the server, and returns the comparison result to the frontend. Bob can proceed with the login process only if the two passwords match.
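
The 6-digit password both sides generate comes from the TOTP algorithm (RFC 6238). Below is a minimal sketch in Python; the shared secret is a made-up example, and real deployments would use a vetted library and a base32-encoded secret as Google Authenticator does.

import hashlib, hmac, struct, time

def totp(secret: bytes, interval: int = 30, digits: int = 6) -> str:
    counter = int(time.time()) // interval                  # 30-second time step
    msg = struct.pack(">Q", counter)                        # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()   # HMAC-SHA1 per RFC 6238
    offset = digest[-1] & 0x0F                              # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

shared_secret = b"example-shared-secret"   # stored by both the authenticator and the server
print(totp(shared_secret))                 # same 6 digits on client and server within a 30s window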

Is this authentication mechanism safe?

  • Can the secret key be obtained by others?

    We need to make sure the secret key is transmitted using HTTPS. The authenticator client and the database store the secret key, and we need to make sure the secret keys are encrypted.

  • Can the 6-digit password be guessed by hackers?

    No. The password has 6 digits, so there are 1 million potential combinations. Plus, the password changes every 30 seconds. To guess the password within 30 seconds, hackers would need to try more than 33,000 combinations per second.

Real World Case Studies

Netflix's Tech Stack

This post is based on research from many Netflix engineering blogs and open-source projects. If you come across any inaccuracies, please feel free to inform us.

Mobile and web: Netflix has adopted Swift and Kotlin to build native mobile apps. For its web application, it uses React.

Frontend/server communication: Netflix uses GraphQL.

Backend services: Netflix relies on Zuul, Eureka, the Spring Boot framework, and other technologies.

Databases: Netflix utilizes EVCache, Cassandra, CockroachDB, and other databases.

Messaging/streaming: Netflix employs Apache Kafka and Flink for messaging and streaming purposes.

Video storage: Netflix uses S3 and Open Connect for video storage.

Data processing: Netflix utilizes Flink and Spark for data processing, which is then visualized using Tableau. Redshift is used for processing structured data warehouse information.

CI/CD: Netflix employs various tools such as JIRA, Confluence, PagerDuty, Jenkins, Gradle, Chaos Monkey, Spinnaker, Atlas, and more for CI/CD processes.

Twitter Architecture 2022

Yes, this is the real Twitter architecture. It was posted by Elon Musk, and we redrew it for better readability.

Evolution of Airbnb’s microservice architecture over the past 15 years

Airbnb’s microservice architecture went through 3 main stages.

Monolith (2008 - 2017)

Airbnb began as a simple marketplace for hosts and guests. It was built as a Ruby on Rails application - the monolith.

What’s the challenge?

  • Confusing team ownership + unowned code
  • Slow deployment

Microservices (2017 - 2020)

Microservices aim to solve those challenges. In the microservice architecture, key services include:

  • Data fetching service
  • Business logic data service
  • Write workflow service
  • UI aggregation service

Each service had one owning team.

What’s the challenge?

Hundreds of services and dependencies were difficult for humans to manage.

Micro + macroservices (2020 - present)

This is what Airbnb is working on now. The micro and macroservice hybrid model focuses on the unification of APIs.

Monorepo vs. Microrepo.

Which is the best? Why do different companies choose different options?

Monorepo isn't new; Linux and Windows were both created using a Monorepo. To improve scalability and build speed, Google developed a dedicated internal toolchain to scale builds faster, along with strict coding quality standards to keep the codebase consistent.

Amazon and Netflix are major ambassadors of the Microservice philosophy. This approach naturally separates the service code into separate repositories. It scales faster but can lead to governance pain points later on.

Within Monorepo, each service is a folder, and every folder has a BUILD config and OWNERS permission control. Every service member is responsible for their own folder.

On the other hand, in Microrepo, each service is responsible for its repository, with the build config and permissions typically set for the entire repository.

In Monorepo, dependencies are shared across the entire codebase regardless of your business, so when there's a version upgrade, every codebase upgrades their version.

In Microrepo, dependencies are controlled within each repository. Businesses choose when to upgrade their versions based on their own schedules.

Monorepo has a standard for check-ins. Google's code review process is famously known for setting a high bar, ensuring a coherent quality standard for Monorepo, regardless of the business.

Microrepo can either set its own standards or adopt shared standards by incorporating best practices. It can scale faster for the business, but the code quality might vary. For Monorepo tooling, Google engineers built Bazel and Meta built Buck; there are other open-source tools available, including Nix, Lerna, and others.

Over the years, Microrepo has gained more tooling support, including Maven and Gradle for Java, NPM for NodeJS, and CMake for C/C++, among others.

How will you design the Stack Overflow website?

If your answer is on-premise servers and monolith (on the bottom of the following image), you would likely fail the interview, but that's how it is built in reality!

What people think it should look like

The interviewer is probably expecting something like the top portion of the picture.

  • Microservice is used to decompose the system into small components.
  • Each service has its own database. Use cache heavily.
  • The service is sharded.
  • The services talk to each other asynchronously through message queues.
  • The service is implemented using Event Sourcing with CQRS.
  • Showing off knowledge in distributed systems such as eventual consistency, CAP theorem, etc.

What it actually is

Stack Overflow serves all its traffic with only 9 on-premise web servers, and it's a monolith! It has its own servers and does not run on the cloud.

This runs contrary to popular belief these days.

Why did Amazon Prime Video monitoring move from serverless to monolithic? How can it save 90% cost?

The diagram below shows the architecture comparison before and after the migration.

What is Amazon Prime Video Monitoring Service?

The Prime Video service needs to monitor the quality of thousands of live streams. The monitoring tool automatically analyzes the streams in real time and identifies quality issues like block corruption, video freeze, and sync problems. This is an important process for customer satisfaction.

There are 3 components: the media converter, the defect detector, and real-time notification.

  • What is the problem with the old architecture?

    The old architecture was based on AWS Lambda, which was good for building services quickly. However, it was not cost-effective when running the architecture at a high scale. The two most expensive operations are:

  1. The orchestration workflow - AWS Step Functions charges users by state transitions, and the orchestration performs multiple state transitions every second.

  2. Data passing between distributed components - the intermediate data is stored in Amazon S3 so that the next stage can download it. The downloads can be costly when the volume is high.

  • Monolithic architecture saves 90% cost

    A monolithic architecture is designed to address the cost issues. There are still 3 components, but the media converter and defect detector are deployed in the same process, saving the cost of passing data over the network. Surprisingly, this change in deployment architecture led to 90% cost savings!

This is an interesting and unique case study because microservices have become a go-to and fashionable choice in the tech industry. It's good to see more honest discussion about evolving architectures and their pros and cons. Decomposing components into distributed microservices comes with a cost.

  • What did Amazon leaders say about this?

    Amazon CTO Werner Vogels: “Building evolvable software systems is a strategy, not a religion. And revisiting your architecture with an open mind is a must.”

Former Amazon VP of Sustainability Adrian Cockcroft: “The Prime Video team had followed a path I call Serverless First…I don’t advocate Serverless Only”.

How does Disney Hotstar capture 5 Billion Emojis during a tournament?

  1. Clients send emojis through standard HTTP requests. You can think of the Golang service as a typical web server. Golang is chosen because it supports concurrency well: goroutines, Go's lightweight threads, are cheap to spawn.

  2. Since the write volume is very high, Kafka (message queue) is used as a buffer.

  3. Emoji data is aggregated by a stream processing service built on Spark. It aggregates data every 2 seconds, which is configurable. There is a trade-off to be made based on the interval: a shorter interval means emojis are delivered to other clients faster, but it also means more computing resources are needed. (A toy sketch of this aggregation step follows below.)

  4. Aggregated data is written to another Kafka topic.

  5. The PubSub consumers pull aggregated emoji data from Kafka.

  6. Emojis are delivered to other clients in real time through the PubSub infrastructure. The PubSub infrastructure is interesting: Hotstar considered Socket.IO, NATS, MQTT, and gRPC, and settled on MQTT.

A similar design is adopted by LinkedIn, which streams a million likes per second.
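
A toy version of the 2-second aggregation step might look like the sketch below. It is pure Python, with in-memory queues standing in for the Kafka topics and a simple counter standing in for the Spark job, purely to illustrate the interval trade-off; the real pipeline uses Kafka and a Spark streaming job.

import queue, threading, time
from collections import Counter

incoming = queue.Queue()    # stand-in for the raw emoji Kafka topic
aggregated = queue.Queue()  # stand-in for the aggregated Kafka topic

def aggregate(interval_seconds: float = 2.0) -> None:
    while True:
        window = Counter()
        deadline = time.time() + interval_seconds
        while True:
            remaining = deadline - time.time()
            if remaining <= 0:
                break
            try:
                window[incoming.get(timeout=remaining)] += 1
            except queue.Empty:
                break
        if window:
            aggregated.put(dict(window))   # one aggregate per window, fanned out via PubSub

threading.Thread(target=aggregate, daemon=True).start()
for emoji in ["🔥", "🔥", "👍"]:
    incoming.put(emoji)
time.sleep(2.5)
print(aggregated.get_nowait())   # e.g. {'🔥': 2, '👍': 1}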

How Discord Stores Trillions Of Messages

The diagram below shows the evolution of message storage at Discord:

MongoDB ➡️ Cassandra ➡️ ScyllaDB

In 2015, the first version of Discord was built on top of a single MongoDB replica set. Around Nov 2015, MongoDB stored 100 million messages, and the RAM could no longer hold the data and indexes. Latency became unpredictable, and message storage needed to be moved to another database. Cassandra was chosen.

In 2017, Discord had 12 Cassandra nodes and stored billions of messages.

At the beginning of 2022, it had 177 nodes with trillions of messages. At this point, latency was unpredictable, and maintenance operations became too expensive to run.

There are several reasons for the issue:

  • Cassandra uses the LSM tree for the internal data structure. The reads are more expensive than the writes. There can be many concurrent reads on a server with hundreds of users, resulting in hotspots.
  • Maintaining clusters, such as compacting SSTables, impacts performance.
  • Garbage collection pauses would cause significant latency spikes.

ScyllaDB is a Cassandra-compatible database written in C++. Discord redesigned its architecture to have a monolithic API, a data service written in Rust, and ScyllaDB-based storage.

The p99 read latency in ScyllaDB is 15ms compared to 40-125ms in Cassandra. The p99 write latency is 5ms compared to 5-70ms in Cassandra.

How do video live streamings work on YouTube, TikTok live, or Twitch?

Live streaming differs from regular streaming because the video content is sent via the internet in real-time, usually with a latency of just a few seconds.

The diagram below explains what happens behind the scenes to make this possible.

Step 1: The raw video data is captured by a microphone and camera. The data is sent to the server side.

Step 2: The video data is compressed and encoded. For example, the compression algorithm separates the background from other video elements. After compression, the video is encoded to a standard such as H.264. The size of the video data is much smaller after this step.

Step 3: The encoded data is divided into smaller segments, usually seconds in length, so it takes much less time to download or stream.

Step 4: The segmented data is sent to the streaming server. The streaming server needs to support different devices and network conditions. This is called ‘Adaptive Bitrate Streaming,’ and it means we need to produce multiple files at different bitrates in steps 2 and 3.

Step 5: The live streaming data is pushed to edge servers supported by CDN (Content Delivery Network.) Millions of viewers can watch the video from an edge server nearby. CDN significantly lowers data transmission latency.

Step 6: The viewers’ devices decode and decompress the video data and play the video in a video player.

Steps 7 and 8: If the video needs to be stored for replay, the encoded data is sent to a storage server, and viewers can request a replay from it later.

Standard protocols for live streaming include:

  • RTMP (Real-Time Messaging Protocol): This was originally developed by Macromedia to transmit data between a Flash player and a server. Now it is used for streaming video data over the internet. Note that video conferencing applications like Skype use RTC (Real-Time Communication) protocol for lower latency.
  • HLS (HTTP Live Streaming): It requires the H.264 or H.265 encoding. Apple devices accept only HLS format.
  • DASH (Dynamic Adaptive Streaming over HTTP): DASH is not natively supported by Apple devices.
  • Both HLS and DASH support adaptive bitrate streaming.

License

This work is licensed under CC BY-NC-ND 4.0

openai/whisper
1 week, 6 days ago

Robust Speech Recognition via Large-Scale Weak Supervision


Whisper

[Blog] [Paper] [Model card] [Colab example]

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Approach

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.

Setup

We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably OpenAI's tiktoken for their fast tokenizer implementation. You can download and install (or update to) the latest release of Whisper with the following command:

pip install -U openai-whisper

Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:

pip install git+https://github.com/openai/whisper.git 

To update the package to the latest version of this repository, please run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

You may need Rust installed as well, in case tiktoken does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting Started page to install the Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:

pip install setuptools-rust

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.

Size Parameters English-only model Multilingual model Required VRAM Relative speed
tiny 39 M tiny.en tiny ~1 GB ~32x
base 74 M base.en base ~1 GB ~16x
small 244 M small.en small ~2 GB ~6x
medium 769 M medium.en medium ~5 GB ~2x
large 1550 M N/A large ~10 GB 1x

The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.

Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of large-v3 and large-v2 models by language, using WERs (word error rates) or CER (character error rates, shown in Italic) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.

Command-line usage

The following command will transcribe speech in audio files, using the medium model:

whisper audio.flac audio.mp3 audio.wav --model medium

The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

whisper japanese.wav --language Japanese

Adding --task translate will translate the speech into English:

whisper japanese.wav --language Japanese --task translate

Run the following to view all available options:

whisper --help

See tokenizer.py for the list of all available languages.

Python usage

Transcription can also be performed within Python:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.

Below is an example usage of whisper.detect_language() and whisper.decode() which provide lower-level access to the model.

import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)

More examples

Please use the 🙌 Show and tell category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.

License

Whisper's code and model weights are released under the MIT License. See LICENSE for further details.

donnemartin/system-design-primer
1 week, 6 days ago

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.


English | 日本語 | 简体中文 | 繁體中文 | العَرَبِيَّة | বাংলা | Português do Brasil | Deutsch | ελληνικά | עברית | Italiano | 한국어 | فارسی | Polski | русский язык | Español | ภาษาไทย | Türkçe | tiếng Việt | Français | Add Translation

Help translate this guide!

The System Design Primer


Motivation

Learn how to design large-scale systems.

Prep for the system design interview.

Learn how to design large-scale systems

Learning how to design scalable systems will help you become a better engineer.

System design is a broad topic. There is a vast amount of resources scattered throughout the web on system design principles.

This repo is an organized collection of resources to help you learn how to build systems at scale.

Learn from the open source community

This is a continually updated, open source project.

Contributions are welcome!

Prep for the system design interview

In addition to coding interviews, system design is a required component of the technical interview process at many tech companies.

Practice common system design interview questions and compare your results with sample solutions: discussions, code, and diagrams.

Additional topics for interview prep:

Anki flashcards


The provided Anki flashcard decks use spaced repetition to help you retain key system design concepts.

Great for use while on-the-go.

Coding Resource: Interactive Coding Challenges

Looking for resources to help you prep for the Coding Interview?


Check out the sister repo Interactive Coding Challenges, which contains an additional Anki deck:

Contributing

Learn from the community.

Feel free to submit pull requests to help:

  • Fix errors
  • Improve sections
  • Add new sections
  • Translate

Content that needs some polishing is placed under development.

Review the Contributing Guidelines.

Index of system design topics

Summaries of various system design topics, including pros and cons. Everything is a trade-off.

Each section contains links to more in-depth resources.


Study guide

Suggested topics to review based on your interview timeline (short, medium, long).

Q: For interviews, do I need to know everything here?

A: No, you don't need to know everything here to prepare for the interview.

What you are asked in an interview depends on variables such as:

  • How much experience you have
  • What your technical background is
  • What positions you are interviewing for
  • Which companies you are interviewing with
  • Luck

More experienced candidates are generally expected to know more about system design. Architects or team leads might be expected to know more than individual contributors. Top tech companies are likely to have one or more design interview rounds.

Start broad and go deeper in a few areas. It helps to know a little about various key system design topics. Adjust the following guide based on your timeline, experience, what positions you are interviewing for, and which companies you are interviewing with.

  • Short timeline - Aim for breadth with system design topics. Practice by solving some interview questions.
  • Medium timeline - Aim for breadth and some depth with system design topics. Practice by solving many interview questions.
  • Long timeline - Aim for breadth and more depth with system design topics. Practice by solving most interview questions.

Short Medium Long
Read through the System design topics to get a broad understanding of how systems work 👍 👍 👍
Read through a few articles in the Company engineering blogs for the companies you are interviewing with 👍 👍 👍
Read through a few Real world architectures 👍 👍 👍
Review How to approach a system design interview question 👍 👍 👍
Work through System design interview questions with solutions Some Many Most
Work through Object-oriented design interview questions with solutions Some Many Most
Review Additional system design interview questions Some Many Most

How to approach a system design interview question

How to tackle a system design interview question.

The system design interview is an open-ended conversation. You are expected to lead it.

You can use the following steps to guide the discussion. To help solidify this process, work through the System design interview questions with solutions section using the following steps.

Step 1: Outline use cases, constraints, and assumptions

Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.

  • Who is going to use it?
  • How are they going to use it?
  • How many users are there?
  • What does the system do?
  • What are the inputs and outputs of the system?
  • How much data do we expect to handle?
  • How many requests per second do we expect?
  • What is the expected read to write ratio?

Step 2: Create a high level design

Outline a high level design with all important components.

  • Sketch the main components and connections
  • Justify your ideas

Step 3: Design core components

Dive into details for each core component. For example, if you were asked to design a url shortening service, discuss:

  • Generating and storing a hash of the full url
    • MD5 and Base62 (see the sketch after this list)
    • Hash collisions
    • SQL or NoSQL
    • Database schema
  • Translating a hashed url to the full url
    • Database lookup
  • API and object-oriented design
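
For the hashing sub-step, a common discussion is taking an MD5 of the full url and encoding a prefix of it in Base62. Below is a minimal sketch; the 7-character length and the alphabet ordering are illustrative choices, and hash collisions would still need to be handled separately.

import hashlib
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

def base62_encode(num: int) -> str:
    if num == 0:
        return BASE62[0]
    out = []
    while num:
        num, rem = divmod(num, 62)
        out.append(BASE62[rem])
    return "".join(reversed(out))

def shorten(url: str, length: int = 7) -> str:
    digest = hashlib.md5(url.encode()).hexdigest()    # 128-bit hash of the full url
    return base62_encode(int(digest, 16))[:length]    # keep a short Base62 prefix as the key

print(shorten("https://example.com/some/very/long/path"))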

Step 4: Scale the design

Identify and address bottlenecks, given the constraints. For example, do you need the following to address scalability issues?

  • Load balancer
  • Horizontal scaling
  • Caching
  • Database sharding

Discuss potential solutions and trade-offs. Everything is a trade-off. Address bottlenecks using principles of scalable system design.

Back-of-the-envelope calculations

You might be asked to do some estimates by hand. Refer to the Appendix for the following resources:

Source(s) and further reading

Check out the following links to get a better idea of what to expect:

System design interview questions with solutions

Common system design interview questions with sample discussions, code, and diagrams.

Solutions linked to content in the solutions/ folder.

Question
Design Pastebin.com (or Bit.ly) Solution
Design the Twitter timeline and search (or Facebook feed and search) Solution
Design a web crawler Solution
Design Mint.com Solution
Design the data structures for a social network Solution
Design a key-value store for a search engine Solution
Design Amazon's sales ranking by category feature Solution
Design a system that scales to millions of users on AWS Solution
Add a system design question Contribute

Design Pastebin.com (or Bit.ly)

View exercise and solution

Design the Twitter timeline and search (or Facebook feed and search)

View exercise and solution

Design a web crawler

View exercise and solution

Design Mint.com

View exercise and solution

Design the data structures for a social network

View exercise and solution

Design a key-value store for a search engine

View exercise and solution

Design Amazon's sales ranking by category feature

View exercise and solution

Design a system that scales to millions of users on AWS

View exercise and solution

Object-oriented design interview questions with solutions

Common object-oriented design interview questions with sample discussions, code, and diagrams.

Solutions linked to content in the solutions/ folder.

Note: This section is under development

Question
Design a hash map Solution
Design a least recently used cache Solution
Design a call center Solution
Design a deck of cards Solution
Design a parking lot Solution
Design a chat server Solution
Design a circular array Contribute
Add an object-oriented design question Contribute

System design topics: start here

New to system design?

First, you'll need a basic understanding of common principles, learning about what they are, how they are used, and their pros and cons.

Step 1: Review the scalability video lecture

Scalability Lecture at Harvard

  • Topics covered:
    • Vertical scaling
    • Horizontal scaling
    • Caching
    • Load balancing
    • Database replication
    • Database partitioning

Step 2: Review the scalability article

Scalability

Next steps

Next, we'll look at high-level trade-offs:

  • Performance vs scalability
  • Latency vs throughput
  • Availability vs consistency

Keep in mind that everything is a trade-off.

Then we'll dive into more specific topics such as DNS, CDNs, and load balancers.

Performance vs scalability

A service is scalable if it results in increased performance in a manner proportional to resources added. Generally, increasing performance means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.1

Another way to look at performance vs scalability:

  • If you have a performance problem, your system is slow for a single user.
  • If you have a scalability problem, your system is fast for a single user but slow under heavy load.

Source(s) and further reading

Latency vs throughput

Latency is the time to perform some action or to produce some result.

Throughput is the number of such actions or results per unit of time.

Generally, you should aim for maximal throughput with acceptable latency.

Source(s) and further reading

Availability vs consistency

CAP theorem


Source: CAP theorem revisited

In a distributed computer system, you can only support two of the following guarantees:

  • Consistency - Every read receives the most recent write or an error
  • Availability - Every request receives a response, without guarantee that it contains the most recent version of the information
  • Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures

Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software tradeoff between consistency and availability.

CP - consistency and partition tolerance

Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.

AP - availability and partition tolerance

Responses return the most readily available version of the data available on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.

AP is a good choice if the business needs to allow for eventual consistency or when the system needs to continue working despite external errors.

Source(s) and further reading

Consistency patterns

With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data. Recall the definition of consistency from the CAP theorem - Every read receives the most recent write or an error.

Weak consistency

After a write, reads may or may not see it. A best effort approach is taken.

This approach is seen in systems such as memcached. Weak consistency works well in real time use cases such as VoIP, video chat, and realtime multiplayer games. For example, if you are on a phone call and lose reception for a few seconds, when you regain connection you do not hear what was spoken during connection loss.

Eventual consistency

After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.

This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.

Strong consistency

After a write, reads will see it. Data is replicated synchronously.

This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.

Source(s) and further reading

Availability patterns

There are two complementary patterns to support high availability: fail-over and replication.

Fail-over

Active-passive

With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service.

The length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic.

Active-passive failover can also be referred to as master-slave failover.

Active-active

In active-active, both servers are managing traffic, spreading the load between them.

If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.

Active-active failover can also be referred to as master-master failover.

Disadvantage(s): failover

  • Fail-over adds more hardware and additional complexity.
  • There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.

Replication

Master-slave and master-master

This topic is further discussed in the Database section:

Availability in numbers

Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in number of 9s--a service with 99.99% availability is described as having four 9s.

99.9% availability - three 9s

Duration Acceptable downtime
Downtime per year 8h 45min 57s
Downtime per month 43m 49.7s
Downtime per week 10m 4.8s
Downtime per day 1m 26.4s

99.99% availability - four 9s

Duration Acceptable downtime
Downtime per year 52min 35.7s
Downtime per month 4m 23s
Downtime per week 1m 5s
Downtime per day 8.6s

Availability in parallel vs in sequence

If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.

In sequence

Overall availability decreases when two components with availability < 100% are in sequence:

Availability (Total) = Availability (Foo) * Availability (Bar)

If both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.

In parallel

Overall availability increases when two components with availability < 100% are in parallel:

Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))

If both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.
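
The two formulas can be checked with a few lines of Python; Foo and Bar are the hypothetical components from the text above.

foo = bar = 0.999   # 99.9% availability each

in_sequence = foo * bar
in_parallel = 1 - (1 - foo) * (1 - bar)

print(f"In sequence: {in_sequence:.4%}")   # 99.8001% -> roughly 99.8%
print(f"In parallel: {in_parallel:.4%}")   # 99.9999%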

Domain name system


Source: DNS security presentation

A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address.

DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the time to live (TTL).

  • NS record (name server) - Specifies the DNS servers for your domain/subdomain.
  • MX record (mail exchange) - Specifies the mail servers for accepting messages.
  • A record (address) - Points a name to an IP address.
  • CNAME (canonical) - Points a name to another name or CNAME (example.com to www.example.com) or to an A record.

Services such as CloudFlare and Route 53 provide managed DNS services. Some DNS services can route traffic through various methods:

Disadvantage(s): DNS

  • Accessing a DNS server introduces a slight delay, although mitigated by caching described above.
  • DNS server management could be complex and is generally managed by governments, ISPs, and large companies.
  • DNS services have recently come under DDoS attack, preventing users from accessing websites such as Twitter without knowing Twitter's IP address(es).

Source(s) and further reading

Content delivery network


Source: Why use a CDN

A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from CDN, although some CDNs such as Amazon's CloudFront support dynamic content. The site's DNS resolution will tell clients which server to contact.

Serving content from CDNs can significantly improve performance in two ways:

  • Users receive content from data centers close to them
  • Your servers do not have to serve requests that the CDN fulfills

Push CDNs

Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.

Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.

Pull CDNs

Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.

A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.

Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.

Disadvantage(s): CDN

  • CDN costs could be significant depending on traffic, although this should be weighed with additional costs you would incur not using a CDN.
  • Content might be stale if it is updated before the TTL expires it.
  • CDNs require changing URLs for static content to point to the CDN.

Source(s) and further reading

Load balancer


Source: Scalable system design patterns

Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:

  • Preventing requests from going to unhealthy servers
  • Preventing overloading resources
  • Helping to eliminate a single point of failure

Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.

Additional benefits include:

  • SSL termination - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations
  • Session persistence - Issue cookies and route a specific client's requests to same instance if the web apps do not keep track of sessions

To protect against failures, it's common to set up multiple load balancers, either in active-passive or active-active mode.

Load balancers can route traffic based on various metrics, including:

Layer 4 load balancing

Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source and destination IP addresses and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing Network Address Translation (NAT).

Layer 7 load balancing

Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve the contents of the header, message, and cookies. A layer 7 load balancer terminates network traffic, reads the message, makes a load-balancing decision, then opens a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.

At the cost of flexibility, layer 4 load balancing requires less time and computing resources than Layer 7, although the performance impact can be minimal on modern commodity hardware.

Horizontal scaling

Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost efficient and results in higher availability than scaling up a single server on more expensive hardware, called Vertical Scaling. It is also easier to hire for talent working on commodity hardware than it is for specialized enterprise systems.

Disadvantage(s): horizontal scaling

  • Scaling horizontally introduces complexity and involves cloning servers
    • Servers should be stateless: they should not contain any user-related data like sessions or profile pictures
    • Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a persistent cache (Redis, Memcached)
  • Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out

Disadvantage(s): load balancer

  • The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
  • Introducing a load balancer to help eliminate a single point of failure results in increased complexity.
  • A single load balancer is a single point of failure; configuring multiple load balancers further increases complexity.

Source(s) and further reading

Reverse proxy (web server)


Source: Wikipedia

A reverse proxy is a web server that centralizes internal services and provides unified interfaces to the public. Requests from clients are forwarded to a server that can fulfill it before the reverse proxy returns the server's response to the client.

Additional benefits include:

  • Increased security - Hide information about backend servers, blacklist IPs, limit number of connections per client
  • Increased scalability and flexibility - Clients only see the reverse proxy's IP, allowing you to scale servers or change their configuration
  • SSL termination - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations
  • Compression - Compress server responses
  • Caching - Return the response for cached requests
  • Static content - Serve static content directly
    • HTML/CSS/JS
    • Photos
    • Videos
    • Etc

Load balancer vs reverse proxy

  • Deploying a load balancer is useful when you have multiple servers. Often, load balancers route traffic to a set of servers serving the same function.
  • Reverse proxies can be useful even with just one web server or application server, opening up the benefits described in the previous section.
  • Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing.

Disadvantage(s): reverse proxy

  • Introducing a reverse proxy results in increased complexity.
  • A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e. a failover) further increases complexity.

Source(s) and further reading

Application layer


Source: Intro to architecting systems for scale

Separating out the web layer from the application layer (also known as platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The single responsibility principle advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.

Workers in the application layer also help enable asynchronism.

Microservices

Related to this discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. 1

Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.

Service Discovery

Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built in key-value store that can be useful for storing config values and other shared data.

Disadvantage(s): application layer

  • Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system).
  • Microservices can add complexity in terms of deployments and operations.

Source(s) and further reading

Database


Source: Scaling up to your first 10 million users

Relational database management system (RDBMS)

A relational database like SQL is a collection of data items organized in tables.

ACID is a set of properties of relational database transactions.

  • Atomicity - Each transaction is all or nothing
  • Consistency - Any transaction will bring the database from one valid state to another
  • Isolation - Executing transactions concurrently has the same results as if the transactions were executed serially
  • Durability - Once a transaction has been committed, it will remain so

There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.

Master-slave replication

The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.


Source: Scalability, availability, stability, patterns

Disadvantage(s): master-slave replication
  • Additional logic is needed to promote a slave to a master.
  • See Disadvantage(s): replication for points related to both master-slave and master-master.

Master-master replication

Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.


Source: Scalability, availability, stability, patterns

Disadvantage(s): master-master replication
  • You'll need a load balancer or you'll need to make changes to your application logic to determine where to write.
  • Most master-master systems are either loosely consistent (violating ACID) or have increased write latency due to synchronization.
  • Conflict resolution comes more into play as more write nodes are added and as latency increases.
  • See Disadvantage(s): replication for points related to both master-slave and master-master.
Disadvantage(s): replication
  • There is a potential for loss of data if the master fails before any newly written data can be replicated to other nodes.
  • Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down with replaying writes and can't do as many reads.
  • The more read slaves, the more you have to replicate, which leads to greater replication lag.
  • On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.
  • Replication adds more hardware and additional complexity.
Source(s) and further reading: replication

Federation


Source: Scaling up to your first 10 million users

Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.

Disadvantage(s): federation
  • Federation is not effective if your schema requires huge functions or tables.
  • You'll need to update your application logic to determine which database to read and write.
  • Joining data from two databases is more complex with a server link.
  • Federation adds more hardware and additional complexity.
Source(s) and further reading: federation

Sharding


Source: Scalability, availability, stability, patterns

Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.

Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.

Common ways to shard a table of users are either through the user's last name initial or the user's geographic location.
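
As a rough illustration, hash-based routing (an alternative to the last-name or geography schemes just mentioned) maps each key to one of a fixed number of shards. The shard count and user IDs below are made up; consistent hashing, noted in the disadvantages below, reduces data movement when the shard count changes.

import hashlib

NUM_SHARDS = 4   # hypothetical cluster size

def shard_for(user_id: str) -> int:
    # Hash the key so users spread evenly across shards
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for user in ["alice", "bob", "carol"]:
    print(user, "->", "users_shard_%d" % shard_for(user))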

Disadvantage(s): sharding
  • You'll need to update your application logic to work with shards, which could result in complex SQL queries.
  • Data distribution can become lopsided in a shard. For example, a set of power users on a shard could result in increased load to that shard compared to others.
    • Rebalancing adds additional complexity. A sharding function based on consistent hashing can reduce the amount of transferred data.
  • Joining data from multiple shards is more complex.
  • Sharding adds more hardware and additional complexity.
Source(s) and further reading: sharding

Denormalization

Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMS such as PostgreSQL and Oracle support materialized views which handle the work of storing redundant information and keeping redundant copies consistent.

Once data becomes distributed with techniques such as federation and sharding, managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.

In most systems, reads can heavily outnumber writes 100:1 or even 1000:1. A read resulting in a complex database join can be very expensive, spending a significant amount of time on disk operations.

Disadvantage(s): denormalization
  • Data is duplicated.
  • Constraints can help redundant copies of information stay in sync, which increases complexity of the database design.
  • A denormalized database under heavy write load might perform worse than its normalized counterpart.
Source(s) and further reading: denormalization

SQL tuning

SQL tuning is a broad topic and many books have been written as reference.

It's important to benchmark and profile to simulate and uncover bottlenecks.

  • Benchmark - Simulate high-load situations with tools such as ab.
  • Profile - Enable tools such as the slow query log to help track performance issues.

Benchmarking and profiling might point you to the following optimizations.

Tighten up the schema
  • MySQL dumps to disk in contiguous blocks for fast access.
  • Use CHAR instead of VARCHAR for fixed-length fields.
    • CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving onto the next one.
  • Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
  • Use INT for larger numbers up to 2^32 or 4 billion.
  • Use DECIMAL for currency to avoid floating point representation errors.
  • Avoid storing large BLOBS, store the location of where to get the object instead.
  • VARCHAR(255) is the largest number of characters that can be counted in an 8 bit number, often maximizing the use of a byte in some RDBMS.
  • Set the NOT NULL constraint where applicable to improve search performance.
Use good indices
  • Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.
  • Indices are usually represented as a self-balancing B-tree that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
  • Placing an index can keep the data in memory, requiring more space.
  • Writes could also be slower since the index also needs to be updated.
  • When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.
Avoid expensive joins
Partition tables
  • Break up a table by putting hot spots in a separate table to help keep it in memory.
Tune the query cache
Source(s) and further reading: SQL tuning

NoSQL

NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.

BASE is often used to describe the properties of NoSQL databases. In comparison with the CAP Theorem, BASE chooses availability over consistency.

  • Basically available - the system guarantees availability.
  • Soft state - the state of the system may change over time, even without input.
  • Eventual consistency - the system will become consistent over a period of time, given that the system doesn't receive input during that period.

In addition to choosing between SQL or NoSQL, it is helpful to understand which type of NoSQL database best fits your use case(s). We'll review key-value stores, document stores, wide column stores, and graph databases in the next section.

Key-value store

Abstraction: hash table

A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.

Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.

A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.

Source(s) and further reading: key-value store

Document store

Abstraction: key-value store with documents stored as values

A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note, many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.

Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.

Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-values and documents.

Document stores provide high flexibility and are often used for working with occasionally changing data.

Source(s) and further reading: document store

Wide column store


Source: SQL & NoSQL, a brief history

Abstraction: nested map ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>

A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.

Google introduced Bigtable as the first wide column store, which influenced the open-source HBase, often used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as BigTable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.

Wide column stores offer high availability and high scalability. They are often used for very large data sets.

Source(s) and further reading: wide column store

Graph database


Source: Graph database

Abstraction: graph

In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.

Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.

Source(s) and further reading: graph

Source(s) and further reading: NoSQL

SQL or NoSQL


Source: Transitioning from RDBMS to NoSQL

Reasons for SQL:

  • Structured data
  • Strict schema
  • Relational data
  • Need for complex joins
  • Transactions
  • Clear patterns for scaling
  • More established: developers, community, code, tools, etc
  • Lookups by index are very fast

Reasons for NoSQL:

  • Semi-structured data
  • Dynamic or flexible schema
  • Non-relational data
  • No need for complex joins
  • Store many TB (or PB) of data
  • Very data intensive workload
  • Very high throughput for IOPS

Sample data well-suited for NoSQL:

  • Rapid ingest of clickstream and log data
  • Leaderboard or scoring data
  • Temporary data, such as a shopping cart
  • Frequently accessed ('hot') tables
  • Metadata/lookup tables
Source(s) and further reading: SQL or NoSQL

Cache


Source: Scalable system design patterns

Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher first looks up whether the request has been made before and tries to return the previous result, saving the actual execution.

Databases often benefit from a uniform distribution of reads and writes across their partitions. Popular items can skew the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes in traffic.

Client caching

Caches can be located on the client side (OS or browser), server side, or in a distinct cache layer.

CDN caching

CDNs are considered a type of cache.

Web server caching

Reverse proxies and caches such as Varnish can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.

Database caching

Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.

Application caching

In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so cache invalidation algorithms such as least recently used (LRU) can help invalidate 'cold' entries and keep 'hot' data in RAM.
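
As an illustration of the LRU idea only (not how Memcached or Redis implement eviction internally), a small least recently used cache can be sketched with Python's OrderedDict:

from collections import OrderedDict

class LRUCache:
    """Evict the least recently used entry once capacity is reached."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)           # mark as most recently used ('hot')
        return self._entries[key]

    def set(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)    # invalidate the 'cold' entry

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")        # touching "a" makes "b" the eviction candidate
cache.set("c", 3)     # evicts "b"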

Redis has the following additional features:

  • Persistence option
  • Built-in data structures such as sorted sets and lists

There are multiple levels at which you can cache, falling into two general categories: database queries and objects:

  • Row level
  • Query-level
  • Fully-formed serializable objects
  • Fully-rendered HTML

Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.

Caching at the database query level

Whenever you query the database, hash the query as a key and store the result in the cache (see the sketch after this list). This approach suffers from expiration issues:

  • Hard to delete a cached result with complex queries
  • If one piece of data changes such as a table cell, you need to delete all cached queries that might include the changed cell
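
A minimal sketch of query-level caching, assuming hypothetical cache and db client objects in the same pseudocode style as the cache-aside example later in this section:

import hashlib
import json

def cached_query(sql, params):
    # Hash the query and its parameters to build the cache key.
    key = "query." + hashlib.sha256((sql + repr(params)).encode()).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    result = db.query(sql, params)        # cache miss: hit the database
    cache.set(key, json.dumps(result))    # a change to any underlying row silently
    return result                         # stales this cached result until it is deleted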

Caching at the object level

See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or a data structure (see the sketch after this list):

  • Remove the object from cache if its underlying data has changed
  • Allows for asynchronous processing: workers assemble objects by consuming the latest cached object
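
A sketch of object-level caching, again assuming hypothetical cache and db clients; the User class and the cache.delete call are invented for the example:

import json

class User:
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name

def get_user_object(user_id):
    key = "user_object.{0}".format(user_id)
    cached = cache.get(key)
    if cached is not None:
        return User(**json.loads(cached))                 # rebuild the object from cache
    row = db.query("SELECT * FROM users WHERE user_id = {0}", user_id)
    user = User(user_id=row["user_id"], name=row["name"])
    cache.set(key, json.dumps(user.__dict__))             # cache the assembled object
    return user

def update_user_name(user_id, name):
    db.query("UPDATE users SET name = {1} WHERE user_id = {0}", user_id, name)
    cache.delete("user_object.{0}".format(user_id))       # remove from cache when data changes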

Suggestions of what to cache:

  • User sessions
  • Fully rendered web pages
  • Activity streams
  • User graph data

When to update the cache

Since you can only store a limited amount of data in cache, you'll need to determine which cache update strategy works best for your use case.

Cache-aside


Source: From cache to in-memory data grid

The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:

  • Look for entry in cache, resulting in a cache miss
  • Load entry from the database
  • Add entry to cache
  • Return entry
def get_user(self, user_id):
    key = "user.{0}".format(user_id)
    user = cache.get(key)                     # 1. look for the entry in the cache
    if user is None:                          # 2. cache miss: load from the database
        user = db.query("SELECT * FROM users WHERE user_id = {0}", user_id)
        if user is not None:
            cache.set(key, json.dumps(user))  # 3. add the entry to the cache
    return user                               # 4. return the entry

Memcached is generally used in this manner.

Subsequent reads of data added to cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.

Disadvantage(s): cache-aside
  • Each cache miss results in three trips, which can cause a noticeable delay.
  • Data can become stale if it is updated in the database. This issue is mitigated by setting a time-to-live (TTL) which forces an update of the cache entry, or by using write-through.
  • When a node fails, it is replaced by a new, empty node, increasing latency.

Write-through


Source: Scalability, availability, stability, patterns

The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:

  • Application adds/updates entry in cache
  • Cache synchronously writes entry to data store
  • Return

Application code:

set_user(12345, {"foo":"bar"})

Cache code:

def set_user(user_id, values):
    # Cache code: synchronously write the entry to the data store,
    # then update the cache itself before returning.
    user = db.query("UPDATE Users WHERE id = {0}", user_id, values)
    cache.set(user_id, user)

Write-through is a slow overall operation due to the write operation, but subsequent reads of just written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.

Disadvantage(s): write through
  • When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write through can mitigate this issue.
  • Most data written might never be read, which can be minimized with a TTL.

Write-behind (write-back)


Source: Scalability, availability, stability, patterns

In write-behind, the application does the following (see the sketch after this list):

  • Add/update entry in cache
  • Asynchronously write entry to the data store, improving write performance
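
A minimal write-behind sketch, assuming the same hypothetical cache and db clients as the earlier examples plus a background worker; a real implementation would batch writes and handle failures:

import json
import queue
import threading

write_queue = queue.Queue()

def set_user(user_id, values):
    cache.set("user.{0}".format(user_id), json.dumps(values))   # 1. add/update entry in cache
    write_queue.put((user_id, values))                          # 2. enqueue the write and return

def write_worker():
    while True:
        user_id, values = write_queue.get()
        db.query("UPDATE Users WHERE id = {0}", user_id, values)  # asynchronous write to the store
        write_queue.task_done()   # note: anything still queued is lost if the process dies

threading.Thread(target=write_worker, daemon=True).start()
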
Disadvantage(s): write-behind
  • There could be data loss if the cache goes down prior to its contents hitting the data store.
  • It is more complex to implement write-behind than it is to implement cache-aside or write-through.

Refresh-ahead


Source: From cache to in-memory data grid

You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.

Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.
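
A rough refresh-ahead sketch: a read that happens close to expiration triggers a background refresh before the entry expires. The db client, TTL, and refresh window here are assumptions for the example:

import threading
import time

TTL = 60              # seconds an entry stays valid
REFRESH_WINDOW = 10   # refresh if a read happens this close to expiration

entries = {}          # key -> (value, expires_at)

def load(key):
    return db.query("SELECT * FROM items WHERE key = {0}", key)

def refresh(key):
    entries[key] = (load(key), time.time() + TTL)    # refresh ahead of expiration

def get(key):
    value, expires_at = entries.get(key, (None, 0))
    now = time.time()
    if value is None or now >= expires_at:
        value = load(key)                            # plain miss: load synchronously
        entries[key] = (value, now + TTL)
    elif now >= expires_at - REFRESH_WINDOW:
        threading.Thread(target=refresh, args=(key,), daemon=True).start()
    return value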

Disadvantage(s): refresh-ahead
  • Not accurately predicting which items are likely to be needed in the future can result in worse performance than without refresh-ahead.

Disadvantage(s): cache

  • Need to maintain consistency between caches and the source of truth such as the database through cache invalidation.
  • Cache invalidation is a difficult problem; there is additional complexity in deciding when to update the cache.
  • Need to make application changes such as adding Redis or memcached.

Source(s) and further reading

Asynchronism


Source: Intro to architecting systems for scale

Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.

Message queues

Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:

  • An application publishes a job to the queue, then notifies the user of job status
  • A worker picks up the job from the queue, processes it, then signals the job is complete

The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all of your followers.

Redis is useful as a simple message broker but messages can be lost.

RabbitMQ is popular but requires you to adapt to the 'AMQP' protocol and manage your own nodes.

Amazon SQS is hosted but can have high latency and has the possibility of messages being delivered twice.

Task queues

Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling and can be used to run computationally-intensive jobs in the background.

Celery has support for scheduling and is primarily used with Python.

Back pressure

If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. Back pressure can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with exponential backoff.
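
A toy sketch of back pressure with a bounded queue: the server rejects work with a 503 once the queue is full, and the client retries with exponential backoff. All names here are invented for the illustration:

import queue
import random
import time

job_queue = queue.Queue(maxsize=1000)      # bound the queue to apply back pressure

def submit_job(job):
    try:
        job_queue.put_nowait(job)
        return 202                         # accepted for background processing
    except queue.Full:
        return 503                         # server busy: ask the client to retry later

def submit_with_backoff(job, max_attempts=5):
    for attempt in range(max_attempts):
        if submit_job(job) != 503:
            return True
        time.sleep((2 ** attempt) + random.random())   # exponential backoff with jitter
    return False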

Disadvantage(s): asynchronism

  • Use cases such as inexpensive calculations and realtime workflows might be better suited for synchronous operations, as introducing queues can add delays and complexity.

Source(s) and further reading

Communication


Source: OSI 7 layer model

Hypertext transfer protocol (HTTP)

HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.

A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs:

Verb    Description                                                 Idempotent*  Safe  Cacheable
GET     Reads a resource                                            Yes          Yes   Yes
POST    Creates a resource or triggers a process that handles data  No           No    Yes if response contains freshness info
PUT     Creates or replaces a resource                              Yes          No    No
PATCH   Partially updates a resource                                No           No    Yes if response contains freshness info
DELETE  Deletes a resource                                          Yes          No    No

*Can be called many times without different outcomes.

HTTP is an application layer protocol relying on lower-level protocols such as TCP and UDP.
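
For example, a client can issue a basic request with nothing more than Python's standard library; the GET below is safe, idempotent, and cacheable per the table above:

import http.client

conn = http.client.HTTPSConnection("example.com")
conn.request("GET", "/index.html")          # verb (method) + resource (endpoint)
response = conn.getresponse()
print(response.status, response.reason)     # completion status info about the request
body = response.read()
conn.close()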

Source(s) and further reading: HTTP

Transmission control protocol (TCP)


Source: How to make a multiplayer game

TCP is a connection-oriented protocol over an IP network. Connection is established and terminated using a handshake. All packets sent are guaranteed to reach the destination in the original order and without corruption through:

  • Sequence numbers and checksum fields for each packet
  • Acknowledgement packets and automatic retransmission

If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements flow control and congestion control. These guarantees cause delays and generally result in less efficient transmission than UDP.

To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and say, a memcached server. Connection pooling can help in addition to switching to UDP where applicable.

TCP is useful for applications that require high reliability but are less time critical. Some examples include web servers, database info, SMTP, FTP, and SSH.

Use TCP over UDP when:

  • You need all of the data to arrive intact
  • You want to automatically make a best estimate use of the network throughput
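
A minimal TCP client sketch using Python's standard socket module; the connection setup corresponds to the handshake described above, and the bytes arrive in order:

import socket

# TCP: connection-oriented, ordered, reliable byte stream.
with socket.create_connection(("example.com", 80), timeout=5) as sock:
    sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    chunks = []
    while True:
        data = sock.recv(4096)       # data arrives in the order it was sent
        if not data:
            break
        chunks.append(data)
print(b"".join(chunks)[:200])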

User datagram protocol (UDP)


Source: How to make a multiplayer game

UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP provides, UDP is generally more efficient.

UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with DHCP because the client has not yet received an IP address, which TCP would need before it could establish a stream.

UDP is less reliable but works well in real time use cases such as VoIP, video chat, streaming, and realtime multiplayer games.

Use UDP over TCP when:

  • You need the lowest latency
  • Late data is worse than loss of data
  • You want to implement your own error correction
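
A minimal UDP sketch with Python's socket module: datagrams are fired off without a connection, and nothing guarantees delivery or ordering (the host and port are placeholders):

import socket

# UDP: connectionless, no delivery or ordering guarantees.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)
sock.sendto(b"ping", ("127.0.0.1", 9999))    # one datagram, no handshake
try:
    data, addr = sock.recvfrom(4096)         # may arrive out of order, or not at all
    print(data, addr)
except socket.timeout:
    print("no reply (datagram or response was lost)")
sock.close()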

Source(s) and further reading: TCP and UDP

Remote procedure call (RPC)


Source: Crack the system design interview

In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include Protobuf, Thrift, and Avro.

RPC is a request-response protocol:

  • Client program - Calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call.
  • Client stub procedure - Marshals (packs) procedure id and arguments into a request message.
  • Client communication module - OS sends the message from the client to the server.
  • Server communication module - OS passes the incoming packets to the server stub procedure.
  • Server stub procedure - Unmarshals the request, calls the server procedure matching the procedure id, and passes the given arguments.
  • The server response repeats the steps above in reverse order.

Sample RPC calls:

GET /someoperation?data=anId

POST /anotheroperation
{
  "data":"anId";
  "anotherdata": "another value"
}

RPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases.
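
To illustrate the stub pattern described above, independent of any particular framework such as Thrift or Avro, a client stub might marshal the procedure id and arguments into a message roughly like this; transport_send stands in for the communication modules:

import json

def call_remote(procedure, **arguments):
    # Client stub: marshal procedure id and arguments into a request message.
    request = json.dumps({"procedure": procedure, "arguments": arguments})
    response = transport_send(request)        # communication module (assumed helper)
    return json.loads(response)["result"]     # unmarshal the result

def transport_send(request):
    # Stand-in for the OS/network layer; a real server stub would dispatch
    # the named procedure on the server and marshal its result back.
    payload = json.loads(request)
    if payload["procedure"] == "add":
        result = sum(payload["arguments"].values())
    else:
        raise ValueError("unknown procedure")
    return json.dumps({"result": result})

print(call_remote("add", x=1, y=2))   # looks like a local call to the caller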

Choose a native library (aka SDK) when:

  • You know your target platform.
  • You want to control how your "logic" is accessed.
  • You want to control how error handling happens in your library.
  • Performance and end user experience are your primary concerns.

HTTP APIs following REST tend to be used more often for public APIs.

Disadvantage(s): RPC

  • RPC clients become tightly coupled to the service implementation.
  • A new API must be defined for every new operation or use case.
  • It can be difficult to debug RPC.
  • You might not be able to leverage existing technologies out of the box. For example, it might require additional effort to ensure RPC calls are properly cached on caching servers such as Squid.

Representational state transfer (REST)

REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.

There are four qualities of a RESTful interface:

  • Identify resources (URI in HTTP) - use the same URI regardless of any operation.
  • Change with representations (Verbs in HTTP) - use verbs, headers, and body.
  • Self-descriptive error message (status response in HTTP) - Use status codes, don't reinvent the wheel.
  • HATEOAS (HTML interface for HTTP) - your web service should be fully accessible in a browser.

Sample REST calls:

GET /someresources/anId

PUT /someresources/anId
{"anotherdata": "another value"}

REST is focused on exposing data. It minimizes the coupling between client/server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning.

Disadvantage(s): REST

  • With REST being focused on exposing data, it might not be a good fit if resources are not naturally organized or accessed in a simple hierarchy. For example, returning all updated records from the past hour matching a particular set of events is not easily expressed as a path. With REST, it is likely to be implemented with a combination of URI path, query parameters, and possibly the request body.
  • REST typically relies on a few verbs (GET, POST, PUT, DELETE, and PATCH) which sometimes doesn't fit your use case. For example, moving expired documents to the archive folder might not cleanly fit within these verbs.
  • Fetching complicated resources with nested hierarchies requires multiple round trips between the client and server to render single views, e.g. fetching content of a blog entry and the comments on that entry. For mobile applications operating in variable network conditions, these multiple roundtrips are highly undesirable.
  • Over time, more fields might be added to an API response, and older clients will receive all new data fields, even those they do not need; as a result, payload sizes bloat and latencies grow.

RPC and REST calls comparison

Operation: Signup
  RPC:  POST /signup
  REST: POST /persons

Operation: Resign
  RPC:  POST /resign
        {"personid": "1234"}
  REST: DELETE /persons/1234

Operation: Read a person
  RPC:  GET /readPerson?personid=1234
  REST: GET /persons/1234

Operation: Read a person’s items list
  RPC:  GET /readUsersItemsList?personid=1234
  REST: GET /persons/1234/items

Operation: Add an item to a person’s items
  RPC:  POST /addItemToUsersItemsList
        {"personid": "1234", "itemid": "456"}
  REST: POST /persons/1234/items
        {"itemid": "456"}

Operation: Update an item
  RPC:  POST /modifyItem
        {"itemid": "456", "key": "value"}
  REST: PUT /items/456
        {"key": "value"}

Operation: Delete an item
  RPC:  POST /removeItem
        {"itemid": "456"}
  REST: DELETE /items/456

Source: Do you really know why you prefer REST over RPC

Source(s) and further reading: REST and RPC

Security

This section could use some updates. Consider contributing!

Security is a broad topic. Unless you have considerable experience, a security background, or are applying for a position that requires knowledge of security, you probably won't need to know more than the basics:

  • Encrypt in transit and at rest.
  • Sanitize all user inputs or any input parameters exposed to users to prevent XSS and SQL injection.
  • Use parameterized queries to prevent SQL injection.
  • Use the principle of least privilege.

Source(s) and further reading

Appendix

You'll sometimes be asked to do 'back-of-the-envelope' estimates. For example, you might need to determine how long it will take to generate 100 image thumbnails from disk or how much memory a data structure will take. The Powers of two table and Latency numbers every programmer should know are handy references.

Powers of two table

Power           Exact Value         Approx Value        Bytes
---------------------------------------------------------------
7                             128
8                             256
10                           1024   1 thousand           1 KB
16                         65,536                       64 KB
20                      1,048,576   1 million            1 MB
30                  1,073,741,824   1 billion            1 GB
32                  4,294,967,296                        4 GB
40              1,099,511,627,776   1 trillion           1 TB

Source(s) and further reading

Latency numbers every programmer should know

Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy            10,000   ns       10 us
Send 1 KB bytes over 1 Gbps network     10,000   ns       10 us
Read 4 KB randomly from SSD*           150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
HDD seek                            10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps  10,000,000   ns   10,000 us   10 ms  40x memory, 10X SSD
Read 1 MB sequentially from HDD     30,000,000   ns   30,000 us   30 ms 120x memory, 30X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Handy metrics based on numbers above:

  • Read sequentially from HDD at 30 MB/s
  • Read sequentially from 1 Gbps Ethernet at 100 MB/s
  • Read sequentially from SSD at 1 GB/s
  • Read sequentially from main memory at 4 GB/s
  • 6-7 world-wide round trips per second
  • 2,000 round trips per second within a data center

Latency numbers visualized

Source(s) and further reading

Additional system design interview questions

Common system design interview questions, with links to resources on how to solve each.

Question Reference(s)
Design a file sync service like Dropbox youtube.com
Design a search engine like Google queue.acm.org
stackexchange.com
ardendertat.com
stanford.edu
Design a scalable web crawler like Google quora.com
Design Google docs code.google.com
neil.fraser.name
Design a key-value store like Redis slideshare.net
Design a cache system like Memcached slideshare.net
Design a recommendation system like Amazon's hulu.com
ijcai13.org
Design a tinyurl system like Bitly n00tc0d3r.blogspot.com
Design a chat app like WhatsApp highscalability.com
Design a picture sharing system like Instagram highscalability.com
highscalability.com
Design the Facebook news feed function quora.com
quora.com
slideshare.net
Design the Facebook timeline function facebook.com
highscalability.com
Design the Facebook chat function erlang-factory.com
facebook.com
Design a graph search function like Facebook's facebook.com
facebook.com
facebook.com
Design a content delivery network like CloudFlare figshare.com
Design a trending topic system like Twitter's michael-noll.com
snikolov.wordpress.com
Design a random ID generation system blog.twitter.com
github.com
Return the top k requests during a time interval cs.ucsb.edu
wpi.edu
Design a system that serves data from multiple data centers highscalability.com
Design an online multiplayer card game indieflashblog.com
buildnewgames.com
Design a garbage collection system stuffwithstuff.com
washington.edu
Design an API rate limiter https://stripe.com/blog/
Design a Stock Exchange (like NASDAQ or Binance) Jane Street
Golang Implementation
Go Implementation
Add a system design question Contribute

Real world architectures

Articles on how real world systems are designed.


Source: Twitter timelines at scale

Don't focus on nitty gritty details for the following articles, instead:

  • Identify shared principles, common technologies, and patterns within these articles
  • Study what problems are solved by each component, where it works, where it doesn't
  • Review the lessons learned
Type System Reference(s)
Data processing MapReduce - Distributed data processing from Google research.google.com
Data processing Spark - Distributed data processing from Databricks slideshare.net
Data processing Storm - Distributed data processing from Twitter slideshare.net
Data store Bigtable - Distributed column-oriented database from Google harvard.edu
Data store HBase - Open source implementation of Bigtable slideshare.net
Data store Cassandra - Distributed column-oriented database from Facebook slideshare.net
Data store DynamoDB - Document-oriented database from Amazon harvard.edu
Data store MongoDB - Document-oriented database slideshare.net
Data store Spanner - Globally-distributed database from Google research.google.com
Data store Memcached - Distributed memory caching system slideshare.net
Data store Redis - Distributed memory caching system with persistence and value types slideshare.net
File system Google File System (GFS) - Distributed file system research.google.com
File system Hadoop File System (HDFS) - Open source implementation of GFS apache.org
Misc Chubby - Lock service for loosely-coupled distributed systems from Google research.google.com
Misc Dapper - Distributed systems tracing infrastructure research.google.com
Misc Kafka - Pub/sub message queue from LinkedIn slideshare.net
Misc Zookeeper - Centralized infrastructure and services enabling synchronization slideshare.net
Add an architecture Contribute

Company architectures

Company Reference(s)
Amazon Amazon architecture
Cinchcast Producing 1,500 hours of audio every day
DataSift Realtime datamining At 120,000 tweets per second
Dropbox How we've scaled Dropbox
ESPN Operating At 100,000 duh nuh nuhs per second
Google Google architecture
Instagram 14 million users, terabytes of photos
What powers Instagram
Justin.tv Justin.Tv's live video broadcasting architecture
Facebook Scaling memcached at Facebook
TAO: Facebook’s distributed data store for the social graph
Facebook’s photo storage
How Facebook Live Streams To 800,000 Simultaneous Viewers
Flickr Flickr architecture
Mailbox From 0 to one million users in 6 weeks
Netflix A 360 Degree View Of The Entire Netflix Stack
Netflix: What Happens When You Press Play?
Pinterest From 0 To 10s of billions of page views a month
18 million visitors, 10x growth, 12 employees
Playfish 50 million monthly users and growing
PlentyOfFish PlentyOfFish architecture
Salesforce How they handle 1.3 billion transactions a day
Stack Overflow Stack Overflow architecture
TripAdvisor 40M visitors, 200M dynamic page views, 30TB data
Tumblr 15 billion page views a month
Twitter Making Twitter 10000 percent faster
Storing 250 million tweets a day using MySQL
150M active users, 300K QPS, a 22 MB/S firehose
Timelines at scale
Big and small data at Twitter
Operations at Twitter: scaling beyond 100 million users
How Twitter Handles 3,000 Images Per Second
Uber How Uber scales their real-time market platform
Lessons Learned From Scaling Uber To 2000 Engineers, 1000 Services, And 8000 Git Repositories
WhatsApp The WhatsApp architecture Facebook bought for $19 billion
YouTube YouTube scalability
YouTube architecture

Company engineering blogs

Architectures for companies you are interviewing with.

Questions you encounter might be from the same domain.

Source(s) and further reading

Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo:

Under development

Interested in adding a section or helping complete one in-progress? Contribute!

  • Distributed computing with MapReduce
  • Consistent hashing
  • Scatter gather
  • Contribute

Credits

Credits and sources are provided throughout this repo.

Special thanks to:

Contact info

Feel free to contact me to discuss any issues, questions, or comments.

My contact info can be found on my GitHub page.

License

I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).

Copyright 2017 Donne Martin

Creative Commons Attribution 4.0 International License (CC BY 4.0)

http://creativecommons.org/licenses/by/4.0/
THUDM/CogVLM
1 week, 6 days ago

a state-of-the-art-level open visual language model | 多模态预训练模型


CogVLM

📖 Paper(论文)

🌐 web demo(测试网址)

🔥 News: CogVLM bilingual version is available online! Welcome to try it out!

🔥 News: CogVLM中英双语版正式上线了!欢迎体验!

🔥 News: We are currently preparing to open-source a more powerful model with rich chart and document understanding capabilities. It has achieved a score of 81 on DocVQA, so stay tuned for its release!

中文版README

Introduction

  • CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters.

  • CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flickr30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA and TDIUC, and ranks 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B. CogVLM can also chat with you about images.

Examples

  • CogVLM can accurately describe images in detail with very few hallucinations.

    A comparison with LLaVA-1.5 and MiniGPT-4 is included in the repository.


  • CogVLM can understand and answer various types of questions, and has a visual grounding version.

  • CogVLM sometimes captures more detailed content than GPT-4V(ision).


Method

CogVLM model comprises four fundamental components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT), and a visual expert module. See Paper for more details.

Get Started

We support two GUIs for model inference, a web demo and a CLI. If you want to use it in your Python code, it is easy to modify the CLI scripts for your case.

First, we need to install the dependencies.

pip install -r requirements.txt
python -m spacy download en_core_web_sm

Hardware requirement

  • Model Inference: 1 * A100(80G) or 2 * RTX 3090(24G).
  • Finetuning: 4 * A100(80G) [Recommended] or 8 * RTX 3090(24G).

Web Demo

We also offer a local web demo based on Gradio. First, install Gradio by running: pip install gradio. Then download and enter this repository and run web_demo.py. See the next section for detailed usage:

python web_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
python web_demo.py --from_pretrained cogvlm-grounding-generalist --version base --english --bf16

The GUI of the web demo looks like:

CLI

We open-source different checkpoints for different downstream tasks:

  • cogvlm-chat The model after SFT for alignment, which supports chat like GPT-4V.
  • cogvlm-base-224 The original checkpoint after text-image pretraining.
  • cogvlm-base-490 The finetuned version on 490px resolution from cogvlm-base-224. The finetuning data includes the training sets of VQA datasets.
  • cogvlm-grounding-generalist. This checkpoint supports different visual grounding tasks, e.g. REC, Grounding Captioning, etc.

Run CLI demo via:

python cli_demo.py --from_pretrained cogvlm-base-224 --version base --english --bf16 --no_prompt
python cli_demo.py --from_pretrained cogvlm-base-490 --version base --english --bf16 --no_prompt
python cli_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
python cli_demo.py --from_pretrained cogvlm-grounding-generalist --version base --english --bf16

The program will automatically download the sat model and interact in the command line. You can generate replies by entering instructions and pressing enter. Enter clear to clear the conversation history and stop to stop the program.

Multi-GPU inference

We also support model parallel inference, which splits the model across multiple (2/4/8) GPUs. --nproc-per-node=[n] in the following command controls the number of GPUs used.

torchrun --standalone --nnodes=1 --nproc-per-node=2 cli_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16

Note:

  • If you have trouble accessing huggingface.co, you can add --local_tokenizer /path/to/vicuna-7b-v1.5 to load the tokenizer.
  • If you have trouble automatically downloading the model with 🔨SAT, try downloading it from 🤖modelscope or 🤗huggingface manually.
  • When downloading the model with 🔨SAT, it will be saved to the default location ~/.sat_models. Change the default location by setting the environment variable SAT_HOME. For example, if you want to save the model to /path/to/my/models, run export SAT_HOME=/path/to/my/models before running the python command.

The program provides the following hyperparameters to control the generation process:

usage: cli_demo.py [-h] [--max_length MAX_LENGTH] [--top_p TOP_P] [--top_k TOP_K] [--temperature TEMPERATURE] [--english]

optional arguments:
  -h, --help            show this help message and exit
  --max_length MAX_LENGTH
                        max length of the total sequence
  --top_p TOP_P         top p for nucleus sampling
  --top_k TOP_K         top k for top k sampling
  --temperature TEMPERATURE
                        temperature for sampling
  --english             only output English

Finetuning

You may want to use CogVLM in your own task, which needs a different output style or domain knowledge. We here provide a finetuning example for Captcha Recognition.

  1. Start by downloading the Captcha Images dataset. Once downloaded, extract the contents of the ZIP file.

  2. To create a train/validation/test split in the ratio of 80/5/15, execute the following:

    python scripts/split_dataset.py
    
  3. Start the fine-tuning process with this command:

    bash scripts/finetune_(224/490)_lora.sh
    
  4. Merge the model to model_parallel_size=1: (replace the 4 below with your training MP_SIZE)

    torchrun --standalone --nnodes=1 --nproc-per-node=4 merge_model.py --version base --bf16 --from_pretrained ./checkpoints/merged_lora_(224/490)
    
  5. Evaluate the performance of your model.

    bash scripts/evaluate_(224/490).sh
    

It is recommended to use the 490px version. However, if you have limited GPU resources (such as only one node with 8 * RTX 3090), you can try the 224px version with model parallelism.

The anticipated result of this script is around 95% accuracy on the test set.

It is worth noting that the fine-tuning examples only tune limited parameters. (Expert only) If you want to get >98% accuracy, you need to increase the trainable parameters in finetune_demo.py.

License

The code in this repository is open source under the Apache-2.0 license, while the use of the CogVLM model weights must comply with the Model License.

Citation & Acknowledgements

If you find our work helpful, please consider citing the following papers

@article{wang2023cogvlm,
      title={CogVLM: Visual Expert for Pretrained Language Models}, 
      author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang},
      year={2023},
      eprint={2311.03079},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

In the instruction fine-tuning phase of the CogVLM, there are some English image-text data from the MiniGPT-4, LLAVA, LRV-Instruction, LLaVAR and Shikra projects, as well as many classic cross-modal work datasets. We sincerely thank them for their contributions.

cpacker/MemGPT
1 week, 6 days ago

Teaching LLMs memory management for unbounded context 📚🦙


MemGPT

Try out our MemGPT chatbot on Discord!

⭐ NEW: You can now run MemGPT with local LLMs and AutoGen! ⭐

🤖 Create perpetual chatbots with self-editing memory!


🗃️ Chat with your data - talk to your local files or SQL database!

Quick setup

Join Discord and message the MemGPT bot (in the #memgpt channel). Then run the following commands (messaged to "MemGPT Bot"):

  • /profile (to create your profile)
  • /key (to enter your OpenAI key)
  • /create (to create a MemGPT chatbot)

Make sure your privacy settings on this server are open so that MemGPT Bot can DM you:
MemGPT → Privacy Settings → Direct Messages set to ON

You can see the full list of available commands when you enter / into the message box.

What is MemGPT?

Memory-GPT (or MemGPT in short) is a system that intelligently manages different memory tiers in LLMs in order to effectively provide extended context within the LLM's limited context window. For example, MemGPT knows when to push critical information to a vector database and when to retrieve it later in the chat, enabling perpetual conversations. Learn more about MemGPT in our paper.

Running MemGPT locally

Install MemGPT:

pip install pymemgpt

Add your OpenAI API key to your environment:


export OPENAI_API_KEY=YOUR_API_KEY # on Linux/Mac
set OPENAI_API_KEY=YOUR_API_KEY # on Windows
$Env:OPENAI_API_KEY = "YOUR_API_KEY" # on Windows (PowerShell)

Configure default setting for MemGPT by running:

memgpt configure

Now, you can run MemGPT with:

memgpt run

You can run the following commands in the MemGPT CLI prompt:

  • /exit: Exit the CLI
  • /attach: Attach a loaded data source to the agent
  • /save: Save a checkpoint of the current agent/conversation state
  • /dump: View the current message log (see the contents of main context)
  • /dump <count>: View the last <count> messages (all if <count> is omitted)
  • /memory: Print the current contents of agent memory
  • /pop: Undo the last message in the conversation
  • /pop <count>: Undo the last <count> messages in the conversation. <count> defaults to 3, which is usually one turn in the conversation
  • /retry: Pops the last answer and tries to get another one
  • /rethink <text>: Replace the inner dialog of the last assistant message with the given text to help shape the conversation
  • /rewrite <text>: Replace the last assistant answer with the given text to correct or force the answer
  • /heartbeat: Send a heartbeat system message to the agent
  • /memorywarning: Send a memory warning system message to the agent

Once you exit the CLI with /exit, you can resume chatting with the same agent by specifying the agent name in memgpt run --agent <NAME>.

Documentation

See full documentation at: https://memgpt.readthedocs.io/

Installing from source

To install MemGPT from source, start by cloning the repo:

git clone git@github.com:cpacker/MemGPT.git

Then navigate to the main MemGPT directory, and do:

pip install -e .

Now, you should be able to run memgpt from the command-line using the downloaded source code.

If you are having dependency issues using pip install -e ., we recommend you install the package using Poetry (see below). Installing MemGPT from source using Poetry will ensure that you are using exact package versions that have been tested for the production build.

Installing from source (using Poetry)

First, install Poetry using the official instructions here.

Then, you can install MemGPT from source with:

git clone git@github.com:cpacker/MemGPT.git
poetry shell
poetry install

Support

For issues and feature requests, please open a GitHub issue or message us on our #support channel on Discord

Datasets

Datasets used in our paper can be downloaded at Hugging Face.

🚀 Project Roadmap

  • Release MemGPT Discord bot demo (perpetual chatbot)
  • Add additional workflows (load SQL/text into MemGPT external context)
  • Integration tests
  • Integrate with AutoGen (discussion)
  • Add official gpt-3.5-turbo support (discussion)
  • CLI UI improvements (issue)
  • Add support for other LLM backends (issue, discussion)
  • Release MemGPT family of open models (eg finetuned Mistral) (discussion)
A Battle of Async Titans: Django ORM Async vs. SQLAlchemy Async
1 week, 5 days ago

When you enter the world of Python, you will hear that many developers love Django ORM, and others love SQLAlchemy. Each of those groups will tell you to your face why you have to choose their loved library, and if we add the async part of the programming, they will brag about the capacity of […]

The post A Battle of Async Titans: Django ORM Async vs. SQLAlchemy Async appeared first on Distillery.

Does someone need to be a good manager to give good management advice?
1 week, 6 days ago
In a management Slack I’m in, someone responded to a list of commonly-recommended management books by asking, “are these people good managers though?” It’s a fair question! But it’s not quite so simple.
Parse Inbound Email - Building SaaS with Python and Django #175
1 week, 6 days ago
In this episode, we switched to the inbound side and parsed an email to transform it into a journal entry. This caused us to look into the dateutil library and look at Python’s standard email module to use EmailMessage.
How to Build Trust
1 week, 6 days ago
What are the major management behaviors that can help build trust? Management books often cover the importance of trust, but abstractly. There’s precious little writing about the nuts and bolts, the day-to-day tasks of trust-building. That’s the gap I’d like to try to fill with this article.
Amersfoort (NL) python meetup
1 week, 6 days ago

The first "pyutrecht" meetup in Amersfoort in the Netherlands. (Amersfoort is not the city of Utrecht, but it is in the similarly named province of Utrecht).

I gave a talk myself about treating your own laptop setup with the same rigor as a proper programming project: have a git repo with a README explaining which programs you installed, an install script or makefile for installing certain tools, "dotfiles" for storing your config in git, etc. I haven't made a summary of my own talk. Here are the other three:

An introduction to web scraping - William Lacerda

William works at Deliverect, the host of the meeting. Web scraping means extracting data from a website and parsing it into a more useful format, like translating a list of restaurants on a web page into structured data.

There's a difference with web crawling: that is following links and trying to download all the pages on a website.

Important: robots.txt. As a crawler or scraper you're supposed to read it as it tells you which user agents are allowed and which areas of the website are off-limits (or not useful).

Another useful file that is often available: /sitemap.xml. A list of URLs in the site that the site thinks are useful for scraping or crawling.

A handy trick: looking at the network tab when browsing the website. Are there any internal APIs that the javascript frontend uses to populate the page? Sometimes they are blocked from easy scraping or they're difficult to access due to creative headers or authentication or cookies or session IDs.

A tip: beautifulsoup, a python library for extracting neat, structured content from an otherwise messy html page.
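
As a rough illustration of that tip (the URL is a placeholder and the page structure is assumed), beautifulsoup can pull structured data out of messy HTML in a few lines:

import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen("https://example.com").read()
soup = BeautifulSoup(html, "html.parser")
for link in soup.find_all("a"):                        # extract neat, structured content
    print(link.get("href"), link.get_text(strip=True))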

selenium is an alternative as it behaves much more like a regular webbrowser. So you can "click" a "next" button a couple of times in order to get a full list of items. Because selenium behaves like a real webbrowser, things like cookies and IDs in query parameters and headers just work. That makes it easier to work around many kinds of basic protection.

MicroPython - Wouter van Ooijen

A microcontroller is a combination of cpu, memory and some interfaces to external ports. https://micropython.org is a version of python for such low-power devices.

He demoed python's prompt running on a raspberrypi micro connected via microUSB. And of course the mandatory lets-blink-the-onboard-LED programs. And then some other demoes with more leds and servos. Nice.
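
The mandatory blink program looks roughly like this in MicroPython; the pin number is an assumption for the Raspberry Pi Pico's onboard LED:

from machine import Pin
import time

led = Pin(25, Pin.OUT)     # onboard LED pin (board-specific assumption)
while True:
    led.toggle()
    time.sleep(0.5)        # blink twice per second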

A big advantage of micropython is that it doesn't care what processor you have. With C/C++ you specifically have to compile for the right kind of processor. With micropython you can just run your code anywhere.

You can use micropython in three ways:

  • As .py sources, uploaded to the microcontroller.
  • As pre-compiled .mpy code, also uploaded.
  • As frozen .mpy included in the images

He showed a couple of possible target microcontrollers. A note to myself about the ESP8266: limited support, use .mpy. I think I have a few of those at home for should-test-it-at-some-time :-) Some examples: Pi RP2040, ESP32, Teensy 4.1.

A problem: RAM is scarce in such chips and python is hungry... You can do some tricks like on-demand loading. Watch out when using an LCD graphic display, that takes 150kb easily.

You have to watch out for the timing requirements of what you want to do. Steering a servo is fine, but "neopixel" leds for instance needs a higher frequency of signals than micropython is capable of on such a microcontroller. If you use a C library for it, it works (he showed a demo).

GraphQL in python? meet strawberry - Erik Wrede

Erik works as maintainer on the Graphene and the strawberry-GraphQL projects.

GraphQL is a query language for APIs. It is an alternative to the well-known REST method. With REST you often have to do multiple requests to get all the data you need, and the answers will often give more information than you actually need.

With graphql, you always start with a graphql schema. You can compare it a bit to an openapi document. The graphql schema specifies what you can request ("a Meetup has a name, description, list of talks, etc").

An actual query specifies what you want to get back as response. You can omit fields from the schema that you don't need. If you don't need "description", you leave it out. If you want to dive deeper into certain objects, you specify their fields.

Strawberry is a graphql framework. It has integrations for django, sqlalchemy, pydantic and more. The schemas are defined with classes annotated with @strawberry.type and fields with python type hints. (It looked neat!)
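
A rough sketch of what a strawberry schema looks like, using an invented Talk type; the decorators and type hints do the work of defining the GraphQL schema:

from typing import List

import strawberry

@strawberry.type
class Talk:
    title: str
    speaker: str

@strawberry.type
class Query:
    @strawberry.field
    def talks(self) -> List[Talk]:
        return [Talk(title="GraphQL in python? meet strawberry", speaker="Erik Wrede")]

schema = strawberry.Schema(query=Query)   # queries can then omit any fields they don't need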

He showed a live demo, including the browser-based query interface bundled with graphql.

Note: strawberry is the more modern project (type hints and so on) and will later have all the functionality of graphene. So if strawberry's functionality is enough, you should use that one.

Why is Django's (Unofficial) Mascot a Pony?
2 weeks, 1 day ago
Officially, the [Django logo](https://www.djangoproject.com/community/logos/) is a text-only combination of green and white. But unofficially, the Django mascot is …
2024 DSF Board Candidates
2 weeks, 1 day ago
Thank you to the twelve individuals who have chosen to stand for election. This page contains their submitted candidate statements. Our deepest gratitude goes to our departing board member, Aaron Bassett, for your contributions and commitment to the Django community. Those eligible to vote in this election will receive information on how to vote shortly. Please check for an email with the subject line "2024 DSF Board Voting".
Chris Achinga Mombasa, Kenya

My software development career was highly influenced by developer communities. Participating in tech meet-ups and events, notably DjangoCon Africa, has not only expanded my technical skills but also shaped my approach to both personal and professional growth. This experience has motivated me to seek a position on the Django Software Foundation Board, especially after the talk from Anna Makarudze on Navigating the Open-Source World as a Minority, which highlighted the challenges of organising events that benefit African communities. As an advocate for African and minority communities within the tech ecosystem, I aspire to bring a unique and necessary perspective to the DSF Board. My commitment to volunteering and giving back to the community aligns perfectly with the ethos of the Django community. My experiences have taught me the value of dedicated community organizers who selflessly share resources and knowledge, fostering an environment where developers at all levels can thrive.

Joining the DSF Board would enable me to champion the interests of young and emerging developers globally, particularly from underrepresented regions. I aim to ensure that everyone, regardless of their background, has equitable access to the opportunities that Django, both as a community and a web development framework, can offer.

In my role with a Non-Governmental Organization aiding youth groups along the Kenyan Coast(Swahilipot Hub Foundation), I've garnered experience in community engagement and utilizing technology for social good. This experience has been instrumental in creating Django-based platforms that empower community self-management. My presence on the DSF Board would not only represent these communities but also allow me to serve as a mentor and technical advisor.

I am eager to contribute my insights and leadership to the DSF Board. With your support, I hope to make a meaningful impact, fostering an inclusive and dynamic environment where every developer can achieve their full potential.


David Vaz Porto, Portugal

A software developer for over 20 years, David fell in love with Django almost at the beginning of his journey, in 2007 with version 0.96. He loves Django and Python so much that he has been bringing developers to the community ever since, and ended up starting his consultancy firm around these technologies.

During DjangoCon Europe 2019 in Copenhagen he decided to take the next step in helping the community, proposing to organize DjangoCon Europe 2020 in Portugal. He got more than he bargained for, ending up co-organising the first virtual-only DjangoCon Europe, repeating in 2021, and finally a hybrid DjangoCon Europe in 2022. His effort, together with the team around him, was rewarded with success: the 2022 edition had record-breaking attendance, with 500+ in person and 200+ online. To keep things going he is also co-organising DjangoCon Europe 2024 in Vigo, Spain, hoping to bring the Spanish community closer.

David is also contributing to the Portuguese Python community, starting the very first PyCon Portugal in 2022. His drive is to bring the Portuguese community forward, with a different city every year to increase the reach of the conference. The first edition was in Porto, leveraging DjangoCon Europe 2022; this year it was in Coimbra, with participants from over 25 countries, and the next edition is already in preparation.

David is enthusiastic, committed and pragmatic. Throughout his personal and professional journey, he has always had a positive impact in every process he puts his mind on, influencing, building and empowering the people around him. He hopes to put his experience to good use in Django Software Foundation.


Jacob Kaplan-Moss Oregon

I was one of the original maintainers of Django, and was the original founder and first President of the DSF. I re-joined the DSF board and have served for the last year. Outside of Django, I'm a security consultant at Latacora, and have previously ran engineering and security teams at 18F and Heroku.

When I ran for the board last year, I wrote:

> I'd be coming back to the DSF with a bunch of experience in executive leadership and more experience working with nonprofits. I think I can apply those skills, along with my general knowledge of the Django community, to push things forward. What that means, specifically, isn't entirely clear yet. I'd plan to spend the first months of my board term asking a bunch of questions and listening.

I did that asking-questions-and-listening, and what needs doing at the DSF became clear. I'd most succinctly articulate it as: "new blood".

The Django community is super-vibrant and new people are joining the community all the time, but it's very hard for people to "level up" and move to any sort of leadership position at the DSF or among the core team. We just don't have very many opportunities for people to have an impact, and we don't have good "onramps" to that work.

So, this term, I (with the rest of the board) started building some of these opportunities and onramps! The recently announced working group and membership changes are the start of this, and if re-elected I'd want to continue working in this direction. It's now easier for people to join the DSF, and easier for them to spin up working groups to do impactful work. But now we need to start defining these groups, funding them, and continuing this growth.


Jay Miller United States

The Django community often serves as a great example for many aspects of the broader Python community. Our community shines when many of us get involved. To make this happen, we need to encourage greater community involvement.

My goals for the next two years, if elected, are to increase the amount of information we share with the community while reducing the time it takes to disseminate that information to the community.

I intend to utilize the existing channels in the Django and the larger Python community. We will also establish new official communication channels for the foundation. These channels will be managed by a Communications Working Group.

The second effort is to extend our reach to a global and diverse audience. We understand that our impact can extend far beyond our current scope by expanding working groups. Therefore, I would work to create and support working groups that currently lack direct representation in the DSF. I would also advocate for decisions that directly impact these areas to be developed and executed by those individual groups with DSF support.

I hope that you will support me in this vision, which aims to increase the visibility and support of the DSF to the farthest reaches of the community.


Mahmoud Nassee Cairo/Egypt

I really like helping people and also helping this awesome community to grow. I don't have much to say 🙂.. But I really like volunteering work it helps me to make something that I could be proud of and also make some new friends!


Ngazetungue Muheue Namibia

I'm Ngazetungue Muheue, a dedicated software developer, community advocate, and a member of the Django Software Foundation (DSF). I'm also the founder of the Python and Django Community in Namibia. Despite facing unique challenges as a member of underprivileged communities and living with a disability, I've played a significant role in expanding Django by establishing and contributing to various Django and Python communities in Africa and Namibia.

Recognizing the importance of open-source communities and user-friendly technology, I've worked closely with students and underprivileged individuals to bridge the tech gap by involving them in Django user groups, teaching Django, and fostering their participation in the global tech community. As a visionary leader, I've cultivated a culture of collaboration, inclusivity, and continuous learning within the African tech ecosystem. My contributions include organizing the inaugural DjangoCon Africa in 2023 and actively participating in organizing and volunteering at DjangoCon Europe in 2023 and 2022, advancing the growth of the Django ecosystem. I've also spoken at various PyCon events worldwide, showcasing my commitment to fostering the global Django and Python community.

As a board member of the Django Software Foundation, my primary goal is to expand Django communities worldwide, connect underprivileged community members with the DSF, and enhance the inclusivity of the Django community. This involves translating Django documentation for non-English speakers, increasing project grants, integrating people with disabilities into the community, and creating internship opportunities for a more diverse and empowered Django community.

Joining the DSF board will enable me to inspire and support nations in engaging young and underprivileged individuals in tech-related activities while safeguarding the interests and mission of our community and the DSF. More links: https://twitter.com/muheuenga https://2023.djangocon.africa/team https://twitter.com/djangonamibia https://na.pycon.org/ https://pynam.org/django/


Paolo Melchiorre Pescara, Italy

Ciao, I'm Paolo and I live in Italy.

I've been a contributor to the Django project for years, and a member of the DSF. I attended my first DjangoCon Europe in 2017 and have since presented many Django talks at conferences around the world. I've participated as a coach in DjangoGirls workshops several times, and I organized one in my hometown. I've always been a Python developer, I helped the PyCon Italia organization for a few years and I recently founded the Python Pescara meetup.

As a member of the DSF board of directors, I would like to bring a different point of view to the foundation, as a southern European citizen, inhabitant of the Mediterranean area, non-native English speaker, and a small company employee.

Some initiatives I would like to carry forward are:

  • organize active user sprints to focus on specific Django features
  • continue the work of renovating the Django project website
  • create synergies with the Python community and its web sub-communities
  • simplify Django documentation and help its translations
  • support creators of Django content (e.g. books, articles, podcasts, videos, ...)


  • Peter Baumgartner Colorado, USA

    I'm a current DSF board member and acting Treasurer.

    I've been a part of the Django community for over 15 years. I'm an open-source contributor, a regular speaker at DjangoCon US, and the co-author of High Performance Django. In 2007, I founded Lincoln Loop, a web agency that leverages Django extensively in its work. Lincoln Loop has financially sponsored the DSF and DjangoCon for many years, and I'm looking for other ways to give back to a community that has given us so much.

    At Lincoln Loop, I have to wear many hats and deeply understand the financial ramifications of our decisions as a company. I believe the experience of running a business will be directly applicable to a position on the DSF board, and I look forward to applying that experience if elected.


    Sarah Abderemane Paris, France

    I'm an active DSF member and I've been contributing to this amazing community via multiple ways:

  • Django contributor and Accessibility Team Member
  • Maintainer of djangoproject.com
  • Organizer of Djangonaut Space
  • Organizer of Django Paris Meetup
  • Organizer of DjangoCon Europe 2023

    I have seen many aspects of the community through all those experiences. As a relatively new member, I can bring a fresh perspective to the community and help foster a renewed sense of togetherness. I have a strong connection with Djangonaut Space mentoring program and the community. I'm well positioned to serve as an intermediary, facilitating communication regarding initiatives and ideas between the board and the community.

    I would like to increase fundraising by improving communication and by making each sponsor feel special, highlighting sponsors not only on the website but also on social networks. Relying on my experiences with various Django projects, I will push forward ideas to further develop our community, specifically helping existing and new contributors.

    With the community's support, I will set up a working group for mentorship and push accessibility in the framework. I am passionate about these topics as they show that Django is a framework for everyone by everyone.

    I see myself as a representative of Django's diversity and would like to emphasize and expand the richness of it even more. Being part of the board would inspire people to get involved and be part of the community. They could add their stone to the building of this wonderful community.


    Thibaud Colas Europe

    To me, Django feels like it's in maintenance mode, a decade behind in areas like front-end development and serverless. To stay relevant compared to projects with tens of millions in venture capital, we need a more vibrant, more diverse community. We can build one together by making the right programs happen, like Djangonaut Space and Outreachy.

    The DSF also needs to evolve with the times. In the age of ChatGPT, copyright and trademarks are very dated concerns. We need a foundation that can help its community navigate modern societal challenges: social equity issues affecting our users; accessibility issues plaguing the Django web; climate change and Django's carbon footprint.

    I can help. Let's grow Django's contributors 10x, and have the Django universe lead by example in community-driven open source.


    Tom Carrick Amsterdam, Netherlands

    I've been using Django since 2008. A lot has changed since then, but one constant has been my wish to see Django continuously improve.

    I'm active in the community in many ways. I've been a regular code contributor since 2016. I founded the accessibility team, and also started the official Discord server. So I've dedicated quite some time to Django already, but I have room for more, with even more impact.

    I would like to help grow the next generation of Django contributors, from more diverse backgrounds. From running DjangoCon sprint tables over the years, and getting involved with Djangonaut Space, it's clear to me that the new contributor experience has substantial room for improvement.

    I also want to expand Django's fundraising efforts. It's becoming difficult to add important new features. We need more funding to hire more Fellows, and expand their remit to work on bigger features.

    The new working groups are a much needed initiative, and I'd love to help develop all these ideas to their fullest potential.


    Velda Kiara Nairobi, Kenya

    As a passionate software developer and technical writer deeply rooted in the open-source community, I am honored to be running for the DSF board. My experience in contributing to open-source projects, coupled with my leadership background in the Open Source Community Africa Nairobi, has ignited my desire to enhance the participation and contributions of communities from diverse backgrounds. My involvement in open-source initiatives has made me appreciate the power of collaboration and the impact of collective efforts. I have witnessed firsthand how open-source communities foster innovation and inclusivity, enabling individuals from all over the world to share their knowledge and expertise.

    Driven by my belief of open source impact, I aspire to elevate the DSF board's decision-making process by incorporating the unique perspectives and insights of communities from diverse backgrounds. My experience working with developer communities has equipped me with the skills and empathy necessary to understand and address the specific needs of these underrepresented groups. As a leader, I prioritize decision-making that aligns with the needs and aspirations of the community. I believe in fostering an environment where everyone feels empowered to participate, contribute, and lead. My commitment to inclusivity extends beyond the color of one's skin; I envision a DSF community that embraces and celebrates the diversity of thought, experience, and background.

    My passion for Django and my role as an advocate for the framework extend beyond personal preference. I recognize the immense value of Django to the developer community and am eager to contribute further through the DSF board. I believe that my involvement will allow me to add value to the Django community, supporting its growth and ensuring that it remains a thriving hub for developers worldwide. My journey in the open-source community began with a fascination for the framework. However, over time, I have come to realize that the true beauty of open-source lies in the community that surrounds it. I am committed to giving back to this community, not just as a developer or technical writer, but also as a leader and advocate for diversity and inclusion.

    I humbly ask for your vote to join the DSF board and contribute my skills, experience, and passion to the continued growth and success of the Django community. Together, we can create a more inclusive and vibrant open-source ecosystem that empowers individuals from all backgrounds to innovate, collaborate, and make a lasting impact on the world.


    Django-related Deals for Black Friday and Cyber Monday 2023
    2 weeks, 3 days ago

    Here are some Django-related deals for this year’s Black Friday (24th Nov) and Cyber Monday (27th Nov), including my own.

    I’ll keep updating this post as I learn about more deals. If you are also a creator, email me with details of your offer and I’ll add it here.

    My books

    My three books have a 50% discount, for both individual and team licenses, until the end of Cyber Monday (27th Nov). This deal stacks with the purchasing power parity discount for those in lower-income countries.

    Buy now:

    Aidas Bendoraitis’ GDPR Cookie Consent Package

    Aidas Bendoraitis of djangotricks.com created this paid third-party app for Django. The package takes the pain out of setting up and customizing legally mandated GDPR Cookie Consent screens. Compared to commercial “one size fits all” solutions, it’s much simpler to use this third-party app to host and tweak your project’s cookie consent screen.

    Use the discount code BLACKFRIDAY2023 for 20% off, from €150 to €120, until the end of November.

    Buy it on Gumroad

    Michael Yin’s Hotwire and Django book

    Michael Yin has written a book on using Hotwire with Django. This is the “HTML-over-the-wire” suite of tools used heavily in Ruby on Rails.

    Use this link for 20% off, from $39.99 to $27.80.

    SaaS Pegasus

    Corey Zue’s SaaS Pegasus, is a configurable Django project template with many preset niceties, including teams, Stripe subscriptions, a JavaScript pipeline, and multiple CSS themes. It can massively accelerate setting up a SaaS in Django.

    The “unlimited lifetime license” is discounted 50%, from $999 to $499.50. This deal is available through 29 November.

    Buy it on saaspegasus.com

    Will Vincent’s Three Book Bundle

    Will Vincent is the author of three fantastic Django books:

    • Django for Beginners
    • Django for APIs
    • Django for Professionals

    He’s offering a 50% discount on the three-book bundle, from $97 to $48.50.

    Buy it on Gumroad

    Bonus: Django Itself

    Okay, there’s no discount here, but there is a good deal! You can fund the framework that you love to ensure it continues to grow.

    If you can spend some money on Django-related products this Black Friday, please consider sponsoring Django itself. You can support it by donating to the charity that runs the framework, the Django Software Foundation.

    Your money will go towards:

    • Paying the Django Fellows, Mariusz and Natalia, who respond to tickets, merge code, and make releases.
    • Helping organize DjangoCons in Africa, America, and Europe, and other events.
    • Hosting costs of the documentation, source code, ticketing system, and CI system.
    • A long tail of activities for keeping the framework alive and thriving.

    You can sponsor Django on:

    • GitHub Sponsors - adds to your GitHub bill.
    • Threadless - grab some custom-printed merchandise.
    • The Fundraising Page - bills from a credit card with Stripe. This page also links to other schemes: Amazon Smile and Benevity Workplace Giving.

    If you’re working with Django professionally, please consider sponsoring a few dollars a month. But better yet, get your organization to sponsor to pay back for all the advantages that Django offers you.

    At the time of writing, Django is 58% towards its 2023 funding goal:

    Let’s fill up that heart!

    Fin

    Let’s support Django creators, and Django itself!

    —Adam

    Django News - 205 Reset Content - Nov 10th 2023
    2 weeks, 4 days ago

    News

    Python Developers Survey 2023

    The annual Python developers survey is out. Please take a moment to share your Python practices as the results do have a big impact on the organizations and maintainers in our community.

    alchemer.com

    Takahē: Life-Critical Side Projects

    Andrew Godwin, the developer of Takahē, is looking for new maintainers who want to help out in exchange for mentorship.

    aeracode.org

    Updates to Django

    Last week we had 14 pull requests merged into Django by 9 different contributors - including 2 first time contributors! Congratulations to chenow and Patrick Rauscher for having their first commit merged into Django - welcome on board!

    Some interesting things from last week...

    • Django 5.1 increased support for window frames. Specifically, RowRange and ValueRange now accept an exclusion argument.
    • We had some security releases issued: 4.2.7, 4.1.13, and 3.2.23
    • There was an interesting forum discussion around updating the DEP process.

    Do you speak Português or हिंदी? We will soon have a translation string freeze for the 5.0 release, so this is a good time to join a translation team! You can see the languages Django supports on transifex as well as the ones missing translations. Perhaps you can help translate Django and make it more accessible to our global community!

    Django Newsletter

    Sponsored Link

    Wagtail CMS Developer Training - Introductory Offer

    Build your knowledge and skills in essential Django and Wagtail CMS practices. A two-part training programme, led by Senior Wagtail Developers that extends way beyond typical tutorials and documentation. Only 10 places available per course. Apply here: https://bit.ly/wagtail-developer-training-course

    bit.ly

    Articles

    Database generated columns ⁽¹⁾: Django & SQLite

    An introduction to database-generated columns using SQLite and the new GeneratedField added in Django 5.0.

    paulox.net

    Using SQLite for a Django application on fly.io

    How to attach volumes, run migrations, and get SQLite working for Django applications on Fly.io.

    programmingmylife.com

    Debugging CSRF Failed / 403 Forbidden errors in Django

    A guided deep dive into Django's source code to understand why your application is failing CSRF validation.

    better-simple.com

    How to Kill the Django Development Server Without Ctrl+C

    Sometimes, you lose track of your terminal window but still need to stop a Django development server. Here's how!

    startcodingnow.com

    GitHub Actions: Faster Python runs with cached virtual environments

    A pattern to speed up the GitHub Actions workflow of projects using Python, Pip, and pip-tools.

    adamj.eu

    Events

    PyLadiesCon 2023 schedule is up

    The PyLadiesCon 2023 Online conference schedule is live and spans roughly 24 hours of back-to-back talks across three days.

    pyladies.com

    Django Boston meetup reboot: Mixing reliability with Celery and API Client Generation

    Django Boston is back on November 14th at 6pm EST with two talks on Celery and API client generation.

    meetup.com

    PyLadies Paris Python Talks

    Three talks on November 16th on topics including no downtime migrations in Django.

    meetup.com

    Recap of DjangoCon US 2023 by Katherine Michel

    Whether you attended DjangoCon US in person, virtually, or wished you could, this is a fantastic deep writeup of the event from one of its organizers.

    github.com

    Sponsored Ad

    Sick of performance issues? Enter Scout's APM tool for Python apps. Easily pinpoint and fix slowdowns with intelligent tracing logic. Optimize performance hassle-free, delighting your users. Try us out for free!

    ter.li

    Videos

    Lightning talks - Django Day CPH 2023

    30 minutes of Lightning talks from Django Day in Copenhagen recently.

    youtu.be

    Django Day CPH 2023: Pessimism, optimism, realism and Django database concurrency by Aivars Kalvans

    A look at concurrency in the database through programming language concepts, trying to understand what happens behind the scenes of pessimistic and optimistic locking.

    youtu.be

    Django Day CPH 2023: A minimal Django testing styleguide by Joseph Victor Zammit

    How and why to have a style guide for tests in your Django application.

    youtu.be

    Podcasts

    Django Chat #150: Contributing to Django with Sarah Boyce

    Sarah is a British developer based in Germany who is a member of Django’s Review and Triage Team. She is also a co-organizer of Djangonaut Space, a new mentorship program to onboard and develop Django contributors.

    Sarah also contributes to our Django News "Updates to Django" section.

    djangochat.com

    PyBites Podcast: Maximizing Your Developer Experience (DX) with Adam Johnson

    A discussion of Git Mastery, Python tooling, the future of Django + Htmx / front-end development, and more.

    pybit.es

    Django News Jobs

    We have two new jobs this week on Django News Jobs and several positions that are still open.

    Senior Python Engineer at Loginsoft 🆕

    Software Engineer - Ubuntu Systems Management at Canonical 🆕

    Front End Web UI Django Developer (NC) at Ansys

    Junior Web Developer at The Westervelt Company

    Django Girls Communications Officer at Django Girls

    Django Girls Awesomeness Ambassador at Django Girls

    Django Newsletter

    Projects

    egoist/tailwindcss-icons

    Use any icon (100,000+) from Iconify, for TailwindCSS.

    github.com

    hizbul25/django-send-sms

    Send SMS from a Django application using any SMS service provider by writing just a single line of code.

    github.com


    This RSS feed is published on https://django-news.com/. You can also subscribe via email.

    aiGrunn: be a better developer with AI - Henry Bol
    2 weeks, 5 days ago

    (One of my summaries of the 2023 Dutch aiGrunn AI conference in Groningen, NL).

    "Everybody" uses stackoverflow. Now lots of people use chatgpt (or chatgpt plus). Stackoverflow traffic has dropped by 50% in the last 1.5 year. So chatgpt can be your coding buddy.

    He really likes it for quickly getting something working (MVP). Like writing something that talks to a Magento API (a webshop system). It would take him ages to figure it all out. Or he could ask chatgpt.

    He also thinks you don't need docstrings anymore: you can just ask chatgpt to explain a snippet of code for you. (Something I myself don't agree with, btw).

    (He demoed some chatgpt code generation of a sample website). What he learned:

    • Good briefing and interaction is key. First tell it what you want before you start to code.
    • Chatgpt sometimes loses track if the interaction goes on for too long.
    • Read what it gives you, otherwise you won't know what it built for you.
    • Watch out for the "cut-off time" of the chatgpt training set: perhaps newer versions of libraries don't work anymore with the generated code.

    Some dangers:

    • You get lazy.
    • You can get frustrated if you don't understand what has been generated for you.

    lukas-blecher/LaTeX-OCR
    1 week, 5 days ago

    pix2tex: Using a ViT to convert images of equations into LaTeX code.


    pix2tex - LaTeX OCR

    The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code.

    Using the model

    To run the model you need Python 3.7+

    If you don't have PyTorch installed, follow their instructions here.

    Install the package pix2tex:

    pip install "pix2tex[gui]"
    

    Model checkpoints will be downloaded automatically.

    There are three ways to get a prediction from an image.

    1. You can use the command line tool by calling pix2tex. Here you can parse already existing images from the disk and images in your clipboard.

    2. Thanks to @katie-lim, you can use a nice user interface as a quick way to get the model prediction. Just call the GUI with latexocr. From here you can take a screenshot and the predicted latex code is rendered using MathJax and copied to your clipboard.

      Under linux, it is possible to use the GUI with gnome-screenshot (which comes with multiple monitor support) if gnome-screenshot was installed beforehand. For Wayland, grim and slurp will be used when they are both available. Note that gnome-screenshot is not compatible with wlroots-based Wayland compositors. Since gnome-screenshot will be preferred when available, you may have to set the environment variable SCREENSHOT_TOOL to grim in this case (other available values are gnome-screenshot and pil).

      If the model is unsure about what's in the image, it might output a different prediction every time you click "Retry". With the temperature parameter you can control this behavior (a low temperature will produce the same result).

    3. You can use an API. This has additional dependencies. Install via pip install -U "pix2tex[api]" and run

      python -m pix2tex.api.run
      

      to start a Streamlit demo that connects to the API at port 8502. There is also a docker image available for the API: https://hub.docker.com/r/lukasblecher/pix2tex

      docker pull lukasblecher/pix2tex:api
      docker run --rm -p 8502:8502 lukasblecher/pix2tex:api
      

      To also run the streamlit demo run

      docker run --rm -it -p 8501:8501 --entrypoint python lukasblecher/pix2tex:api pix2tex/api/run.py
      

      and navigate to http://localhost:8501/

    4. Use from within Python

      from PIL import Image
      from pix2tex.cli import LatexOCR
      
      img = Image.open('path/to/image.png')
      model = LatexOCR()
      print(model(img))
      

    The model works best with images of smaller resolution. That's why I added a preprocessing step where another neural network predicts the optimal resolution of the input image. This model will automatically resize the custom image to best resemble the training data and thus increase performance on images found in the wild. Still, it's not perfect and might not be able to handle huge images optimally, so don't zoom in all the way before taking a picture.

    Always double check the result carefully. You can try to redo the prediction with another resolution if the answer was wrong.

    Want to use the package?

    I'm trying to compile documentation right now.

    Visit here: https://pix2tex.readthedocs.io/

    Training the model

    Install a couple of dependencies pip install "pix2tex[train]".

    1. First we need to combine the images with their ground truth labels. I wrote a dataset class (which needs further improving) that saves the relative paths to the images with the LaTeX code they were rendered with. To generate the dataset pickle file run
    python -m pix2tex.dataset.dataset --equations path_to_textfile --images path_to_images --out dataset.pkl
    

    To use your own tokenizer pass it via --tokenizer (See below).

    You can find my generated training data on the Google Drive as well (formulae.zip - images, math.txt - labels). Repeat the step for the validation and test data. All use the same label text file.

    2. Edit the data (and valdata) entry in the config file to the newly generated .pkl file. Change other hyperparameters if you want to. See pix2tex/model/settings/config.yaml for a template.
    3. Now for the actual training run
    python -m pix2tex.train --config path_to_config_file
    

    If you want to use your own data you might be interested in creating your own tokenizer with

    python -m pix2tex.dataset.dataset --equations path_to_textfile --vocab-size 8000 --out tokenizer.json
    

    Don't forget to update the path to the tokenizer in the config file and set num_tokens to your vocabulary size.

    Model

    The model consists of a ViT [1] encoder with a ResNet backbone and a Transformer [2] decoder.

    Performance

    BLEU score    Normed edit distance    Token accuracy
    0.88          0.10                    0.60

    Data

    We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the im2latex-100k [3] dataset. All of it can be found here

    Dataset Requirements

    In order to render the math in many different fonts we use XeLaTeX, generate a PDF and finally convert it to a PNG. For the last step we need to use some third party tools:

    Fonts

    Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math

    TODO

    • add more evaluation metrics
    • create a GUI
    • add beam search
    • support handwritten formulae (kinda done, see training colab notebook)
    • reduce model size (distillation)
    • find optimal hyperparameters
    • tweak model structure
    • fix data scraping and scrape more data
    • trace the model (#2)

    Contribution

    Contributions of any kind are welcome.

    Acknowledgment

    Code taken and modified from lucidrains, rwightman, im2markup, arxiv_leaks, pkra: Mathjax, harupy: snipping tool

    References

    [1] An Image is Worth 16x16 Words

    [2] Attention Is All You Need

    [3] Image-to-Markup Generation with Coarse-to-Fine Attention

    openai/whisper
    1 week, 6 days ago

    Robust Speech Recognition via Large-Scale Weak Supervision


    Whisper

    [Blog] [Paper] [Model card] [Colab example]

    Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

    Approach

    A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.

    Setup

    We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably OpenAI's tiktoken for their fast tokenizer implementation. You can download and install (or update to) the latest release of Whisper with the following command:

    pip install -U openai-whisper
    

    Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:

    pip install git+https://github.com/openai/whisper.git 
    

    To update the package to the latest version of this repository, please run:

    pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
    

    It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

    # on Ubuntu or Debian
    sudo apt update && sudo apt install ffmpeg
    
    # on Arch Linux
    sudo pacman -S ffmpeg
    
    # on MacOS using Homebrew (https://brew.sh/)
    brew install ffmpeg
    
    # on Windows using Chocolatey (https://chocolatey.org/)
    choco install ffmpeg
    
    # on Windows using Scoop (https://scoop.sh/)
    scoop install ffmpeg
    

    You may need Rust installed as well, in case tiktoken does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting started page to install the Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:

    pip install setuptools-rust
    

    Available models and languages

    There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.

    Size Parameters English-only model Multilingual model Required VRAM Relative speed
    tiny 39 M tiny.en tiny ~1 GB ~32x
    base 74 M base.en base ~1 GB ~16x
    small 244 M small.en small ~2 GB ~6x
    medium 769 M medium.en medium ~5 GB ~2x
    large 1550 M N/A large ~10 GB 1x

    The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.

    Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of large-v3 and large-v2 models by language, using WERs (word error rates) or CER (character error rates, shown in Italic) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.

    Command-line usage

    The following command will transcribe speech in audio files, using the medium model:

    whisper audio.flac audio.mp3 audio.wav --model medium
    

    The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

    whisper japanese.wav --language Japanese
    

    Adding --task translate will translate the speech into English:

    whisper japanese.wav --language Japanese --task translate
    

    Run the following to view all available options:

    whisper --help
    

    See tokenizer.py for the list of all available languages.

    Python usage

    Transcription can also be performed within Python:

    import whisper
    
    model = whisper.load_model("base")
    result = model.transcribe("audio.mp3")
    print(result["text"])
    

    Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
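
    As a small illustrative sketch (not taken verbatim from the README), decoding options can also be passed to transcribe() as keyword arguments; the exact option names here are an assumption based on the --language and --task CLI flags and the DecodingOptions example below.

    import whisper

    model = whisper.load_model("base")

    # Hypothetical usage: mirror the CLI's --language and --task flags by
    # forwarding them as decoding options to transcribe().
    result = model.transcribe("japanese.wav", language="Japanese", task="translate")
    print(result["text"])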

    Below is an example usage of whisper.detect_language() and whisper.decode() which provide lower-level access to the model.

    import whisper
    
    model = whisper.load_model("base")
    
    # load audio and pad/trim it to fit 30 seconds
    audio = whisper.load_audio("audio.mp3")
    audio = whisper.pad_or_trim(audio)
    
    # make log-Mel spectrogram and move to the same device as the model
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    
    # detect the spoken language
    _, probs = model.detect_language(mel)
    print(f"Detected language: {max(probs, key=probs.get)}")
    
    # decode the audio
    options = whisper.DecodingOptions()
    result = whisper.decode(model, mel, options)
    
    # print the recognized text
    print(result.text)
    

    More examples

    Please use the 🙌 Show and tell category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.

    License

    Whisper's code and model weights are released under the MIT License. See LICENSE for further details.

    donnemartin/system-design-primer
    1 week, 6 days ago

    Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.



    Help translate this guide!

    The System Design Primer


    Motivation

    Learn how to design large-scale systems.

    Prep for the system design interview.

    Learn how to design large-scale systems

    Learning how to design scalable systems will help you become a better engineer.

    System design is a broad topic. There is a vast amount of resources scattered throughout the web on system design principles.

    This repo is an organized collection of resources to help you learn how to build systems at scale.

    Learn from the open source community

    This is a continually updated, open source project.

    Contributions are welcome!

    Prep for the system design interview

    In addition to coding interviews, system design is a required component of the technical interview process at many tech companies.

    Practice common system design interview questions and compare your results with sample solutions: discussions, code, and diagrams.

    Additional topics for interview prep:

    Anki flashcards


    The provided Anki flashcard decks use spaced repetition to help you retain key system design concepts.

    Great for use while on-the-go.

    Coding Resource: Interactive Coding Challenges

    Looking for resources to help you prep for the Coding Interview?


    Check out the sister repo Interactive Coding Challenges, which contains an additional Anki deck:

    Contributing

    Learn from the community.

    Feel free to submit pull requests to help:

    • Fix errors
    • Improve sections
    • Add new sections
    • Translate

    Content that needs some polishing is placed under development.

    Review the Contributing Guidelines.

    Index of system design topics

    Summaries of various system design topics, including pros and cons. Everything is a trade-off.

    Each section contains links to more in-depth resources.


    Study guide

    Suggested topics to review based on your interview timeline (short, medium, long).

    Q: For interviews, do I need to know everything here?

    A: No, you don't need to know everything here to prepare for the interview.

    What you are asked in an interview depends on variables such as:

    • How much experience you have
    • What your technical background is
    • What positions you are interviewing for
    • Which companies you are interviewing with
    • Luck

    More experienced candidates are generally expected to know more about system design. Architects or team leads might be expected to know more than individual contributors. Top tech companies are likely to have one or more design interview rounds.

    Start broad and go deeper in a few areas. It helps to know a little about various key system design topics. Adjust the following guide based on your timeline, experience, what positions you are interviewing for, and which companies you are interviewing with.

    • Short timeline - Aim for breadth with system design topics. Practice by solving some interview questions.
    • Medium timeline - Aim for breadth and some depth with system design topics. Practice by solving many interview questions.
    • Long timeline - Aim for breadth and more depth with system design topics. Practice by solving most interview questions.

    Short Medium Long
    Read through the System design topics to get a broad understanding of how systems work 👍 👍 👍
    Read through a few articles in the Company engineering blogs for the companies you are interviewing with 👍 👍 👍
    Read through a few Real world architectures 👍 👍 👍
    Review How to approach a system design interview question 👍 👍 👍
    Work through System design interview questions with solutions Some Many Most
    Work through Object-oriented design interview questions with solutions Some Many Most
    Review Additional system design interview questions Some Many Most

    How to approach a system design interview question

    How to tackle a system design interview question.

    The system design interview is an open-ended conversation. You are expected to lead it.

    You can use the following steps to guide the discussion. To help solidify this process, work through the System design interview questions with solutions section using the following steps.

    Step 1: Outline use cases, constraints, and assumptions

    Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.

    • Who is going to use it?
    • How are they going to use it?
    • How many users are there?
    • What does the system do?
    • What are the inputs and outputs of the system?
    • How much data do we expect to handle?
    • How many requests per second do we expect?
    • What is the expected read to write ratio?

    Step 2: Create a high level design

    Outline a high level design with all important components.

    • Sketch the main components and connections
    • Justify your ideas

    Step 3: Design core components

    Dive into details for each core component. For example, if you were asked to design a url shortening service, discuss:

    • Generating and storing a hash of the full url (see the sketch after this list)
      • MD5 and Base62
      • Hash collisions
      • SQL or NoSQL
      • Database schema
    • Translating a hashed url to the full url
      • Database lookup
    • API and object-oriented design
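
    Sketching the "Generating and storing a hash of the full url" bullet above (illustrative only; MD5 and Base62 are the options named in the list, and a real design would also handle collisions and persistence):

    import hashlib
    import string

    BASE62_ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

    def base62_encode(number: int) -> str:
        """Encode a non-negative integer using the Base62 alphabet."""
        if number == 0:
            return BASE62_ALPHABET[0]
        encoded = []
        while number:
            number, remainder = divmod(number, 62)
            encoded.append(BASE62_ALPHABET[remainder])
        return "".join(reversed(encoded))

    def shorten(full_url: str, length: int = 7) -> str:
        """Hash the full url with MD5, then Base62-encode and truncate it.

        Collisions are possible, so a real system would check the database
        and retry (e.g. by salting the input) before storing the mapping.
        """
        digest = hashlib.md5(full_url.encode("utf-8")).hexdigest()
        return base62_encode(int(digest, 16))[:length]

    print(shorten("https://example.com/some/very/long/path"))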

    Step 4: Scale the design

    Identify and address bottlenecks, given the constraints. For example, do you need the following to address scalability issues?

    • Load balancer
    • Horizontal scaling
    • Caching
    • Database sharding

    Discuss potential solutions and trade-offs. Everything is a trade-off. Address bottlenecks using principles of scalable system design.

    Back-of-the-envelope calculations

    You might be asked to do some estimates by hand. Refer to the Appendix for the following resources:

    Source(s) and further reading

    Check out the following links to get a better idea of what to expect:

    System design interview questions with solutions

    Common system design interview questions with sample discussions, code, and diagrams.

    Solutions linked to content in the solutions/ folder.

    Question
    Design Pastebin.com (or Bit.ly) Solution
    Design the Twitter timeline and search (or Facebook feed and search) Solution
    Design a web crawler Solution
    Design Mint.com Solution
    Design the data structures for a social network Solution
    Design a key-value store for a search engine Solution
    Design Amazon's sales ranking by category feature Solution
    Design a system that scales to millions of users on AWS Solution
    Add a system design question Contribute

    Design Pastebin.com (or Bit.ly)

    View exercise and solution

    Design the Twitter timeline and search (or Facebook feed and search)

    View exercise and solution

    Design a web crawler

    View exercise and solution

    Design Mint.com

    View exercise and solution

    Design the data structures for a social network

    View exercise and solution

    Design a key-value store for a search engine

    View exercise and solution

    Design Amazon's sales ranking by category feature

    View exercise and solution

    Design a system that scales to millions of users on AWS

    View exercise and solution

    Object-oriented design interview questions with solutions

    Common object-oriented design interview questions with sample discussions, code, and diagrams.

    Solutions linked to content in the solutions/ folder.

    Note: This section is under development

    Question
    Design a hash map Solution
    Design a least recently used cache Solution
    Design a call center Solution
    Design a deck of cards Solution
    Design a parking lot Solution
    Design a chat server Solution
    Design a circular array Contribute
    Add an object-oriented design question Contribute

    System design topics: start here

    New to system design?

    First, you'll need a basic understanding of common principles, learning about what they are, how they are used, and their pros and cons.

    Step 1: Review the scalability video lecture

    Scalability Lecture at Harvard

    • Topics covered:
      • Vertical scaling
      • Horizontal scaling
      • Caching
      • Load balancing
      • Database replication
      • Database partitioning

    Step 2: Review the scalability article

    Scalability

    Next steps

    Next, we'll look at high-level trade-offs:

    • Performance vs scalability
    • Latency vs throughput
    • Availability vs consistency

    Keep in mind that everything is a trade-off.

    Then we'll dive into more specific topics such as DNS, CDNs, and load balancers.

    Performance vs scalability

    A service is scalable if it results in increased performance in a manner proportional to resources added. Generally, increasing performance means serving more units of work, but it can also mean handling larger units of work, such as when datasets grow.

    Another way to look at performance vs scalability:

    • If you have a performance problem, your system is slow for a single user.
    • If you have a scalability problem, your system is fast for a single user but slow under heavy load.

    Source(s) and further reading

    Latency vs throughput

    Latency is the time to perform some action or to produce some result.

    Throughput is the number of such actions or results per unit of time.

    Generally, you should aim for maximal throughput with acceptable latency.

    Source(s) and further reading

    Availability vs consistency

    CAP theorem


    Source: CAP theorem revisited

    In a distributed computer system, you can only support two of the following guarantees:

    • Consistency - Every read receives the most recent write or an error
    • Availability - Every request receives a response, without guarantee that it contains the most recent version of the information
    • Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures

    Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software tradeoff between consistency and availability.

    CP - consistency and partition tolerance

    Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.

    AP - availability and partition tolerance

    Responses return the most readily available version of the data available on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.

    AP is a good choice if the business needs to allow for eventual consistency or when the system needs to continue working despite external errors.

    Source(s) and further reading

    Consistency patterns

    With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data. Recall the definition of consistency from the CAP theorem - Every read receives the most recent write or an error.

    Weak consistency

    After a write, reads may or may not see it. A best effort approach is taken.

    This approach is seen in systems such as memcached. Weak consistency works well in real time use cases such as VoIP, video chat, and realtime multiplayer games. For example, if you are on a phone call and lose reception for a few seconds, when you regain connection you do not hear what was spoken during connection loss.

    Eventual consistency

    After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.

    This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.

    Strong consistency

    After a write, reads will see it. Data is replicated synchronously.

    This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.

    Source(s) and further reading

    Availability patterns

    There are two complementary patterns to support high availability: fail-over and replication.

    Fail-over

    Active-passive

    With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service.

    The length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic.

    Active-passive failover can also be referred to as master-slave failover.
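
    To make the heartbeat idea concrete, here is a minimal, illustrative sketch (all names and thresholds are made up; real fail-over systems handle IP takeover, quorum, and flapping far more carefully):

    import time

    HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats from the active server
    HEARTBEAT_TIMEOUT = 3.0    # promote the passive server after this much silence

    def monitor(last_heartbeat_at: float, now: float) -> str:
        """Decide, on the passive server, whether to take over the active role."""
        if now - last_heartbeat_at > HEARTBEAT_TIMEOUT:
            # In a real system this is where the passive server would claim
            # the active server's IP address and start serving traffic.
            return "promote-to-active"
        return "stay-passive"

    # Example: the last heartbeat arrived 5 seconds ago, so we fail over.
    print(monitor(last_heartbeat_at=time.time() - 5, now=time.time()))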

    Active-active

    In active-active, both servers are managing traffic, spreading the load between them.

    If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.

    Active-active failover can also be referred to as master-master failover.

    Disadvantage(s): failover

    • Fail-over adds more hardware and additional complexity.
    • There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.

    Replication

    Master-slave and master-master

    This topic is further discussed in the Database section:

    Availability in numbers

    Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in number of 9s--a service with 99.99% availability is described as having four 9s.

    99.9% availability - three 9s

    Duration Acceptable downtime
    Downtime per year 8h 45min 57s
    Downtime per month 43m 49.7s
    Downtime per week 10m 4.8s
    Downtime per day 1m 26.4s

    99.99% availability - four 9s

    Duration Acceptable downtime
    Downtime per year 52min 35.7s
    Downtime per month 4m 23s
    Downtime per week 1m 5s
    Downtime per day 8.6s
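
    The downtime figures above follow directly from the availability percentage; a small helper (a sketch, not part of the original guide) shows the arithmetic:

    SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

    def downtime_per_year(availability: float) -> float:
        """Return the allowed downtime in seconds per year for a given availability."""
        return (1 - availability) * SECONDS_PER_YEAR

    for nines, availability in [("three 9s", 0.999), ("four 9s", 0.9999)]:
        seconds = downtime_per_year(availability)
        hours, remainder = divmod(seconds, 3600)
        minutes, secs = divmod(remainder, 60)
        print(f"{nines}: {int(hours)}h {int(minutes)}min {int(secs)}s of downtime per year")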

    Availability in parallel vs in sequence

    If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.

    In sequence

    Overall availability decreases when two components with availability < 100% are in sequence:

    Availability (Total) = Availability (Foo) * Availability (Bar)
    

    If both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.

    In parallel

    Overall availability increases when two components with availability < 100% are in parallel:

    Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))
    

    If both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.
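
    A quick sketch of the two formulas above, to check the 99.8% and 99.9999% figures:

    def availability_in_sequence(*availabilities: float) -> float:
        """Total availability when every component must be up (multiply them)."""
        total = 1.0
        for availability in availabilities:
            total *= availability
        return total

    def availability_in_parallel(*availabilities: float) -> float:
        """Total availability when any one component being up is enough."""
        total_failure = 1.0
        for availability in availabilities:
            total_failure *= (1 - availability)
        return 1 - total_failure

    foo = bar = 0.999
    print(f"In sequence: {availability_in_sequence(foo, bar):.4%}")   # ~99.8%
    print(f"In parallel: {availability_in_parallel(foo, bar):.6%}")   # 99.9999%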

    Domain name system


    Source: DNS security presentation

    A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address.

    DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the time to live (TTL).

    • NS record (name server) - Specifies the DNS servers for your domain/subdomain.
    • MX record (mail exchange) - Specifies the mail servers for accepting messages.
    • A record (address) - Points a name to an IP address.
    • CNAME (canonical) - Points a name to another name or CNAME (example.com to www.example.com) or to an A record.
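
    As a tiny illustration of an A-record style lookup using only Python's standard library (real DNS tooling also exposes the other record types listed above):

    import socket

    # Resolve a hostname to an IPv4 address via the operating system's resolver,
    # which in turn talks to the DNS servers configured by your router or ISP.
    print(socket.gethostbyname("www.example.com"))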

    Services such as CloudFlare and Route 53 provide managed DNS services. Some DNS services can route traffic through various methods:

    Disadvantage(s): DNS

    • Accessing a DNS server introduces a slight delay, although mitigated by caching described above.
    • DNS server management could be complex and is generally managed by governments, ISPs, and large companies.
    • DNS services have recently come under DDoS attack, preventing users from accessing websites such as Twitter without knowing Twitter's IP address(es).

    Source(s) and further reading

    Content delivery network


    Source: Why use a CDN

    A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from CDN, although some CDNs such as Amazon's CloudFront support dynamic content. The site's DNS resolution will tell clients which server to contact.

    Serving content from CDNs can significantly improve performance in two ways:

    • Users receive content from data centers close to them
    • Your servers do not have to serve requests that the CDN fulfills

    Push CDNs

    Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.

    Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.

    Pull CDNs

    Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.

    A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.

    Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.

    Disadvantage(s): CDN

    • CDN costs could be significant depending on traffic, although this should be weighed with additional costs you would incur not using a CDN.
    • Content might be stale if it is updated before the TTL expires it.
    • CDNs require changing URLs for static content to point to the CDN.

    Source(s) and further reading

    Load balancer


    Source: Scalable system design patterns

    Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:

    • Preventing requests from going to unhealthy servers
    • Preventing overloading resources
    • Helping to eliminate a single point of failure

    Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.

    Additional benefits include:

    • SSL termination - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations
    • Session persistence - Issue cookies and route a specific client's requests to the same instance if the web apps do not keep track of sessions

    To protect against failures, it's common to set up multiple load balancers, either in active-passive or active-active mode.

    Load balancers can route traffic based on various metrics, including:

    Layer 4 load balancing

    Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source, destination IP addresses, and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing Network Address Translation (NAT).

    Layer 7 load balancing

    Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.

    At the cost of flexibility, layer 4 load balancing requires less time and computing resources than Layer 7, although the performance impact can be minimal on modern commodity hardware.

    Horizontal scaling

    Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost efficient and results in higher availability than scaling up a single server on more expensive hardware, called Vertical Scaling. It is also easier to hire for talent working on commodity hardware than it is for specialized enterprise systems.

    Disadvantage(s): horizontal scaling

    • Scaling horizontally introduces complexity and involves cloning servers
      • Servers should be stateless: they should not contain any user-related data like sessions or profile pictures
      • Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a persistent cache (Redis, Memcached)
    • Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out

    Disadvantage(s): load balancer

    • The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
    • Introducing a load balancer to help eliminate a single point of failure results in increased complexity.
    • A single load balancer is a single point of failure; configuring multiple load balancers further increases complexity.

    Source(s) and further reading

    Reverse proxy (web server)


    Source: Wikipedia

    A reverse proxy is a web server that centralizes internal services and provides unified interfaces to the public. Requests from clients are forwarded to a server that can fulfill them before the reverse proxy returns the server's response to the client.

    Additional benefits include:

    • Increased security - Hide information about backend servers, blacklist IPs, limit number of connections per client
    • Increased scalability and flexibility - Clients only see the reverse proxy's IP, allowing you to scale servers or change their configuration
    • SSL termination - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations
    • Compression - Compress server responses
    • Caching - Return the response for cached requests
    • Static content - Serve static content directly
      • HTML/CSS/JS
      • Photos
      • Videos
      • Etc

    Load balancer vs reverse proxy

    • Deploying a load balancer is useful when you have multiple servers. Often, load balancers route traffic to a set of servers serving the same function.
    • Reverse proxies can be useful even with just one web server or application server, opening up the benefits described in the previous section.
    • Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing.

    Disadvantage(s): reverse proxy

    • Introducing a reverse proxy results in increased complexity.
    • A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e. a failover) further increases complexity.

    Source(s) and further reading

    Application layer


    Source: Intro to architecting systems for scale

    Separating out the web layer from the application layer (also known as platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The single responsibility principle advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.

    Workers in the application layer also help enable asynchronism.

    Microservices

    Related to this discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal.

    Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.

    Service Discovery

    Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built in key-value store that can be useful for storing config values and other shared data.

    Disadvantage(s): application layer

    • Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system).
    • Microservices can add complexity in terms of deployments and operations.

    Source(s) and further reading

    Database


    Source: Scaling up to your first 10 million users

    Relational database management system (RDBMS)

    A relational database like SQL is a collection of data items organized in tables.

    ACID is a set of properties of relational database transactions.

    • Atomicity - Each transaction is all or nothing
    • Consistency - Any transaction will bring the database from one valid state to another
    • Isolation - Executing transactions concurrently has the same results as if the transactions were executed serially
    • Durability - Once a transaction has been committed, it will remain so

    There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.

    Master-slave replication

    The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
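
    At the application layer, this topology is commonly used by routing writes to the master and reads to a replica. A minimal sketch, assuming hypothetical master_db and replica_dbs connection handles (not tied to any specific driver):

    import random

    master_db = None    # hypothetical connection handle that accepts reads and writes
    replica_dbs = []    # hypothetical read-only replica connection handles

    def execute(query, *params):
        # Send writes to the master; spread reads across the replicas.
        is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
        db = master_db if is_write else random.choice(replica_dbs or [master_db])
        return db.query(query, *params)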


    Source: Scalability, availability, stability, patterns

    Disadvantage(s): master-slave replication
    • Additional logic is needed to promote a slave to a master.
    • See Disadvantage(s): replication for points related to both master-slave and master-master.

    Master-master replication

    Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.


    Source: Scalability, availability, stability, patterns

    Disadvantage(s): master-master replication
    • You'll need a load balancer or you'll need to make changes to your application logic to determine where to write.
    • Most master-master systems are either loosely consistent (violating ACID) or have increased write latency due to synchronization.
    • Conflict resolution comes more into play as more write nodes are added and as latency increases.
    • See Disadvantage(s): replication for points related to both master-slave and master-master.
    Disadvantage(s): replication
    • There is a potential for loss of data if the master fails before any newly written data can be replicated to other nodes.
    • Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down with replaying writes and can't do as many reads.
    • The more read slaves, the more you have to replicate, which leads to greater replication lag.
    • On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.
    • Replication adds more hardware and additional complexity.
    Source(s) and further reading: replication

    Federation


    Source: Scaling up to your first 10 million users

    Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
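
    In application code, federation often comes down to picking a connection based on the function the data belongs to. A minimal sketch with hypothetical per-function connection handles:

    # Hypothetical connection handles, one per functional database.
    forums_db = users_db = products_db = None

    databases = {
        "forums": forums_db,
        "users": users_db,
        "products": products_db,
    }

    def query(function, sql, *params):
        # Route the query to the database that owns this function's data.
        return databases[function].query(sql, *params)

    # Example usage: user reads/writes never touch the forums or products databases.
    # user = query("users", "SELECT * FROM users WHERE user_id = {0}", 5)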

    Disadvantage(s): federation
    • Federation is not effective if your schema requires huge functions or tables.
    • You'll need to update your application logic to determine which database to read and write.
    • Joining data from two databases is more complex with a server link.
    • Federation adds more hardware and additional complexity.
    Source(s) and further reading: federation

    Sharding


    Source: Scalability, availability, stability, patterns

    Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.

    Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.

    Common ways to shard a table of users are by the user's last name initial or by the user's geographic location.
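
    A minimal sketch of shard selection by hashing the user id, assuming a fixed number of shards and hypothetical per-shard connections; real systems often use consistent hashing instead to ease rebalancing:

    import hashlib

    NUM_SHARDS = 4
    shards = [None] * NUM_SHARDS  # hypothetical per-shard database connection handles

    def get_shard(user_id):
        # Hash the shard key so users spread evenly across shards.
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        return shards[int(digest, 16) % NUM_SHARDS]

    def get_user(user_id):
        return get_shard(user_id).query(
            "SELECT * FROM users WHERE user_id = {0}", user_id)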

    Disadvantage(s): sharding
    • You'll need to update your application logic to work with shards, which could result in complex SQL queries.
    • Data distribution can become lopsided in a shard. For example, a set of power users on a shard could result in increased load to that shard compared to others.
      • Rebalancing adds additional complexity. A sharding function based on consistent hashing can reduce the amount of transferred data.
    • Joining data from multiple shards is more complex.
    • Sharding adds more hardware and additional complexity.
    Source(s) and further reading: sharding

    Denormalization

    Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMS such as PostgreSQL and Oracle support materialized views which handle the work of storing redundant information and keeping redundant copies consistent.
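
    For example, in PostgreSQL a materialized view can precompute an expensive join so reads avoid it. A sketch assuming a psycopg2 connection and illustrative table and column names:

    import psycopg2

    conn = psycopg2.connect("dbname=shop")  # illustrative DSN
    cur = conn.cursor()

    # Precompute the join of users and orders once, storing the redundant copy on disk.
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS user_order_totals AS
        SELECT u.user_id, u.name, SUM(o.total) AS lifetime_total
        FROM users u JOIN orders o ON o.user_id = u.user_id
        GROUP BY u.user_id, u.name
    """)

    # Reads hit the denormalized copy; refresh periodically to keep it consistent.
    cur.execute("REFRESH MATERIALIZED VIEW user_order_totals")
    conn.commit()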

    Once data becomes distributed with techniques such as federation and sharding, managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.

    In most systems, reads can heavily outnumber writes 100:1 or even 1000:1. A read resulting in a complex database join can be very expensive, spending a significant amount of time on disk operations.

    Disadvantage(s): denormalization
    • Data is duplicated.
    • Constraints can help redundant copies of information stay in sync, which increases complexity of the database design.
    • A denormalized database under heavy write load might perform worse than its normalized counterpart.
    Source(s) and further reading: denormalization

    SQL tuning

    SQL tuning is a broad topic and many books have been written as reference.

    It's important to benchmark and profile to simulate and uncover bottlenecks.

    • Benchmark - Simulate high-load situations with tools such as ab.
    • Profile - Enable tools such as the slow query log to help track performance issues.

    Benchmarking and profiling might point you to the following optimizations.

    Tighten up the schema
    • MySQL dumps to disk in contiguous blocks for fast access.
    • Use CHAR instead of VARCHAR for fixed-length fields.
      • CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving onto the next one.
    • Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
    • Use INT for larger numbers up to 2^32 or 4 billion.
    • Use DECIMAL for currency to avoid floating point representation errors.
    • Avoid storing large BLOBS, store the location of where to get the object instead.
    • Use VARCHAR(255), as 255 is the largest number of characters that can be counted with an 8-bit length prefix, often maximizing the use of a byte in some RDBMS.
    • Set the NOT NULL constraint where applicable to improve search performance.
    Use good indices
    • Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.
    • Indices are usually represented as self-balancing B-tree that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
    • Placing an index can keep the data in memory, requiring more space.
    • Writes could also be slower since the index also needs to be updated.
    • When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.
    Avoid expensive joins
    Partition tables
    • Break up a table by putting hot spots in a separate table to help keep it in memory.
    Tune the query cache
    Source(s) and further reading: SQL tuning

    NoSQL

    NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.

    BASE is often used to describe the properties of NoSQL databases. In comparison with the CAP Theorem, BASE chooses availability over consistency.

    • Basically available - the system guarantees availability.
    • Soft state - the state of the system may change over time, even without input.
    • Eventual consistency - the system will become consistent over a period of time, given that the system doesn't receive input during that period.

    In addition to choosing between SQL or NoSQL, it is helpful to understand which type of NoSQL database best fits your use case(s). We'll review key-value stores, document stores, wide column stores, and graph databases in the next section.

    Key-value store

    Abstraction: hash table

    A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.

    Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.

    A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.
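
    A quick sketch of the key-value abstraction using the redis-py client (Redis is one common key-value store; the key names are illustrative):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # O(1) write and read by key, with an optional TTL (in seconds) as value metadata.
    r.set("user:1234:session", "abc123", ex=3600)
    session = r.get("user:1234:session")

    # Sorted structures allow efficient range-style retrieval.
    r.zadd("leaderboard", {"alice": 4200, "bob": 3100})
    top = r.zrevrange("leaderboard", 0, 9, withscores=True)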

    Source(s) and further reading: key-value store

    Document store

    Abstraction: key-value store with documents stored as values

    A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note, many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.

    Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.

    Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-values and documents.

    Document stores provide high flexibility and are often used for working with occasionally changing data.
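
    A small sketch using pymongo against MongoDB (database, collection, and field names are illustrative):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client.app_db

    # The whole object lives in one document; fields can differ between documents.
    db.users.insert_one({"name": "alice", "plan": "pro", "tags": ["beta", "early"]})

    # Query on the internal structure of the document itself.
    user = db.users.find_one({"tags": "beta"})
    db.users.update_one({"name": "alice"}, {"$set": {"plan": "enterprise"}})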

    Source(s) and further reading: document store

    Wide column store


    Source: SQL & NoSQL, a brief history

    Abstraction: nested map ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>

    A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.

    Google introduced Bigtable as the first wide column store, which influenced the open-source HBase, often used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as BigTable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.

    Wide column stores offer high availability and high scalability. They are often used for very large data sets.
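
    A sketch of the row key / column layout using the DataStax Python driver for Cassandra; the keyspace, table, and column names are illustrative, and the keyspace is assumed to already exist:

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("app")

    # user_id is the row (partition) key; event_time orders the columns within the row.
    session.execute("""
        CREATE TABLE IF NOT EXISTS user_events (
            user_id text,
            event_time timestamp,
            event text,
            PRIMARY KEY (user_id, event_time)
        )
    """)

    session.execute(
        "INSERT INTO user_events (user_id, event_time, event) "
        "VALUES (%s, toTimestamp(now()), %s)",
        ("alice", "login"),
    )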

    Source(s) and further reading: wide column store

    Graph database


    Source: Graph database

    Abstraction: graph

    In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.

    Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.

    Source(s) and further reading: graph

    Source(s) and further reading: NoSQL

    SQL or NoSQL


    Source: Transitioning from RDBMS to NoSQL

    Reasons for SQL:

    • Structured data
    • Strict schema
    • Relational data
    • Need for complex joins
    • Transactions
    • Clear patterns for scaling
    • More established: developers, community, code, tools, etc
    • Lookups by index are very fast

    Reasons for NoSQL:

    • Semi-structured data
    • Dynamic or flexible schema
    • Non-relational data
    • No need for complex joins
    • Store many TB (or PB) of data
    • Very data intensive workload
    • Very high throughput for IOPS

    Sample data well-suited for NoSQL:

    • Rapid ingest of clickstream and log data
    • Leaderboard or scoring data
    • Temporary data, such as a shopping cart
    • Frequently accessed ('hot') tables
    • Metadata/lookup tables
    Source(s) and further reading: SQL or NoSQL

    Cache


    Source: Scalable system design patterns

    Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher first looks up whether the request has been made before and tries to return the previous result, saving the actual execution.

    Databases often benefit from a uniform distribution of reads and writes across their partitions. Popular items can skew the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes in traffic.

    Client caching

    Caches can be located on the client side (OS or browser), server side, or in a distinct cache layer.

    CDN caching

    CDNs are considered a type of cache.

    Web server caching

    Reverse proxies and caches such as Varnish can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.

    Database caching

    Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.

    Application caching

    In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so cache invalidation algorithms such as least recently used (LRU) can help invalidate 'cold' entries and keep 'hot' data in RAM.

    Redis has the following additional features:

    • Persistence option
    • Built-in data structures such as sorted sets and lists

    There are multiple levels you can cache that fall into two general categories: database queries and objects:

    • Row level
    • Query-level
    • Fully-formed serializable objects
    • Fully-rendered HTML

    Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.

    Caching at the database query level

    Whenever you query the database, hash the query as a key and store the result to the cache. This approach suffers from expiration issues:

    • Hard to delete a cached result with complex queries
    • If one piece of data changes such as a table cell, you need to delete all cached queries that might include the changed cell
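
    A minimal sketch of hashing a query into a cache key, assuming the same hypothetical cache and db handles as the cache-aside example further below (the TTL argument is also an assumption):

    import hashlib
    import json

    def cached_query(sql, *params, ttl=60):
        # Hash the SQL text plus its parameters into a stable cache key.
        key = "query:" + hashlib.sha256(
            json.dumps([sql, params], default=str).encode()).hexdigest()
        result = cache.get(key)
        if result is None:
            result = db.query(sql, *params)
            cache.set(key, json.dumps(result), ttl)  # expire to bound staleness
        return result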

    Caching at the object level

    See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or a data structure(s):

    • Remove the object from cache if its underlying data has changed
    • Allows for asynchronous processing: workers assemble objects by consuming the latest cached object

    Suggestions of what to cache:

    • User sessions
    • Fully rendered web pages
    • Activity streams
    • User graph data

    When to update the cache

    Since you can only store a limited amount of data in cache, you'll need to determine which cache update strategy works best for your use case.

    Cache-aside


    Source: From cache to in-memory data grid

    The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:

    • Look for entry in cache, resulting in a cache miss
    • Load entry from the database
    • Add entry to cache
    • Return entry
    def get_user(self, user_id):
        key = "user.{0}".format(user_id)
        user = cache.get(key)  # 1) look for the entry in the cache
        if user is None:       # 2) cache miss: load the entry from the database
            user = db.query("SELECT * FROM users WHERE user_id = {0}", user_id)
            if user is not None:
                cache.set(key, json.dumps(user))  # 3) add the entry to the cache
        return user            # 4) return the entry
    

    Memcached is generally used in this manner.

    Subsequent reads of data added to cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.

    Disadvantage(s): cache-aside
    • Each cache miss results in three trips, which can cause a noticeable delay.
    • Data can become stale if it is updated in the database. This issue is mitigated by setting a time-to-live (TTL) which forces an update of the cache entry, or by using write-through.
    • When a node fails, it is replaced by a new, empty node, increasing latency.

    Write-through


    Source: Scalability, availability, stability, patterns

    The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:

    • Application adds/updates entry in cache
    • Cache synchronously writes entry to data store
    • Return

    Application code:

    set_user(12345, {"foo":"bar"})
    

    Cache code:

    def set_user(user_id, values):
        # Write to the database synchronously (note the SET clause), then update the cache.
        user = db.query("UPDATE Users SET data = {1} WHERE id = {0}", user_id, values)
        cache.set(user_id, user)
    

    Write-through is a slow overall operation due to the write operation, but subsequent reads of just written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.

    Disadvantage(s): write through
    • When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write through can mitigate this issue.
    • Most data written might never be read, which can be minimized with a TTL.

    Write-behind (write-back)


    Source: Scalability, availability, stability, patterns

    In write-behind, the application does the following:

    • Add/update entry in cache
    • Asynchronously write entry to the data store, improving write performance
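
    A minimal sketch of write-behind using a background thread to flush queued writes, with the same hypothetical cache and db handles as the examples above:

    import json
    import queue
    import threading

    write_queue = queue.Queue()

    def set_user(user_id, values):
        # 1) Update the cache synchronously so reads see fresh data.
        cache.set("user.{0}".format(user_id), json.dumps(values))
        # 2) Queue the database write; it happens asynchronously.
        write_queue.put((user_id, values))

    def flush_worker():
        while True:
            user_id, values = write_queue.get()
            db.query("UPDATE users SET data = {1} WHERE user_id = {0}", user_id, values)
            write_queue.task_done()

    threading.Thread(target=flush_worker, daemon=True).start()
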
    Disadvantage(s): write-behind
    • There could be data loss if the cache goes down prior to its contents hitting the data store.
    • It is more complex to implement write-behind than it is to implement cache-aside or write-through.

    Refresh-ahead


    Source: From cache to in-memory data grid

    You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.

    Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.

    Disadvantage(s): refresh-ahead
    • Not accurately predicting which items are likely to be needed in the future can result in worse performance than without refresh-ahead.

    Disadvantage(s): cache

    • Need to maintain consistency between caches and the source of truth such as the database through cache invalidation.
    • Cache invalidation is a difficult problem; there is additional complexity around deciding when to update the cache.
    • Need to make application changes such as adding Redis or memcached.

    Source(s) and further reading

    Asynchronism


    Source: Intro to architecting systems for scale

    Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.

    Message queues

    Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:

    • An application publishes a job to the queue, then notifies the user of job status
    • A worker picks up the job from the queue, processes it, then signals the job is complete

    The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all of your followers.

    Redis is useful as a simple message broker but messages can be lost.
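
    A minimal sketch of the publish/worker workflow above using Redis lists as a simple broker via redis-py (the queue name and job fields are illustrative):

    import json
    import redis

    r = redis.Redis()

    def deliver_to_followers(tweet_id):
        ...  # hypothetical expensive fan-out work

    # Producer: the web request publishes a job and returns immediately.
    def enqueue_fanout(tweet_id):
        r.lpush("jobs:fanout", json.dumps({"tweet_id": tweet_id}))

    # Worker: blocks until a job is available, then processes it in the background.
    def worker():
        while True:
            _, payload = r.brpop("jobs:fanout")
            job = json.loads(payload)
            deliver_to_followers(job["tweet_id"])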

    RabbitMQ is popular but requires you to adapt to the 'AMQP' protocol and manage your own nodes.

    Amazon SQS is hosted but can have high latency and has the possibility of messages being delivered twice.

    Task queues

    Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling and can be used to run computationally-intensive jobs in the background.

    Celery supports scheduling and is primarily used with Python.
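
    A minimal Celery sketch, assuming a Redis broker running locally; the task and argument names are illustrative:

    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task
    def generate_report(account_id):
        # Computationally intensive work runs in a Celery worker process,
        # not in the web request that enqueued it.
        ...

    # Callers enqueue the task and return immediately; a worker picks it up.
    generate_report.delay(42)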

    Back pressure

    If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. Back pressure can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with exponential backoff.
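
    A minimal sketch of back pressure: reject new work when a bounded queue is full, and have the client retry with exponential backoff. The 503 handling is illustrative and not tied to any specific web framework:

    import queue
    import random
    import time

    job_queue = queue.Queue(maxsize=1000)  # bound the queue to protect memory

    def submit_job(job):
        try:
            job_queue.put_nowait(job)
            return 202  # accepted for background processing
        except queue.Full:
            return 503  # server busy; ask the client to retry later

    def submit_with_backoff(job, retries=5):
        for attempt in range(retries):
            if submit_job(job) != 503:
                return True
            # Exponential backoff with jitter before retrying.
            time.sleep((2 ** attempt) + random.random())
        return False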

    Disadvantage(s): asynchronism

    • Use cases such as inexpensive calculations and realtime workflows might be better suited for synchronous operations, as introducing queues can add delays and complexity.

    Source(s) and further reading

    Communication


    Source: OSI 7 layer model

    Hypertext transfer protocol (HTTP)

    HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.

    A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs:

    Verb Description Idempotent* Safe Cacheable
    GET Reads a resource Yes Yes Yes
    POST Creates a resource or triggers a process that handles data No No Yes if response contains freshness info
    PUT Creates or replaces a resource Yes No No
    PATCH Partially updates a resource No No Yes if response contains freshness info
    DELETE Deletes a resource Yes No No

    *Can be called many times without different outcomes.

    HTTP is an application layer protocol relying on lower-level protocols such as TCP and UDP.
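
    A quick sketch of issuing these verbs with the Python requests library (the URL and payloads are illustrative):

    import requests

    base = "https://api.example.com"

    r = requests.get(f"{base}/users/1234")                                     # safe, idempotent, cacheable
    requests.post(f"{base}/users", json={"name": "alice"})                     # creates a resource
    requests.put(f"{base}/users/1234", json={"name": "alice", "plan": "pro"})  # idempotent create/replace
    requests.delete(f"{base}/users/1234")                                      # idempotent delete
    print(r.status_code)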

    Source(s) and further reading: HTTP

    Transmission control protocol (TCP)


    Source: How to make a multiplayer game

    TCP is a connection-oriented protocol over an IP network. Connection is established and terminated using a handshake. All packets sent are guaranteed to reach the destination in the original order and without corruption through sequence numbers and checksum fields for each packet, along with acknowledgement packets and automatic retransmission.

    If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements flow control and congestion control. These guarantees cause delays and generally result in less efficient transmission than UDP.

    To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and say, a memcached server. Connection pooling can help in addition to switching to UDP where applicable.

    TCP is useful for applications that require high reliability but are less time critical. Some examples include web servers, database info, SMTP, FTP, and SSH.

    Use TCP over UDP when:

    • You need all of the data to arrive intact
    • You want to automatically make a best estimate use of the network throughput
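
    A minimal TCP client sketch with Python's socket module; the host and payload are illustrative:

    import socket

    # SOCK_STREAM = TCP: connection-oriented, ordered, reliable delivery.
    with socket.create_connection(("example.com", 80), timeout=5) as sock:
        sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        response = b""
        while chunk := sock.recv(4096):
            response += chunk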

    User datagram protocol (UDP)


    Source: How to make a multiplayer game

    UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP supports, UDP is generally more efficient.

    UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with DHCP because the client has not yet received an IP address, which TCP would require before it could establish a stream.

    UDP is less reliable but works well in real time use cases such as VoIP, video chat, streaming, and realtime multiplayer games.

    Use UDP over TCP when:

    • You need the lowest latency
    • Late data is worse than loss of data
    • You want to implement your own error correction
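
    A minimal UDP sketch with Python's socket module; note there is no connection, no ordering, and no delivery guarantee (the address and payload are illustrative):

    import socket

    # SOCK_DGRAM = UDP: fire-and-forget datagrams.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"player_position:10,42", ("192.0.2.1", 9999))

    sock.settimeout(1.0)
    try:
        data, addr = sock.recvfrom(4096)  # may never arrive; acceptable for UDP
    except socket.timeout:
        data = None                        # late data is worse than lost data here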

    Source(s) and further reading: TCP and UDP

    Remote procedure call (RPC)


    Source: Crack the system design interview

    In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include Protobuf, Thrift, and Avro.

    RPC is a request-response protocol:

    • Client program - Calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call.
    • Client stub procedure - Marshals (packs) procedure id and arguments into a request message.
    • Client communication module - OS sends the message from the client to the server.
    • Server communication module - OS passes the incoming packets to the server stub procedure.
    • Server stub procedure - Unmarshals the request message, calls the server procedure matching the procedure id, and passes the given arguments.
    • The server response repeats the steps above in reverse order.

    Sample RPC calls:

    GET /someoperation?data=anId
    
    POST /anotheroperation
    {
      "data":"anId";
      "anotherdata": "another value"
    }
    

    RPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases.
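
    A minimal RPC sketch using Python's built-in xmlrpc, just to show the stub/marshalling flow; production systems typically use frameworks such as gRPC or Thrift, and the procedure names here are illustrative:

    # --- server ---
    from xmlrpc.server import SimpleXMLRPCServer

    def add_item_to_users_items_list(person_id, item_id):
        # The procedure executes in the server's address space.
        return {"personid": person_id, "itemid": item_id, "status": "added"}

    server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
    server.register_function(add_item_to_users_items_list, "addItemToUsersItemsList")
    # server.serve_forever()  # uncomment to run the server

    # --- client ---
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://localhost:8000")
    # Looks like a local call; the client stub marshals the arguments into a request
    # message (requires the server above to be running).
    result = proxy.addItemToUsersItemsList("1234", "456")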

    Choose a native library (aka SDK) when:

    • You know your target platform.
    • You want to control how your "logic" is accessed.
    • You want to control how error control happens off your library.
    • Performance and end user experience is your primary concern.

    HTTP APIs following REST tend to be used more often for public APIs.

    Disadvantage(s): RPC

    • RPC clients become tightly coupled to the service implementation.
    • A new API must be defined for every new operation or use case.
    • It can be difficult to debug RPC.
    • You might not be able to leverage existing technologies out of the box. For example, it might require additional effort to ensure RPC calls are properly cached on caching servers such as Squid.

    Representational state transfer (REST)

    REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.

    There are four qualities of a RESTful interface:

    • Identify resources (URI in HTTP) - use the same URI regardless of any operation.
    • Change with representations (Verbs in HTTP) - use verbs, headers, and body.
    • Self-descriptive error message (status response in HTTP) - Use status codes, don't reinvent the wheel.
    • HATEOAS (HTML interface for HTTP) - your web service should be fully accessible in a browser.

    Sample REST calls:

    GET /someresources/anId
    
    PUT /someresources/anId
    {"anotherdata": "another value"}
    

    REST is focused on exposing data. It minimizes the coupling between client/server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning.

    Disadvantage(s): REST

    • With REST being focused on exposing data, it might not be a good fit if resources are not naturally organized or accessed in a simple hierarchy. For example, returning all updated records from the past hour matching a particular set of events is not easily expressed as a path. With REST, it is likely to be implemented with a combination of URI path, query parameters, and possibly the request body.
    • REST typically relies on a few verbs (GET, POST, PUT, DELETE, and PATCH) which sometimes doesn't fit your use case. For example, moving expired documents to the archive folder might not cleanly fit within these verbs.
    • Fetching complicated resources with nested hierarchies requires multiple round trips between the client and server to render single views, e.g. fetching content of a blog entry and the comments on that entry. For mobile applications operating in variable network conditions, these multiple roundtrips are highly undesirable.
    • Over time, more fields might be added to an API response, and older clients will receive all new data fields, even those they do not need; as a result, the payload size grows and latencies increase.

    RPC and REST calls comparison

    Operation                         RPC                                                                   REST
    Signup                            POST /signup                                                          POST /persons
    Resign                            POST /resign {"personid": "1234"}                                     DELETE /persons/1234
    Read a person                     GET /readPerson?personid=1234                                         GET /persons/1234
    Read a person’s items list        GET /readUsersItemsList?personid=1234                                 GET /persons/1234/items
    Add an item to a person’s items   POST /addItemToUsersItemsList {"personid": "1234", "itemid": "456"}   POST /persons/1234/items {"itemid": "456"}
    Update an item                    POST /modifyItem {"itemid": "456", "key": "value"}                    PUT /items/456 {"key": "value"}
    Delete an item                    POST /removeItem {"itemid": "456"}                                    DELETE /items/456

    Source: Do you really know why you prefer REST over RPC

    Source(s) and further reading: REST and RPC

    Security

    This section could use some updates. Consider contributing!

    Security is a broad topic. Unless you have considerable experience, a security background, or are applying for a position that requires knowledge of security, you probably won't need to know more than the basics:

    • Encrypt in transit and at rest.
    • Sanitize all user inputs or any input parameters exposed to user to prevent XSS and SQL injection.
    • Use parameterized queries to prevent SQL injection.
    • Use the principle of least privilege.

    Source(s) and further reading

    Appendix

    You'll sometimes be asked to do 'back-of-the-envelope' estimates. For example, you might need to determine how long it will take to generate 100 image thumbnails from disk or how much memory a data structure will take. The Powers of two table and Latency numbers every programmer should know are handy references.

    Powers of two table

    Power           Exact Value         Approx Value        Bytes
    ---------------------------------------------------------------
    7                             128
    8                             256
    10                           1024   1 thousand           1 KB
    16                         65,536                       64 KB
    20                      1,048,576   1 million            1 MB
    30                  1,073,741,824   1 billion            1 GB
    32                  4,294,967,296                        4 GB
    40              1,099,511,627,776   1 trillion           1 TB
    

    Source(s) and further reading

    Latency numbers every programmer should know

    Latency Comparison Numbers
    --------------------------
    L1 cache reference                           0.5 ns
    Branch mispredict                            5   ns
    L2 cache reference                           7   ns                      14x L1 cache
    Mutex lock/unlock                           25   ns
    Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
    Compress 1K bytes with Zippy            10,000   ns       10 us
    Send 1 KB bytes over 1 Gbps network     10,000   ns       10 us
    Read 4 KB randomly from SSD*           150,000   ns      150 us          ~1GB/sec SSD
    Read 1 MB sequentially from memory     250,000   ns      250 us
    Round trip within same datacenter      500,000   ns      500 us
    Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
    HDD seek                            10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
    Read 1 MB sequentially from 1 Gbps  10,000,000   ns   10,000 us   10 ms  40x memory, 10X SSD
    Read 1 MB sequentially from HDD     30,000,000   ns   30,000 us   30 ms 120x memory, 30X SSD
    Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
    
    Notes
    -----
    1 ns = 10^-9 seconds
    1 us = 10^-6 seconds = 1,000 ns
    1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
    

    Handy metrics based on numbers above:

    • Read sequentially from HDD at 30 MB/s
    • Read sequentially from 1 Gbps Ethernet at 100 MB/s
    • Read sequentially from SSD at 1 GB/s
    • Read sequentially from main memory at 4 GB/s
    • 6-7 world-wide round trips per second
    • 2,000 round trips per second within a data center
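
    A worked example using the numbers above: estimating how long it takes to read 100 one-megabyte thumbnails sequentially from different media (illustrative arithmetic only):

    # Latencies from the table above, in milliseconds per 1 MB sequential read.
    MEMORY_MS, SSD_MS, HDD_MS = 0.25, 1, 30

    thumbnails, size_mb = 100, 1
    print("memory:", thumbnails * size_mb * MEMORY_MS, "ms")  # 25 ms
    print("ssd:   ", thumbnails * size_mb * SSD_MS, "ms")     # 100 ms
    print("hdd:   ", thumbnails * size_mb * HDD_MS, "ms")     # 3,000 ms = 3 s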

    Latency numbers visualized

    Source(s) and further reading

    Additional system design interview questions

    Common system design interview questions, with links to resources on how to solve each.

    Question Reference(s)
    Design a file sync service like Dropbox youtube.com
    Design a search engine like Google queue.acm.org
    stackexchange.com
    ardendertat.com
    stanford.edu
    Design a scalable web crawler like Google quora.com
    Design Google docs code.google.com
    neil.fraser.name
    Design a key-value store like Redis slideshare.net
    Design a cache system like Memcached slideshare.net
    Design a recommendation system like Amazon's hulu.com
    ijcai13.org
    Design a tinyurl system like Bitly n00tc0d3r.blogspot.com
    Design a chat app like WhatsApp highscalability.com
    Design a picture sharing system like Instagram highscalability.com
    highscalability.com
    Design the Facebook news feed function quora.com
    quora.com
    slideshare.net
    Design the Facebook timeline function facebook.com
    highscalability.com
    Design the Facebook chat function erlang-factory.com
    facebook.com
    Design a graph search function like Facebook's facebook.com
    facebook.com
    facebook.com
    Design a content delivery network like CloudFlare figshare.com
    Design a trending topic system like Twitter's michael-noll.com
    snikolov.wordpress.com
    Design a random ID generation system blog.twitter.com
    github.com
    Return the top k requests during a time interval cs.ucsb.edu
    wpi.edu
    Design a system that serves data from multiple data centers highscalability.com
    Design an online multiplayer card game indieflashblog.com
    buildnewgames.com
    Design a garbage collection system stuffwithstuff.com
    washington.edu
    Design an API rate limiter https://stripe.com/blog/
    Design a Stock Exchange (like NASDAQ or Binance) Jane Street
    Golang Implementation
    Go Implementation
    Add a system design question Contribute

    Real world architectures

    Articles on how real world systems are designed.


    Source: Twitter timelines at scale

    Don't focus on nitty gritty details for the following articles, instead:

    • Identify shared principles, common technologies, and patterns within these articles
    • Study what problems are solved by each component, where it works, where it doesn't
    • Review the lessons learned
    Type System Reference(s)
    Data processing MapReduce - Distributed data processing from Google research.google.com
    Data processing Spark - Distributed data processing from Databricks slideshare.net
    Data processing Storm - Distributed data processing from Twitter slideshare.net
    Data store Bigtable - Distributed column-oriented database from Google harvard.edu
    Data store HBase - Open source implementation of Bigtable slideshare.net
    Data store Cassandra - Distributed column-oriented database from Facebook slideshare.net
    Data store DynamoDB - Document-oriented database from Amazon harvard.edu
    Data store MongoDB - Document-oriented database slideshare.net
    Data store Spanner - Globally-distributed database from Google research.google.com
    Data store Memcached - Distributed memory caching system slideshare.net
    Data store Redis - Distributed memory caching system with persistence and value types slideshare.net
    File system Google File System (GFS) - Distributed file system research.google.com
    File system Hadoop File System (HDFS) - Open source implementation of GFS apache.org
    Misc Chubby - Lock service for loosely-coupled distributed systems from Google research.google.com
    Misc Dapper - Distributed systems tracing infrastructure research.google.com
    Misc Kafka - Pub/sub message queue from LinkedIn slideshare.net
    Misc Zookeeper - Centralized infrastructure and services enabling synchronization slideshare.net
    Add an architecture Contribute

    Company architectures

    Company Reference(s)
    Amazon Amazon architecture
    Cinchcast Producing 1,500 hours of audio every day
    DataSift Realtime datamining At 120,000 tweets per second
    Dropbox How we've scaled Dropbox
    ESPN Operating At 100,000 duh nuh nuhs per second
    Google Google architecture
    Instagram 14 million users, terabytes of photos
    What powers Instagram
    Justin.tv Justin.Tv's live video broadcasting architecture
    Facebook Scaling memcached at Facebook
    TAO: Facebook’s distributed data store for the social graph
    Facebook’s photo storage
    How Facebook Live Streams To 800,000 Simultaneous Viewers
    Flickr Flickr architecture
    Mailbox From 0 to one million users in 6 weeks
    Netflix A 360 Degree View Of The Entire Netflix Stack
    Netflix: What Happens When You Press Play?
    Pinterest From 0 To 10s of billions of page views a month
    18 million visitors, 10x growth, 12 employees
    Playfish 50 million monthly users and growing
    PlentyOfFish PlentyOfFish architecture
    Salesforce How they handle 1.3 billion transactions a day
    Stack Overflow Stack Overflow architecture
    TripAdvisor 40M visitors, 200M dynamic page views, 30TB data
    Tumblr 15 billion page views a month
    Twitter Making Twitter 10000 percent faster
    Storing 250 million tweets a day using MySQL
    150M active users, 300K QPS, a 22 MB/S firehose
    Timelines at scale
    Big and small data at Twitter
    Operations at Twitter: scaling beyond 100 million users
    How Twitter Handles 3,000 Images Per Second
    Uber How Uber scales their real-time market platform
    Lessons Learned From Scaling Uber To 2000 Engineers, 1000 Services, And 8000 Git Repositories
    WhatsApp The WhatsApp architecture Facebook bought for $19 billion
    YouTube YouTube scalability
    YouTube architecture

    Company engineering blogs

    Architectures for companies you are interviewing with.

    Questions you encounter might be from the same domain.

    Source(s) and further reading

    Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo:

    Under development

    Interested in adding a section or helping complete one in-progress? Contribute!

    • Distributed computing with MapReduce
    • Consistent hashing
    • Scatter gather
    • Contribute

    Credits

    Credits and sources are provided throughout this repo.

    Special thanks to:

    Contact info

    Feel free to contact me to discuss any issues, questions, or comments.

    My contact info can be found on my GitHub page.

    License

    I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).

    Copyright 2017 Donne Martin
    
    Creative Commons Attribution 4.0 International License (CC BY 4.0)
    
    http://creativecommons.org/licenses/by/4.0/
    
    THUDM/CogVLM
    1 week, 6 days ago

    a state-of-the-art open visual language model | multimodal pretrained model


    CogVLM

    📖 Paper

    🌐 Web demo

    🔥 News: CogVLM bilingual version is available online! Welcome to try it out!


    🔥 News: We are currently preparing to open-source a more powerful model with rich chart and document understanding capabilities. It has achieved a score of 81 on DocVQA, so stay tuned for its release!

    README in Chinese (中文版README)

    Introduction

    • CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters.

    • CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flickr30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA and TDIUC, and ranks 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B. CogVLM can also chat with you about images.

    Examples

    • CogVLM can accurately describe images in details with very few hallucinations.

      Click for comparison with LLAVA-1.5 and MiniGPT-4.


    • CogVLM can understand and answer various types of questions, and has a visual grounding version.

    • CogVLM sometimes captures more detailed content than GPT-4V(ision).

    Click to expand more examples.

    Method

    CogVLM model comprises four fundamental components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT), and a visual expert module. See Paper for more details.

    Get Started

    We support two GUIs for model inference: a web demo and a CLI. If you want to use CogVLM in your own Python code, it is easy to modify the CLI scripts for your use case.

    First, we need to install the dependencies.

    pip install -r requirements.txt
    python -m spacy download en_core_web_sm
    

    Hardware requirement

    • Model Inference: 1 * A100(80G) or 2 * RTX 3090(24G).
    • Finetuning: 4 * A100(80G) [Recommend] or 8* RTX 3090(24G).

    Web Demo

    We also offer a local web demo based on Gradio. First, install Gradio by running: pip install gradio. Then download and enter this repository and run web_demo.py. See the next section for detailed usage:

    python web_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
    python web_demo.py --from_pretrained cogvlm-grounding-generalist --version base --english --bf16
    

    The GUI of the web demo looks like:

    CLI

    We open-source different checkpoints for different downstreaming tasks:

    • cogvlm-chat The model after SFT for alignment, which supports chat like GPT-4V.
    • cogvlm-base-224 The original checkpoint after text-image pretraining.
    • cogvlm-base-490 The finetuned version on 490px resolution from cogvlm-base-224. The finetuning data includes the training sets of VQA datasets.
    • cogvlm-grounding-generalist. This checkpoint supports different visual grounding tasks, e.g. REC, Grounding Captioning, etc.

    Run CLI demo via:

    python cli_demo.py --from_pretrained cogvlm-base-224 --version base --english --bf16 --no_prompt
    python cli_demo.py --from_pretrained cogvlm-base-490 --version base --english --bf16 --no_prompt
    python cli_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
    python cli_demo.py --from_pretrained cogvlm-grounding-generalist --version base --english --bf16
    

    The program will automatically download the sat model and interact in the command line. You can generate replies by entering instructions and pressing enter. Enter clear to clear the conversation history and stop to stop the program.

    Multi-GPU inference

    We also support model parallel inference, which splits the model across multiple (2/4/8) GPUs. The --nproc-per-node=[n] option in the following command controls the number of GPUs used.

    torchrun --standalone --nnodes=1 --nproc-per-node=2 cli_demo.py --from_pretrained cogvlm-chat --version chat --english --bf16
    

    Note:

    • If you have trouble accessing huggingface.co, you can add --local_tokenizer /path/to/vicuna-7b-v1.5 to load the tokenizer.
    • If you have trouble automatically downloading the model with 🔨SAT, try downloading it manually from 🤖modelscope or 🤗huggingface.
    • When downloading the model with 🔨SAT, it is saved to the default location ~/.sat_models. Change the default location by setting the environment variable SAT_HOME. For example, to save the model to /path/to/my/models, run export SAT_HOME=/path/to/my/models before running the python command.

    The program provides the following hyperparameters to control the generation process:

    usage: cli_demo.py [-h] [--max_length MAX_LENGTH] [--top_p TOP_P] [--top_k TOP_K] [--temperature TEMPERATURE] [--english]
    
    optional arguments:
      -h, --help            show this help message and exit
      --max_length MAX_LENGTH
                            max length of the total sequence
      --top_p TOP_P         top p for nucleus sampling
      --top_k TOP_K         top k for top k sampling
      --temperature TEMPERATURE
                            temperature for sampling
      --english             only output English
    

    Finetuning

    You may want to use CogVLM in your own task, which needs a different output style or domain knowledge. We here provide a finetuning example for Captcha Recognition.

    1. Start by downloading the Captcha Images dataset. Once downloaded, extract the contents of the ZIP file.

    2. To create a train/validation/test split in the ratio of 80/5/15, execute the following:

      python scripts/split_dataset.py
      
    3. Start the fine-tuning process with this command:

      bash scripts/finetune_(224/490)_lora.sh
      
    4. Merge the model to model_parallel_size=1: (replace the 4 below with your training MP_SIZE)

      torchrun --standalone --nnodes=1 --nproc-per-node=4 merge_model.py --version base --bf16 --from_pretrained ./checkpoints/merged_lora_(224/490)
      
    5. Evaluate the performance of your model.

      bash scripts/evaluate_(224/490).sh
      

    It is recommended to use the 490px version. However, if you have limited GPU resources (such as only one node with 8 * RTX 3090), you can try the 224px version with model parallelism.

    The anticipated result of this script is around 95% accuracy on the test set.

    It is worth noting that the fine-tuning examples only tune a limited set of parameters. (Expert only) If you want to get >98% accuracy, you need to increase the trainable parameters in finetune_demo.py.

    License

    The code in this repository is open source under the Apache-2.0 license, while the use of the CogVLM model weights must comply with the Model License.

    Citation & Acknowledgements

    If you find our work helpful, please consider citing the following papers

    @article{wang2023cogvlm,
          title={CogVLM: Visual Expert for Pretrained Language Models}, 
          author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang},
          year={2023},
          eprint={2311.03079},
          archivePrefix={arXiv},
          primaryClass={cs.CV}
    }
    

    In the instruction fine-tuning phase of the CogVLM, there are some English image-text data from the MiniGPT-4, LLAVA, LRV-Instruction, LLaVAR and Shikra projects, as well as many classic cross-modal work datasets. We sincerely thank them for their contributions.

    cpacker/MemGPT
    1 week, 6 days ago

    Teaching LLMs memory management for unbounded context 📚🦙


    MemGPT

    Try out our MemGPT chatbot on Discord!

    ⭐ NEW: You can now run MemGPT with local LLMs and AutoGen! ⭐

    🤖 Create perpetual chatbots with self-editing memory!


    🗃️ Chat with your data - talk to your local files or SQL database!

    Quick setup

    Join Discord and message the MemGPT bot (in the #memgpt channel). Then run the following commands (messaged to "MemGPT Bot"):

    • /profile (to create your profile)
    • /key (to enter your OpenAI key)
    • /create (to create a MemGPT chatbot)

    Make sure your privacy settings on this server are open so that MemGPT Bot can DM you:
    MemGPT → Privacy Settings → Direct Messages set to ON

    You can see the full list of available commands when you enter / into the message box.

    What is MemGPT?

    Memory-GPT (or MemGPT in short) is a system that intelligently manages different memory tiers in LLMs in order to effectively provide extended context within the LLM's limited context window. For example, MemGPT knows when to push critical information to a vector database and when to retrieve it later in the chat, enabling perpetual conversations. Learn more about MemGPT in our paper.

    Running MemGPT locally

    Install MemGPT:

    pip install pymemgpt
    

    Add your OpenAI API key to your environment:

    
    export OPENAI_API_KEY=YOUR_API_KEY # on Linux/Mac
    set OPENAI_API_KEY=YOUR_API_KEY # on Windows
    $Env:OPENAI_API_KEY = "YOUR_API_KEY" # on Windows (PowerShell)
    

    Configure default setting for MemGPT by running:

    memgpt configure
    

    Now, you can run MemGPT with:

    memgpt run
    

    You can run the following commands in the MemGPT CLI prompt:

    • /exit: Exit the CLI
    • /attach: Attach a loaded data source to the agent
    • /save: Save a checkpoint of the current agent/conversation state
    • /dump: View the current message log (see the contents of main context)
    • /dump <count>: View the last <count> messages (all if <count> is omitted)
    • /memory: Print the current contents of agent memory
    • /pop: Undo the last message in the conversation
    • /pop <count>: Undo the last <count> messages in the conversation. It defaults to 3, which is usually one turn in the conversation
    • /retry: Pops the last answer and tries to get another one
    • /rethink <text>: Will replace the inner dialog of the last assistant message with <text> to help shape the conversation
    • /rewrite: Will replace the last assistant answer with the given text to correct or force the answer
    • /heartbeat: Send a heartbeat system message to the agent
    • /memorywarning: Send a memory warning system message to the agent

    Once you exit the CLI with /exit, you can resume chatting with the same agent by specifying the agent name in memgpt run --agent <NAME>.

    Documentation

    See full documentation at: https://memgpt.readthedocs.io/

    Installing from source

    To install MemGPT from source, start by cloning the repo:

    git clone git@github.com:cpacker/MemGPT.git
    

    Then navigate to the main MemGPT directory, and do:

    pip install -e .
    

    Now, you should be able to run memgpt from the command-line using the downloaded source code.

    If you are having dependency issues using pip install -e ., we recommend you install the package using Poetry (see below). Installing MemGPT from source using Poetry will ensure that you are using exact package versions that have been tested for the production build.

    Installing from source (using Poetry)

    First, install Poetry using the official instructions here.

    Then, you can install MemGPT from source with:

    git clone git@github.com:cpacker/MemGPT.git
    poetry shell
    poetry install
    

    Support

    For issues and feature requests, please open a GitHub issue or message us on our #support channel on Discord

    Datasets

    Datasets used in our paper can be downloaded at Hugging Face.

    🚀 Project Roadmap

    • Release MemGPT Discord bot demo (perpetual chatbot)
    • Add additional workflows (load SQL/text into MemGPT external context)
    • Integration tests
    • Integrate with AutoGen (discussion)
    • Add official gpt-3.5-turbo support (discussion)
    • CLI UI improvements (issue)
    • Add support for other LLM backends (issue, discussion)
    • Release MemGPT family of open models (eg finetuned Mistral) (discussion)
    spdustin/ChatGPT-AutoExpert
    2 weeks ago

    🚀🧠💬 Supercharged Custom Instructions for ChatGPT (non-coding) and ChatGPT Advanced Data Analysis (coding).


    ChatGPT AutoExpert

    by Dustin Miller • Reddit • Substack

    License: Attribution-NonCommercial-ShareAlike 4.0 International

    Elevating Conversational AI to Expert Level


    ❇️ NEW ❇️

    I've created a set of "Custom GPTs" with updated versions of these prompts:


    Want to support these free prompts? My Substack offers paid subscriptions; that's the best way to show your appreciation.

    Introduction

    ChatGPT AutoExpert is a shockingly effective set of custom instructions aimed at enhancing the capabilities of GPT-4 and GPT-3.5-Turbo conversational models. These instructions maximize the depth and nuance in responses while minimizing general disclaimers and hand-holding. The ultimate objective is to provide users with accurate, context-rich information and an improved learning experience.

    Getting Started

    To get started with ChatGPT AutoExpert, choose which set of custom instructions you want to use:

    Features

    "Standard Edition"

    • ✳️ New to v5: Automatically Improves your Question
      Many of us still compose ambiguous questions when asking ChatGPT for help. The AutoExpert will automatically rewrite your question to be precise, and to elicit the best response the experts can provide.
    • ✳️ New to v5: Slash Commands
      Slash commands offer an easy way to interact with the AutoExpert system. Get summaries, ideas for additional questions, alternative viewpoints…even ask ChatGPT to review its own answer and suggest improvements.
    • ✳️ New to v5: Auto-selects Frameworks and Methodologies
      Designed to select a context-appropriate framework for formulating its best answers
    • Maximized Depth and Nuance
      Receive high-quality, in-depth, and ✳️ New to v5: multi-turn responses (GPT-4 only) without compromising on the granularity of the information.
    • Perfect for Everyday Use
      No need to switch these instructions on and off. They'll give you a greatly improved experience with ChatGPT, even if you're writing code. Although, if you are writing code, you should check the Developer Edition
    • Automatically Identifies the Best Expert
      The AutoExpert custom instruction automatically finds the best expert roles to answer whatever question you ask, every time. You don't need a bunch of special prompts any more—this works with even the simplest of prompts!
    • Minimized Hand-Holding
      Cut through the noise and get straight to the facts, reducing unnecessary disclaimers.
    • Explicit Reasoning
      Your AutoExpert doesn't just provide answers; it offers an explanation, detailing the thought process behind each response.
    • Resourceful Links
      Automatically generates inline links for related topics and "you may also like" topics, helpfully linked to Google search results to avoid hallucination (GPT-3.5 still hallucinates here, but not always. GPT-4 is rock-solid).

    "Developer Edition"

    [!IMPORTANT] This requires a ChatGPT professional subscription, as it needs both GPT-4 and Advanced Data Analysis!

    • Verbosity Selection
      Easily choose the complexity of the generated code, from compact "code golf" type responses up to complex, modular code samples.
    • Powered by Jupyter
      ChatGPT Advanced Data Analysis already runs a Jupyter kernel under the hood. AutoExpert (Developer Edition) comes with a companion Python script that you simply upload to your conversation. It will automatically take advantage of the sandbox Python environment for editing longer code samples, and activate a handful of extra "slash commands" to make your life even easier.
    • Pick Up Where You Left Off
      You can start a new chat without worrying about ChatGPT forgetting what you were doing in the previous one. The /memory slash command will download all your files, and a history of everything that's been done during your session. Simply upload it (along with the companion script) in a new session, and pick up where you left off.
    • Install Custom Wheels
      Yeah, you heard me. Wheels for Python packages can be uploaded and installed automatically.
      • Note that your ChatGPT sandbox uses Python 3.8, on a VM with x86_64 architecture (as of this writing).
    • Save Your Work
      Among other /slash commands, AutoExpert (Developer Edition) will save all your code snippets, dehydrate its memory of your requirements and the work it's done—even back up the code cells themselves. Then it zips it up, and you can quickly download your coding conversation history.
    • File and Symbol Tree
      By keeping a running history along with a file/symbol tree at the end of each response, ChatGPT will always remember what it just did, and you'll always see what files still need work. It's even smart enough to handle breaking down complex requirements in a way that allows it to write code over multiple turns.

    ChatGPT AutoExpert (both standard and "Developer Edition")
    by Dustin Miller is licensed under Attribution-NonCommercial-ShareAlike 4.0 International

    tiangolo/sqlmodel
    2 weeks, 5 days ago

    SQL databases in Python, designed for simplicity, compatibility, and robustness.


    SQLModel, SQL databases in Python, designed for simplicity, compatibility, and robustness.


    Documentation: https://sqlmodel.tiangolo.com

    Source Code: https://github.com/tiangolo/sqlmodel


    SQLModel is a library for interacting with SQL databases from Python code, with Python objects. It is designed to be intuitive, easy to use, highly compatible, and robust.

    SQLModel is based on Python type annotations, and powered by Pydantic and SQLAlchemy.

    The key features are:

    • Intuitive to write: Great editor support. Completion everywhere. Less time debugging. Designed to be easy to use and learn. Less time reading docs.
    • Easy to use: It has sensible defaults and does a lot of work underneath to simplify the code you write.
    • Compatible: It is designed to be compatible with FastAPI, Pydantic, and SQLAlchemy.
    • Extensible: You have all the power of SQLAlchemy and Pydantic underneath.
    • Short: Minimize code duplication. A single type annotation does a lot of work. No need to duplicate models in SQLAlchemy and Pydantic.

    SQL Databases in FastAPI

    SQLModel is designed to simplify interacting with SQL databases in FastAPI applications; it was created by the same author. 😁

    It combines SQLAlchemy and Pydantic and tries to simplify the code you write as much as possible, allowing you to reduce code duplication to a minimum while still getting the best developer experience possible.

    SQLModel is, in fact, a thin layer on top of Pydantic and SQLAlchemy, carefully designed to be compatible with both.

    Requirements

    A recent and currently supported version of Python.

    As SQLModel is based on Pydantic and SQLAlchemy, it requires them. They will be automatically installed when you install SQLModel.

    Installation

    $ pip install sqlmodel
    ---> 100%
    Successfully installed sqlmodel
    

    Example

    For an introduction to databases, SQL, and everything else, see the SQLModel documentation.

    Here's a quick example. ✨

    A SQL Table

    Imagine you have a SQL table called hero with:

    • id
    • name
    • secret_name
    • age

    And you want it to have this data:

    id | name       | secret_name      | age
    1  | Deadpond   | Dive Wilson      | null
    2  | Spider-Boy | Pedro Parqueador | null
    3  | Rusty-Man  | Tommy Sharp      | 48

    Create a SQLModel Model

    Then you could create a SQLModel model like this:

    from typing import Optional
    
    from sqlmodel import Field, SQLModel
    
    
    class Hero(SQLModel, table=True):
        id: Optional[int] = Field(default=None, primary_key=True)
        name: str
        secret_name: str
        age: Optional[int] = None
    

    That class Hero is a SQLModel model, the equivalent of a SQL table in Python code.

    And each of those class attributes is equivalent to each table column.

    Create Rows

    Then you could create each row of the table as an instance of the model:

    hero_1 = Hero(name="Deadpond", secret_name="Dive Wilson")
    hero_2 = Hero(name="Spider-Boy", secret_name="Pedro Parqueador")
    hero_3 = Hero(name="Rusty-Man", secret_name="Tommy Sharp", age=48)
    

    This way, you can use conventional Python code with classes and instances that represent tables and rows, and communicate with the SQL database through them.

    Editor Support

    Everything is designed for you to get the best developer experience possible, with the best editor support.

    Including autocompletion:

    And inline errors:

    Write to the Database

    You can learn a lot more about SQLModel by quickly following the tutorial, but if you need a taste right now of how to put all that together and save to the database, you can do this:

    from typing import Optional
    
    from sqlmodel import Field, Session, SQLModel, create_engine
    
    
    class Hero(SQLModel, table=True):
        id: Optional[int] = Field(default=None, primary_key=True)
        name: str
        secret_name: str
        age: Optional[int] = None
    
    
    hero_1 = Hero(name="Deadpond", secret_name="Dive Wilson")
    hero_2 = Hero(name="Spider-Boy", secret_name="Pedro Parqueador")
    hero_3 = Hero(name="Rusty-Man", secret_name="Tommy Sharp", age=48)
    
    
    engine = create_engine("sqlite:///database.db")
    
    
    SQLModel.metadata.create_all(engine)
    
    with Session(engine) as session:
        session.add(hero_1)
        session.add(hero_2)
        session.add(hero_3)
        session.commit()
    

    That will save a SQLite database with the 3 heroes.
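
    If you want to double-check what was written, here is a minimal sketch using Python's built-in sqlite3 module (it assumes only the database.db file created by the example above; it is not part of this README):

    import sqlite3
    
    # Open the SQLite file written by the example above
    connection = sqlite3.connect("database.db")
    
    # By default, SQLModel names the table after the model class, lowercased: "hero"
    for row in connection.execute("SELECT id, name, secret_name, age FROM hero"):
        print(row)
    
    connection.close()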

    Select from the Database

    Then you could write queries to select from that same database, for example with:

    from typing import Optional
    
    from sqlmodel import Field, Session, SQLModel, create_engine, select
    
    
    class Hero(SQLModel, table=True):
        id: Optional[int] = Field(default=None, primary_key=True)
        name: str
        secret_name: str
        age: Optional[int] = None
    
    
    engine = create_engine("sqlite:///database.db")
    
    with Session(engine) as session:
        statement = select(Hero).where(Hero.name == "Spider-Boy")
        hero = session.exec(statement).first()
        print(hero)
    

    Editor Support Everywhere

    SQLModel was carefully designed to give you the best developer experience and editor support, even after selecting data from the database:
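
    As a small illustration, here is a sketch (reusing the Hero model and engine defined in the earlier examples, so it is not self-contained) of why that matters: the object you get back is a plain Hero instance, so the editor can complete its attributes and knows their types.

    from sqlmodel import Session, select
    
    with Session(engine) as session:
        hero = session.exec(select(Hero).where(Hero.name == "Spider-Boy")).first()
        if hero is not None:
            # hero is typed as Optional[Hero], so after the None check the editor
            # can complete .name, .secret_name, and .age with the right types
            print(hero.name, hero.age)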

    SQLAlchemy and Pydantic

    That class Hero is a SQLModel model.

    But at the same time, ✨ it is a SQLAlchemy model ✨. So, you can combine it and use it with other SQLAlchemy models, or you could easily migrate applications with SQLAlchemy to SQLModel.

    And at the same time, ✨ it is also a Pydantic model ✨. You can use inheritance with it to define all your data models while avoiding code duplication. That makes it very easy to use with FastAPI.
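
    For example, here is a minimal sketch of that inheritance pattern (the HeroBase and HeroCreate names are illustrative, not from this README): the shared fields are declared once, the table model adds only the primary key, and a Pydantic-only model can be reused for API input.

    from typing import Optional
    
    from sqlmodel import Field, SQLModel
    
    
    class HeroBase(SQLModel):
        # Shared fields, declared a single time
        name: str
        secret_name: str
        age: Optional[int] = None
    
    
    class Hero(HeroBase, table=True):
        # The table model only adds the primary key
        id: Optional[int] = Field(default=None, primary_key=True)
    
    
    class HeroCreate(HeroBase):
        # Pydantic-only model (no table), handy for FastAPI request bodies
        pass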

    License

    This project is licensed under the terms of the MIT license.

    odoo/odoo
    2 weeks, 5 days ago

    Odoo. Open Source Apps To Grow Your Business.


    Odoo

    Odoo is a suite of web based open source business apps.

    The main Odoo Apps include an Open Source CRM, Website Builder, eCommerce, Warehouse Management, Project Management, Billing & Accounting, Point of Sale, Human Resources, Marketing, Manufacturing, ...

    Odoo Apps can be used as stand-alone applications, but they also integrate seamlessly so you get a full-featured Open Source ERP when you install several Apps.

    Getting started with Odoo

    For a standard installation, please follow the Setup instructions from the documentation.

    To learn the software, we recommend the Odoo eLearning, or Scale-up, the business game. Developers can start with the developer tutorials.

    yunjey/pytorch-tutorial
    2 weeks, 5 days ago

    PyTorch Tutorial for Deep Learning Researchers



    This repository provides tutorial code for deep learning researchers to learn PyTorch. In the tutorials, most of the models are implemented in fewer than 30 lines of code. Before starting this tutorial, it is recommended that you finish the official PyTorch tutorial.
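
    To give a feel for what models at that scale look like, here is a minimal sketch of a PyTorch training loop in the same spirit (an illustration, not code taken from this repository):

    import torch
    import torch.nn as nn
    
    # Toy data: learn y = 2x + 1 from noisy samples
    x = torch.linspace(-1, 1, 100).unsqueeze(1)
    y = 2 * x + 1 + 0.1 * torch.randn_like(x)
    
    model = nn.Linear(1, 1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    
    for epoch in range(200):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    
    print(model.weight.item(), model.bias.item())  # close to 2.0 and 1.0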


    Table of Contents

    1. Basics

    2. Intermediate

    3. Advanced

    4. Utilities


    Getting Started

    $ git clone https://github.com/yunjey/pytorch-tutorial.git
    $ cd pytorch-tutorial/tutorials/PATH_TO_PROJECT
    $ python main.py
    

    Dependencies