How to design Instagram Feed

Problem

Instagram Feed is the list of posts a user sees on their timeline when they log in. At small scale this is a trivial problem: fetch all the users that the user follows, join that list with the posts of each of those users, and assemble the feed from the result. But Instagram has over 500 million daily active users, potentially generating billions of posts a day. Generating the feed on the fly would mean running this join against a huge posts table on every feed request, which would be too slow.
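
To make the cost concrete, here is a minimal sketch of the naive read-time approach. It assumes in-memory dicts stand in for the follow graph and the posts table, and it assumes a reverse-chronological feed; `Post`, `build_feed_on_the_fly`, `follows`, and `posts_by_author` are hypothetical names. Every request touches every post of every followed account, so the work per request grows with the number of followees and how much they post.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int
    created_at: int  # hypothetical: Unix timestamp in seconds


def build_feed_on_the_fly(user_id, follows, posts_by_author, limit=20):
    """Naive read-time feed: gather posts from every followed user, newest first.

    follows: user id -> set of followed user ids (hypothetical in-memory stand-in)
    posts_by_author: author id -> list of Post (hypothetical in-memory stand-in)
    """
    candidates = []
    for followee in follows.get(user_id, set()):
        candidates.extend(posts_by_author.get(followee, []))
    # nlargest avoids a full sort, but every candidate post is still scanned.
    return heapq.nlargest(limit, candidates, key=lambda p: p.created_at)
```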

Solution

Introduction

To avoid generating the feed on the fly, we can pre-generate each user's feed and store it in a database or cache. When a user logs in, we only need to look up their stored feed and return it. The pre-generation is done by a background server that listens for image upload events: when a user uploads an image, the server adds the new post to the feed of every one of that user's followers.
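
The pre-generation step can be sketched as a handler that the background server runs for each upload event. This is a minimal sketch, assuming an in-memory follower map and a dict of bounded deques standing in for the feed cache; `FeedStore`, `on_image_uploaded`, the event fields, and the 500-post cap are all hypothetical. Reading a feed then becomes a single key lookup instead of a join.

```python
from collections import defaultdict, deque

FEED_LENGTH = 500  # hypothetical cap on how many post ids are kept per feed


class FeedStore:
    """In-memory stand-in for the feed database/cache, keyed by user id."""

    def __init__(self, max_len=FEED_LENGTH):
        # Each feed is a bounded deque of post ids, newest on the left;
        # once full, appending on the left drops the oldest id on the right.
        self._feeds = defaultdict(lambda: deque(maxlen=max_len))

    def push(self, user_id, post_id):
        self._feeds[user_id].appendleft(post_id)

    def read(self, user_id, limit=20):
        return list(self._feeds.get(user_id, ()))[:limit]


def on_image_uploaded(event, followers, store):
    """Background handler: fan the new post out to every follower's stored feed."""
    author_id = event["author_id"]
    post_id = event["post_id"]
    for follower_id in followers.get(author_id, ()):
        store.push(follower_id, post_id)
```

The cost moves from read time to write time: one upload costs one write per follower, which is exactly what becomes a problem for accounts with very many followers.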

In the extreme case, a celebrity with 100 million followers posting a picture would trigger feed updates for 100 million users. Doing all of that at once would be very slow and would not scale. In such cases, we can apply rules that prioritize and spread out the work. For example, we can first update the feeds of the celebrity's 10,000 most active followers, and then process the remaining followers in batches of 10,000.
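
The batching rule above can be sketched as a generator that yields the most active followers first and then the rest in fixed-size chunks. The 10,000 figures come from the text; the `activity` score (for example, recent logins) and the function name are hypothetical. For clarity it sorts in memory; a real system would more likely keep a precomputed activity ranking.

```python
from itertools import islice

PRIORITY_COUNT = 10_000  # most active followers updated first (figure from the text)
BATCH_SIZE = 10_000      # size of each later batch (figure from the text)


def fan_out_batches(follower_ids, activity,
                    priority_count=PRIORITY_COUNT, batch_size=BATCH_SIZE):
    """Yield follower batches: the most active first, then the rest in chunks.

    activity: follower id -> activity score (hypothetical; higher = more active)
    """
    ranked = sorted(follower_ids, key=lambda f: activity.get(f, 0), reverse=True)
    head = ranked[:priority_count]
    if head:
        yield head
    rest = iter(ranked[priority_count:])
    # Pull fixed-size slices until the iterator is exhausted.
    while batch := list(islice(rest, batch_size)):
        yield batch
```

Each yielded batch could be published as its own message so that several background workers update feeds in parallel.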

Database

  • Type: SQL or NoSQL

    Since we only need to store the metadata of the posts, either a relational database like MySQL or a NoSQL database like MongoDB would work. However, a relational database is generally much harder to scale horizontally than a NoSQL database, which matters at this scale.

  • Replication: Multi-leader replication

    Since this is a global application, we need to replicate the database across multiple regions. Each region would have its own write leader and multiple read-only followers. All writes in a region go to that region's leader, so users do not pay for a cross-region round trip on every write. The writes are then replicated asynchronously to the region's followers and to the leaders in the other regions.

  • Sharding: Geo-based

    Since this is a global application, we also need to shard the database across regions. We can shard based on the user's location, so each region hosts the data of the users who live there. A user's reads are then served from a nearby database, avoiding slow cross-region reads.
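
Geo-based sharding and per-region leaders combine into a simple routing rule: find the user's home region, send writes to that region's leader, and send reads to one of its followers. The sketch below covers only that routing decision, assuming a static user-to-region map; `RegionCluster`, `GeoRouter`, and the node names are hypothetical, and the asynchronous replication itself is left to the database.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RegionCluster:
    """One region's database cluster: a write leader plus read-only followers."""
    leader: str
    followers: list = field(default_factory=list)


class GeoRouter:
    """Routes a user's queries to their home region's shard (hypothetical helper)."""

    def __init__(self, clusters, user_region):
        self.clusters = clusters        # region name -> RegionCluster
        self.user_region = user_region  # user id -> home region name

    def _cluster_for(self, user_id):
        # Geo-based sharding: a user's data lives in their home region's shard.
        return self.clusters[self.user_region[user_id]]

    def node_for_write(self, user_id):
        # Every write in a region goes to that region's leader.
        return self._cluster_for(user_id).leader

    def node_for_read(self, user_id):
        # Reads go to a random read-only follower; fall back to the leader if none.
        cluster = self._cluster_for(user_id)
        return random.choice(cluster.followers) if cluster.followers else cluster.leader
```

Because replication to followers is asynchronous, a read routed to a follower may briefly miss a post that was just written; that staleness is the price of fast local writes and reads.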

Message Bus

A message bus allows asynchronous communication between services. When a user uploads an image, the upload service can publish a message to the bus instead of doing the feed generation itself; the background server consumes the message and updates the feeds of all the uploader's followers.
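
Here Python's standard `queue.Queue` and a worker thread stand in for a real message bus and background server. This is a minimal sketch; `start_feed_worker`, the event fields, and the `None` shutdown sentinel are hypothetical. The upload path only has to `put` an event and can return immediately, while the worker does the slow fan-out.

```python
import queue
import threading


def start_feed_worker(bus, handle_event):
    """Consume upload events from `bus` on a background thread until a None sentinel."""

    def run():
        while True:
            event = bus.get()
            try:
                if event is None:  # sentinel: stop the worker
                    return
                handle_event(event)  # e.g. fan the post out to followers' feeds
            finally:
                bus.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

In use, the upload service would call `bus.put({"author_id": 7, "post_id": 100})` and move on; with a real message bus, several workers on different machines could consume from the same topic.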

CDN

A CDN is a network of servers that delivers content to users from locations close to them. We can use a CDN to store and serve all the images uploaded by users, so each image is delivered from a nearby server rather than from a distant origin, which makes images load faster.
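
Real CDNs usually steer users to a nearby server through DNS or anycast routing. The sketch below only illustrates the idea of serving "based on geographic location" by picking the edge server with the smallest great-circle distance to the user. The edge names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_edge(user_location, edges):
    """Pick the edge server closest to the user; edges maps name -> (lat, lon)."""
    return min(edges, key=lambda name: haversine_km(user_location, edges[name]))
```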