Yelp maintains a list of millions of restaurants that users can look up on their app. The user can search for restaurants by name, location, cuisine, etc. The user can also look up the reviews of the restaurant and the menu. In this article, we will discuss the system design of the listing of nearby locations on Yelp.
The main problem with location-based search is the sheer number of records we need to search through using latitude and longitude. To solve this problem, normally we can divide the world into a grid structure and store the data in each grid. By using the grid location of the user, we can narrow down the search to a smaller set of data. Also, since the data is not uniformly distributed, a quadtree structure is usually used to organize the world data to further optimize the search functionality.
In Yelp's case the database is only used as a master list to build the quadtree structure which is saved in memory on servers. We still need the ability to scale the database to accommodate the increase in data. It is always a good idea to use a modern document database to make horizontal scaling easier.
Since the majority of the requests are reads, we can use a leader-follower replication strategy. Since the writes can be slower, we can write to the leader and replicate the data to the followers synchronously, avoiding complex data resolution issues.
To allow for ultra-fast reads we can use a globally distributed Cache between the server and the database. The server can try to pull from the cache first and if the data is not found in the cache, it can pull from the server hosting the quad data structure. We can use the Least recently used (LRU) algorithm to evict the data from the cache.