Updated: Jun 11
What is caching?
A cache is a hardware or software component that acts as temporary storage, allowing fast access to the data stored in it. The primary objective of using a cache in any application is to improve performance. The process of storing and accessing data from a cache is known as caching.
Caching is used everywhere. It is used at different layers, such as operating systems, CDNs, and DNS, and also by various applications and services.
If the requested data or item is available in the cache, it is called a cache hit; if it is not available, it is called a cache miss. If implemented correctly, caches can help improve response times, reduce load on the database, and save computation costs.
Retrieving data from persistent storage like a database can take a considerable amount of time; caches reduce the response time of our API by providing fast access to data. Caches are mainly used to reduce the frequency of network calls to the database and to store the results of computationally expensive operations. Caches can also help bring down computation costs, especially if your application runs in the cloud.
How is caching useful?
There are a wide variety of use cases where caching can be applied. Some of the scenarios where caching is useful are:
Frequently Requested Data
One of the popular scenarios where caching is useful is when you have to frequently query for some commonly used data. For example, in a service like Twitter, each time we open a user profile, a common query is to get the follower/following counts for that user. This data does not change very frequently and is a good candidate for caching. We can fetch this data from the database the first time any user tries to access it; after that it can be cached, and each subsequent request can be served from the cache until the data becomes stale. This helps us avoid a network call to the database for every request.
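As a rough sketch, the follower-count caching described above might look like the following. The dict-backed cache and the fetch_follower_count_from_db helper are hypothetical stand-ins for a real cache and a real database query:

```python
# A minimal in-process cache for follower counts, keyed by user id.
follower_cache = {}

def fetch_follower_count_from_db(user_id):
    # Placeholder for a real database call.
    return {"alice": 1500, "bob": 42}.get(user_id, 0)

def get_follower_count(user_id):
    if user_id in follower_cache:
        return follower_cache[user_id]       # cache hit: no database call
    count = fetch_follower_count_from_db(user_id)
    follower_cache[user_id] = count          # cache it for subsequent requests
    return count
```

The first call for a given user goes to the database; every call after that is answered from the cache until the entry is invalidated.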
Also, if you remember, we made use of caching in our URL shortener system design to cache the most frequently used short URLs. This is another real-life use case of a cache.
Computationally Expensive Operations
Some APIs are simple and have fast response times, while others might require multiple intermediary steps involving slow and heavy operations that delay the response.
A good example of this is a user feed API in a service like Instagram or Facebook. Displaying the feed for a particular user is mostly based on custom algorithms that may involve several computationally expensive operations: fetching from the database all the people and public pages that the user follows, separating out the most recent posts made by those people and pages, aggregating all of these posts, and building a time-sorted list from this data.
Since we may have to make multiple calls to the database and do a lot of computation to get, aggregate, and sort this data, our API response time can take a hit if we try to compute it on the fly (when we get the request from the user). And since the user feed is the first page that loads when we open an application like Facebook or Instagram, this can lead to a bad user experience.
So, to improve performance and reduce the response time, we can precompute the feed for a particular user beforehand and store it in the cache (even before a request for the feed is made). We can then serve this data directly from the cache when the user requests it. This can potentially reduce the response time of our API from several seconds to a few milliseconds.
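The precompute-then-serve idea above can be sketched as follows. The fetch_followed_posts helper and the dict-backed feed cache are hypothetical placeholders for the expensive database queries and the real cache:

```python
feed_cache = {}

def fetch_followed_posts(user_id):
    # Placeholder for the expensive work: querying the followed users/pages
    # and collecting their recent posts as (timestamp, post) pairs.
    return [(3, "post-c"), (1, "post-a"), (2, "post-b")]

def precompute_feed(user_id):
    # Run ahead of time (e.g. by a background job), before any request arrives.
    posts = fetch_followed_posts(user_id)
    # Aggregate and sort newest-first once, then store the finished feed.
    feed_cache[user_id] = [post for _, post in sorted(posts, reverse=True)]

def get_feed(user_id):
    # Served directly from the cache; fall back to computing on a miss.
    if user_id not in feed_cache:
        precompute_feed(user_id)
    return feed_cache[user_id]
```

The expensive aggregation and sorting happen once, off the request path, so the request itself is just a cache lookup.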
Avoid Load On Database
If our service has a large number of users, and there are multiple microservices or replicas handling user requests which in turn call the database, this can put a lot of load on our database (especially during peak hours). This situation can be avoided by having a cache (most likely a distributed cache) which can ease the load on our database.
Why not cache everything?
You might be wondering, if caches are so fast and efficient why not store all our data in cache instead of putting it in the database? Wouldn't that be an ideal thing to do?
There are a few reasons mainly why we cannot do this:
Firstly, the hardware used by caches is expensive compared to the hardware used by traditional databases. Traditional databases mostly run on commodity hardware, which is relatively inexpensive. So we might have to shell out a lot of money if we were to store everything in the cache.
Second, storing all our data in the cache instead of a database defeats the purpose of using a cache in the first place: as you store more and more data in the cache, the time needed to fetch data from it can increase, eroding the very speed advantage the cache was meant to provide.
Where does a cache fit in?
A typical web application is backed by a data store, which can be a traditional database or another microservice. When a client makes a request to our service, it initially hits one of the microservices responsible for handling the request. This microservice in turn connects to the database to retrieve the data requested by the client.
Calls to database can be slow and can utilize a lot of system resources. It would be a good idea to store at least some of these items in memory so that we don't have to reach the database for each and every request.
Firstly, this can improve the response time of our API, since we respond directly from the cache. Second, even if our database is down due to some failure, our service might still be able to serve some of the user requests. Also, if lots of clients request the same data again and again, having a cache in between reduces the load on our database.
When we get a client request, our service first checks whether that data is present in the cache; if yes, our service responds directly from the cache.
If the data is not present, or if it is outdated, the service fetches the data from the database.
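This read path is commonly called the cache-aside pattern. A minimal sketch, assuming a dict-backed cache and a hypothetical load_from_db function standing in for the real data store:

```python
cache = {}

def load_from_db(key):
    # Placeholder for the real database lookup.
    return f"value-for-{key}"

def get(key):
    value = cache.get(key)
    if value is not None:
        return value             # cache hit: respond directly from the cache
    value = load_from_db(key)    # cache miss: fall back to the database
    cache[key] = value           # populate the cache for subsequent requests
    return value
```

On a miss the value is loaded once and cached, so repeated requests for the same key never reach the database again until the entry is invalidated or evicted.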
Cache Invalidation
Ideally, the data stored in a cache is transient and not meant to live there forever. It has to be cleaned up or updated from time to time to keep it coherent with the data source. Data in the cache can go stale if the original data (in the database, for instance) is changed or removed.
The process of cleaning up or updating the data in cache with new values to keep it in sync with the original data is known as cache invalidation.
One of the popular ways to invalidate a cache entry is the TTL strategy. In the Time-To-Live (TTL) strategy, each entry in the cache is associated with a specific time after which the data is considered stale.
Once the data expires, we can flush these stale entries using one of two ways:
When a user requests stale data, we can actively expire it on access.
We can also do this passively, with the help of a job that runs periodically at specified intervals.
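Both approaches can be sketched together. This is a minimal illustration, assuming each entry stores its own expiry timestamp; the function names are hypothetical:

```python
import time

store = {}

def put(key, value, ttl_seconds):
    # Each entry records the wall-clock time at which it expires.
    store[key] = (value, time.time() + ttl_seconds)

def get(key):
    entry = store.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() >= expires_at:
        del store[key]   # active expiry: evict the stale entry on access
        return None
    return value

def sweep():
    # Passive expiry: a periodic job flushes all expired entries.
    now = time.time()
    for key in [k for k, (_, exp) in store.items() if now >= exp]:
        del store[key]
```

Real caches like Redis combine both: expired keys are dropped when accessed, and a background process periodically samples and removes expired keys.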
Cache Eviction Policies
Cache eviction policies control how items are removed from the cache when it is full. Based on the algorithm selected, an eviction policy decides which item to remove when the cache limit is reached.
Why is it needed?
It is important that we store data in the cache in such a way that, no matter how much data is stored in our database, our cache only holds the items relevant to the future requests expected to come into our system. While predicting future requests, we mainly need to consider two things: 1. When to add items to the cache. 2. When to remove items from the cache.
Our cache performance depends almost entirely on our cache policy. Imagine a very poor eviction policy: every time the service requests data from the cache, it results in a cache miss, so hitting the cache is of no use. A call to the cache is an extra step, and if it responds with no data every time, you end up pulling data from the database almost all the time, with an additional redundant call to the cache on top. This adds to the delays instead of improving performance; the cache becomes pure overhead.
Also, as mentioned earlier, the hardware used by caches is expensive, so storing a ton of items in the cache makes no sense in terms of either budget or performance. So we need to set a limit on the maximum number of items that can be stored in the cache at any given time. When the cache is full, we remove certain items depending on the eviction policy selected.
Some of the well known cache eviction policies are:
LRU - Least Recently Used: In this policy, the item that has gone unused for the longest time (the least recently accessed item) is removed when the cache is full. We have described the LRU cache in detail in a separate article.
LFU - Least Frequently Used: In this policy, items are evicted based on frequency of usage. Each item in the cache has a count of how many times it has been requested; when the cache is full, the item with the lowest count is evicted.
MRU - Most Recently Used: This is exactly opposite to the LRU policy. When the cache is full, the item that is most recently requested is evicted from cache.
RR - Random Replacement: When the cache is full, a random item is evicted from cache.
FIFO - First In First Out: Items are evicted in the order in which they were added to cache.
LIFO - Last In First Out: The cache evicts item that was added most recently regardless of how many times it was accessed before.
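As an illustration of how one of these policies can be implemented, here is a minimal LRU sketch built on Python's OrderedDict. Recently used keys are moved to the end, so the front always holds the least recently used entry:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                     # cache miss
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
```

Both get and put run in O(1) time; production caches such as Redis implement an approximation of this idea to keep memory overhead low.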
Note: cache invalidation is different from cache eviction. We invalidate data because it is stale or has expired, but we evict data when the cache limit is reached (memory is full).
Distributed Cache
If the amount of data from a service or an application is too large to be stored in the cache memory of a single machine, the data has to be distributed across multiple machines.
A distributed cache is an extension of the traditional cache. While a traditional cache is mostly a single server or machine, a distributed cache can grow beyond the memory limits of a single machine, formed by interlinking multiple cache servers or cache clusters.
A distributed cache has its data spread across several nodes (servers) in a cluster. It can also be across several clusters across geographically distributed data centers.
Distributed caches have the ability to scale horizontally. As the data grows, we can add more machines (cache servers/nodes) to our cluster allowing our cache to grow along with the growing data requirements.
Distributed caches are especially useful for applications with large data volumes. Some of the popular distributed cache solutions are Memcached, Redis, and Hazelcast.
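To spread data across several nodes, each key must map deterministically to one node. A minimal sketch of hash-based sharding, with hypothetical node names (production systems typically use consistent hashing instead, so that adding or removing a node remaps fewer keys):

```python
import hashlib

NODES = ["cache-node-1", "cache-node-2", "cache-node-3"]

def node_for(key):
    # Hash the key to a stable integer, then map it onto one of the nodes.
    # Every client computes the same mapping, so reads and writes for a
    # given key always go to the same node.
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]
```

Clients route each get/put for a key to node_for(key), so the total cache capacity grows with the number of nodes.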
How is a cache different from a CDN?
CDNs are geographically distributed networks of servers that work together to deliver content (videos, images, etc.) to users more quickly. A CDN acts as an intermediate layer between the end user and the origin server, minimizing the number of requests that need to be served by the origin server.
Consider a service like Netflix with its origin server in the United States. For a user viewing content from, say, India, serving that content from the origin server could result in a lot of delay and buffering, because of the distance the data has to travel from the server to the end user.
This is where a CDN comes to the rescue. CDNs have servers distributed all over the world. These servers cache data, and when a user requests data, it is served from the CDN server nearest to the user instead of from the origin server, thereby reducing the delay.
The main difference between a cache and a CDN is that while CDNs do perform caching, not everything that performs caching is a CDN. Also, CDN servers are strategically placed at internet exchange points (IXPs) to avoid network round trips, which may not be true of a regular cache.
Note: Internet exchange points (IXPs) are points where different internet providers connect to exchange traffic originating on their network.