Why caching? The main purpose of caching is to avoid expensive calls: with a cache, we don’t re-fetch data we already know. A cache is like steroids for data retrieval and can boost the performance of an application like a rocket. But it comes at a cost: the downside of caching is that you have to deal with data possibly not being up to date, and that your app consumes far more memory than without a cache.
Cache Hit – a cache hit occurs when data is requested from the cache and is actually found there and returned from the cache.
Cache Miss – a cache miss occurs when data is requested from the cache but is not found there. In this case, the system needs to fetch the data from the next caching level or some other data store. This causes an extra round trip and usually degrades performance.
Cache Miss Ratio – ratio of cache misses to the total number of requests.
Cache Hit Ratio – ratio of cache hits to the total number of requests.
cache-hit-ratio = 1 - cache-miss-ratio
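The relationship between the two ratios can be illustrated with a small sketch (the hit/miss counts are made-up example numbers):

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of all requests that were served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

# 3 hits and 1 miss -> hit ratio 0.75, so the miss ratio is 1 - 0.75 = 0.25
print(hit_ratio(3, 1))       # 0.75
print(1 - hit_ratio(3, 1))   # 0.25
```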
Flush – write the data stored in the cache into the next caching level or data store.
LRU – Least Recently Used
LFU – Least Frequently Used
Strategies for Reading Data
Let’s have a look at some basic patterns commonly used to read data from a data store using a caching mechanism.
The cache-aside pattern is the easiest and most widely used caching strategy. One advantage of cache-aside is that the caching logic is directly visible to the developer: it is obvious when data is read from the cache and when from the data store. Another benefit is that the data model used for requesting the data store and the one used for requesting the cache can be different. For example, the cumulative result of several database queries can be stored in the cache as a single value. The downside of the cache-aside strategy is that the developer has to take care of cache invalidation. What happens if the data store gets changed by some other process? The cached data is not guaranteed to be updated. So, the developer has to ensure that every update to the data store either updates the corresponding data in the cache or removes the outdated data from the cache.
Cache aside can be used for systems with a lot of read operations.
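A minimal cache-aside sketch could look like the following. The `db` dict and the `user:1` key are illustrative stand-ins for a real data store:

```python
cache = {}
db = {"user:1": {"name": "Alice"}}  # stand-in for a real database

def get_user(key):
    value = cache.get(key)       # 1. try the cache first
    if value is None:            # 2. cache miss
        value = db.get(key)      #    fall back to the data store
        if value is not None:
            cache[key] = value   # 3. populate the cache for next time
    return value

def update_user(key, value):
    db[key] = value              # write to the data store...
    cache.pop(key, None)         # ...and invalidate the stale cached copy

print(get_user("user:1"))  # miss -> loaded from db and cached
print(get_user("user:1"))  # hit -> served from the cache
```

Note how the application code itself decides when to talk to the cache and when to the data store, and how `update_user` must remember to invalidate the cache, which is exactly the burden described above.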
Another strategy for caching data is the “read-through” pattern. The application requests data from the cache. When the cache does not contain the requested data, the cache (not the application!) requests the data from the data store.
In this scenario, the cache becomes a data provider, acting as a surrogate for the original data store. This means, that requesting data from the data store and from the cache must use the same data model.
The read-through strategy has the advantage that the application does not need to know about the caching mechanism. This follows the separation of concerns principle: the application deals only with the application logic, while the cache deals with the caching logic.
The downside here is that every read operation is processed via the cache, so every item read from the data store ends up in the cache. This means the cache can grow to the same size as the underlying data store, unless there is an eviction mechanism restricting it from growing indefinitely.
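A read-through cache might be sketched like this, assuming a hypothetical `loader` callback that the cache uses to reach the data store on a miss:

```python
class ReadThroughCache:
    """The application talks only to the cache; the cache itself
    fetches missing entries from the underlying data store."""

    def __init__(self, loader):
        self._loader = loader  # function that reads from the data store
        self._data = {}

    def get(self, key):
        if key not in self._data:
            # Cache miss: the cache (not the application!) loads the data.
            self._data[key] = self._loader(key)
        return self._data[key]

db = {"user:1": "Alice"}  # stand-in for a real data store
cache = ReadThroughCache(loader=db.get)
print(cache.get("user:1"))  # first call goes through to the data store
print(cache.get("user:1"))  # second call is served from the cache
```

Because the cache acts as a surrogate for the data store, `get` uses the same key/value model for both, as noted above.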
Let’s take a look at some common strategies for writing data to a data store using a caching mechanism.
In the write-through pattern, all write requests go through the cache; the application never accesses the data store directly. The cache is responsible for writing the data to the underlying data store and for keeping its own copy up to date.
It is important to note that the write operation must not return before the data has been pushed to the data store. So, the write operation is always a blocking call, not an asynchronous one. This behavior mainly ensures data consistency. The downside of this pattern is that every write operation is done twice: once on the cache and once on the data store.
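A minimal write-through sketch, where `put` only returns after the (here simulated) data store has been updated:

```python
class WriteThroughCache:
    """Every write goes synchronously to the data store first,
    then the cached copy is updated."""

    def __init__(self, store):
        self._store = store  # stand-in for the real data store
        self._data = {}

    def put(self, key, value):
        self._store[key] = value  # blocking write to the data store
        self._data[key] = value   # keep the cache up to date

    def get(self, key):
        return self._data.get(key)

db = {}
cache = WriteThroughCache(db)
cache.put("user:1", "Alice")
# After put() returns, cache and data store are guaranteed consistent.
print(db["user:1"])           # "Alice"
print(cache.get("user:1"))    # "Alice"
```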
To circumvent the blocking call to the cache when writing data, the write-back strategy can be used. The application again writes only to the cache and does not access the data store directly. The difference is that the write operation on the cache returns even before the data is persisted in the data store. Hence, the write operation happens asynchronously.
In this pattern, the cache keeps the written data in its own buffer until a certain flush-condition is met. When this condition becomes true, all cached data is written back to the data store.
The main advantage of this strategy is that write performance can be greatly improved due to the asynchronous nature of the pattern. The downside is that there is no longer any guarantee of data consistency. If the cache goes down before the data is written to the data store, the data is lost. This is even worse when the application assumes that the data has been persisted, but it actually hasn’t.
Another issue to keep in mind is that any access to the original data store from outside the application may operate on outdated data. As long as the data is still in the cache and hasn’t made it to the data store yet, no other application can operate on the data store in a real-time manner.
Sometimes this pattern is also referred to as “write-behind”.
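A simplified write-back sketch follows. For brevity, the flush-condition here is just a buffer-size threshold checked synchronously; a real implementation would typically flush from a background thread or timer. All names (`flush_threshold`, the `db` dict) are illustrative:

```python
class WriteBackCache:
    """Writes return immediately; dirty entries are buffered and
    flushed to the data store once a flush-condition is met."""

    def __init__(self, store, flush_threshold=3):
        self._store = store
        self._data = {}
        self._dirty = set()  # keys written to the cache but not yet persisted
        self._flush_threshold = flush_threshold

    def put(self, key, value):
        self._data[key] = value   # returns right away, store not touched yet
        self._dirty.add(key)
        if len(self._dirty) >= self._flush_threshold:
            self.flush()          # flush-condition: buffer is full

    def flush(self):
        for key in self._dirty:
            self._store[key] = self._data[key]
        self._dirty.clear()

db = {}
cache = WriteBackCache(db, flush_threshold=2)
cache.put("a", 1)   # buffered only; db is still empty here
cache.put("b", 2)   # threshold reached -> both entries written to db
print(db)           # {'a': 1, 'b': 2}
```

If the process died between the two `put` calls, the value for `"a"` would be lost, which is exactly the consistency risk described above.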
Caching in Hardware
Besides the caches we use when developing software at a higher abstraction level, there are many scenarios where low-level caching is also very important.
For example, modern CPUs have a set of different hardware caches, commonly called the L1-, L2-, and L3-cache. The L1-cache (first-level cache) is the smallest and fastest cache and is located closest to the CPU. The CPU retrieves data directly from the L1-cache. If the data is not found in the L1-cache, the request cascades to the L2-cache (second-level cache) and then to the L3-cache. These caches deal with blocks of physical memory and are therefore organized in so-called “cachelines”. A cacheline is the smallest unit a cache can operate on; it is read from or written to memory as a single block in one operation. A cacheline typically holds 64 bytes. So, if the CPU accesses a single byte at some address in memory, the surrounding 64-byte block is loaded into the cache along with it.
RAID controllers are middleware between the CPU and one or more hard disks. Read/write requests are sent to the RAID controller, which does the actual reading and writing to the disk. The RAID controller acts like a hard disk to the system, but can actually behave like a cache. Most RAID controllers offer a setting to enable or disable a “write back” mode. This setting switches the controller’s write strategy between a write-back and a write-through implementation (or something else, depending on the controller).
So, in an ideal world our cache has unlimited resources and can grow indefinitely. Unfortunately, we don’t have unlimited resources. So, when a cache reaches its limits, it has to decide whether data can be removed from the cache to free up some memory. And if so, which of the cached data should be kept and which removed?
There are two commonly used policies when it comes to cache invalidation:
LRU – the “least recently used” policy assumes that recently accessed data in the cache is more relevant to the application than data that hasn’t been touched for a while. So, if data hasn’t been accessed for a long time, it will probably not be accessed again in the near future and can be evicted.
LFU – the “least frequently used” policy assumes that data is more important to the application when the application requests it more often. So, data that has been requested more often than other data in the past will likely be accessed again in the near future, while rarely requested data can be evicted.
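As one concrete example, an LRU eviction policy can be sketched with Python’s `collections.OrderedDict`, which remembers insertion order and lets us move a key to the “most recent” end on every access (capacity and keys below are made up):

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)        # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" is now the most recently used entry
cache.put("c", 3)       # over capacity -> evicts "b"
print(cache.get("b"))   # None
print(cache.get("a"))   # 1
```

An LFU implementation would instead keep an access counter per key and evict the key with the lowest count; the structure is analogous.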
There are other approaches for invalidation strategies, which may be more closely tied to the internals of the application logic. For example, if an object is very expensive to create, it may be better to keep it and instead remove objects that can be recreated faster. Or, if data has to be retrieved over a slow network, the object’s size may be a good criterion for the invalidation strategy.