Sunday, 25 September 2016

Hash tagging Redis keys in a clustered environment

Hello folks,

In this post, we'll talk a little bit about

  • Redis cluster.
  • Limitations of Redis cluster.
  • How we can overcome the limitations of Redis cluster.

Redis cluster is a distributed implementation of Redis with three major goals:

  • High performance and linear scalability up to 1000 nodes.
  • Acceptable degree of write safety.
  • Availability: Redis cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least one reachable slave for every master node that is no longer reachable.

As an example, let's take a look at the following cluster configuration with six nodes.

  1. Node A (master node)
  2. Node B (master node)
  3. Node C (master node)
  4. Node D (slave of master node A)
  5. Node E (slave of master node B)
  6. Node F (slave of master node C)
Now, a natural question may arise: "When I write to a Redis key, how is a node picked to store it? What decides which node the key is stored in?"

To answer this question, let's look at the Redis key distribution model.
The key space is split into 16384 hash slots, which effectively sets an upper limit of 16384 master nodes for the cluster size (however, the suggested maximum is on the order of 1000 nodes).
Each master node in a cluster handles a subset of the 16384 slots.
So, a possible slot distribution for the configuration above is:
  • Master Node A contains hash slots from 0 to 5500.
  • Master Node B contains hash slots from 5501 to 11000.
  • Master Node C contains hash slots from 11001 to 16383.
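As a rough illustration of the slot-to-node step, the C# sketch below hard-codes the example ranges above in a small lookup table. The class name SlotMap and the node names are hypothetical; a real cluster client learns the current assignment from the cluster itself (for example with the CLUSTER SLOTS command) and refreshes it when slots move.

```csharp
using System;

// Hypothetical slot assignment from the example configuration in this post.
// A real client discovers the ranges from the cluster instead of hard-coding them.
public static class SlotMap
{
    private static readonly (int First, int Last, string Node)[] Ranges =
    {
        (0, 5500, "A"),
        (5501, 11000, "B"),
        (11001, 16383, "C"),
    };

    // Returns the master node that serves the given hash slot.
    public static string NodeForSlot(int slot)
    {
        if (slot < 0 || slot > 16383)
            throw new ArgumentOutOfRangeException(nameof(slot), "Hash slots range from 0 to 16383.");

        foreach (var (first, last, node) in Ranges)
        {
            if (slot >= first && slot <= last)
                return node;
        }

        throw new InvalidOperationException($"Slot {slot} is not assigned to any node.");
    }
}
```

For example, SlotMap.NodeForSlot(5501) returns "B", the first slot served by Node B.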
Redis cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of a "Hash Slot". Redis uses CRC-16 to map a Key to a Hash Slot. The basic algorithm used to map keys to hash slots is the following:

HASH_SLOT = CRC16(key) mod 16384

To summarize: the hash slot is computed from the key, and the hash slot determines which node stores the key.
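Here is a minimal C# sketch of the formula above, assuming the key is encoded as UTF-8 bytes. Redis uses the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0). The class name KeySlot is hypothetical, the bit-by-bit loop favors readability over speed, and hash tags (covered below) are deliberately ignored here.

```csharp
using System.Text;

public static class KeySlot
{
    // CRC16 XMODEM: polynomial 0x1021, initial value 0, no reflection, no final XOR.
    public static ushort Crc16(byte[] data)
    {
        ushort crc = 0;
        foreach (byte b in data)
        {
            crc ^= (ushort)(b << 8);                 // feed the next byte into the high bits
            for (int bit = 0; bit < 8; bit++)
            {
                crc = (crc & 0x8000) != 0
                    ? (ushort)((crc << 1) ^ 0x1021)  // top bit set: shift and apply the polynomial
                    : (ushort)(crc << 1);            // otherwise just shift
            }
        }
        return crc;
    }

    // HASH_SLOT = CRC16(key) mod 16384 (hash tags are not handled in this sketch).
    public static int HashSlot(string key) =>
        Crc16(Encoding.UTF8.GetBytes(key)) % 16384;
}
```

With this sketch, KeySlot.HashSlot("foo") evaluates to 12182, which falls in Node C's range in the example configuration.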

At the same time, we also need to understand that clustering imposes some limitations on the way we use Redis keys (these limitations are very logical, though).
  1. A transaction cannot be performed on keys that map to different hash slots.
  2. Multi-key operations cannot be performed on keys that map to different hash slots.
For example, suppose there are two keys, "key1" and "key2". key1 maps to hash slot 5500 and is therefore stored in Node A; key2 maps to hash slot 5501 and is therefore stored in Node B.
So we cannot perform a transaction on those keys, nor can we perform multi-key operations on key1 and key2. A multi-key command like "mget key1 key2" fails with a CROSSSLOT error ("Keys in request don't hash to the same slot"). Note that the keys must share the same slot; being stored on the same node is not enough.

However, in practice we often need to perform multi-key operations.
How can we achieve this in a clustered environment?
The simple answer is to ensure that the keys involved in a multi-key operation or transaction map to the same hash slot. We ensure this by "hash tagging" the Redis keys.
Hash Tags are a way to ensure that multiple keys are allocated in the same hash slot. There is an exception in the computation of hash slots which is used in implementing Hash Tags.

In order to implement hash tags, the hash slot for a key is computed in a slightly different way in certain conditions. If the key contains a "{...}" pattern only the substring between { and } is hashed in order to obtain the hash slot. However since it is possible that there are multiple occurrences of { or }, the algorithm is well specified by the following rules:
  • IF the key contains a { character.
  • AND IF there is a } character to the right of {
  • AND IF there are one or more characters between the first occurrence of { and the first occurrence of }.
Then instead of hashing the key, only what is between the first occurrence of { and the following first occurrence of } is hashed. Let's have a look at the following examples:
  1. The two keys {user1000}.following and {user1000}.followers will hash to the same hash slot since only the substring user1000 will be hashed in order to compute the hash slot.
  2. For the key foo{}{bar} the whole key will be hashed as usual, since the first occurrence of { is followed by } on the right without characters in the middle.
  3. For the key foo{{bar}}zap the substring {bar will be hashed, because it is the substring between the first occurrence of { and the first occurrence of } on its right.
  4. For the key foo{bar}{zap} the substring bar will be hashed, since the algorithm stops at the first valid or invalid (without bytes inside) match of { and }.
  5. What follows from the algorithm is that if the key starts with {}, it is guaranteed to be hashed as a whole. This is useful when using binary data as key names.
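The rules above translate into only a few lines of code. The following C# sketch (class and method names are hypothetical, and it repeats the CRC16 helper so it can stand on its own) extracts the part of the key that gets hashed, computes the slot, and checks whether a set of keys shares one slot, which is what a multi-key command or transaction in the cluster requires.

```csharp
using System.Linq;
using System.Text;

public static class HashTagSlot
{
    // CRC16 XMODEM (polynomial 0x1021, initial value 0), repeated so the sketch is self-contained.
    public static ushort Crc16(byte[] data)
    {
        ushort crc = 0;
        foreach (byte b in data)
        {
            crc ^= (ushort)(b << 8);
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 0x8000) != 0 ? (ushort)((crc << 1) ^ 0x1021) : (ushort)(crc << 1);
        }
        return crc;
    }

    // Returns the part of the key that is hashed: the text between the first '{'
    // and the first '}' after it, if that text is non-empty; otherwise the whole key.
    public static string HashedPart(string key)
    {
        int open = key.IndexOf('{');
        if (open < 0)
            return key;                          // no '{': hash the whole key

        int close = key.IndexOf('}', open + 1);
        if (close < 0 || close == open + 1)
            return key;                          // no '}' to the right, or "{}": hash the whole key

        return key.Substring(open + 1, close - open - 1);
    }

    public static int HashSlot(string key) =>
        Crc16(Encoding.UTF8.GetBytes(HashedPart(key))) % 16384;

    // True if every key maps to the same slot, the precondition for
    // multi-key commands and transactions in a cluster.
    public static bool SameSlot(params string[] keys) =>
        keys.Select(HashSlot).Distinct().Count() <= 1;
}
```

For instance, HashTagSlot.HashedPart("foo{{bar}}zap") returns "{bar", matching example 3, and HashTagSlot.SameSlot("{user1000}.following", "{user1000}.followers") returns true, so an MGET on those two keys can be served by a single node.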
So far we've seen how hash tagging can help us overcome the limitations imposed by Redis cluster.

I hope this article comes in handy for many of you while working with Redis cluster.

Thank you....

Monday, 7 March 2016

Increase performance by using Cache-Control header in Blob Storage

Hello Folks,

Caching has always been an important part of good application design and performance optimization. Fetching something over the network is both slow and expensive: large responses require many round-trips between the client and server, which delays when they are available and can be processed by the browser, and they also incur data costs for the visitor. As a result, the ability to cache and reuse previously fetched resources is a critical aspect of optimizing for performance.

In this post, we'll see how we can optimize performance by using the "Cache-Control" header in Azure Blob Storage. I assume you have an Azure subscription and a storage account.

Let's see step by step how we can add "Cache-Control" header to a block blob.

In this example we'll upload an image to Azure Blob Storage from a simple ASP.Net application.

1. Go to Azure Portal -> Storage -> Containers and create a new container. You can select the access control option "Public Blob". In real life you may not always want this option, but let's use it for this post.

2. Open Visual Studio and create an ASP.Net application. On Default.aspx, let's have a Button and a FileUpload control. On click of the button, we'll upload the selected file to Blob Storage.

3. The code for the button's click event handler looks like this:

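using System;
using Microsoft.Azure;                       // CloudConfigurationManager
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

protected void btnUpload_Click(object sender, EventArgs e)
{
   if (!file1.HasFile)
      return; // nothing selected, nothing to upload
   var storageConStr = CloudConfigurationManager.GetSetting("StorageConStr");
   CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageConStr);
   CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
   CloudBlobContainer blobContainer = blobClient.GetContainerReference("container1");
   CloudBlockBlob blockBlob = blobContainer.GetBlockBlobReference(file1.FileName);
   //Add Cache-Control header (sent to Blob Storage together with the upload)
   blockBlob.Properties.CacheControl = "public,max-age="+60;
   blockBlob.UploadFromStream(file1.PostedFile.InputStream);
}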

In the above snippet, note the assignment blockBlob.Properties.CacheControl = "public,max-age="+60.

This specifies that the response is cacheable by clients and by shared (proxy) caches ("public"), and that it can be reused for up to 60 seconds ("max-age=60").
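To make the meaning of max-age=60 concrete, here is a simplified C# sketch of the freshness check a cache performs before reusing a stored response. It relies on CacheControlHeaderValue from System.Net.Http.Headers to parse the header; the IsFresh helper is hypothetical and ignores other inputs a real browser considers, such as Expires, must-revalidate or heuristic freshness.

```csharp
using System;
using System.Net.Http.Headers;

public static class CacheFreshness
{
    // Simplified check: a stored response may be reused without contacting the server
    // while its age is below max-age and no-cache is absent. Real browsers weigh more
    // inputs (Expires, must-revalidate, heuristics, and so on).
    public static bool IsFresh(string cacheControl, TimeSpan age)
    {
        CacheControlHeaderValue value = CacheControlHeaderValue.Parse(cacheControl);
        return !value.NoCache
            && value.MaxAge.HasValue
            && age < value.MaxAge.Value;
    }
}
```

With the header used in this post, IsFresh("public,max-age=60", TimeSpan.FromSeconds(30)) is true, while an age of 90 seconds gives false; at that point the cache has to revalidate with the server, which is where the Etag comes in.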

4. Now run the application, choose a file and store it in blob storage.

5. Now add an HTML image element to the page and assign the blob URL to its src attribute.

6. Now run the application, press F12 to open the browser's developer tools and go to the Network tab. Find the request to your blob URL and look at the response headers.

Points of interest:

Have a look at the response headers marked with red circles.

1. The "Cache-Control" header specifies that the browser can cache the response for the specified time (here, 60 seconds).

2. "Etag" is one of several mechanisms that HTTP provides for web cache validation, which allows a client to make conditional requests. This makes caches more efficient and saves bandwidth, because the web server does not need to send a full response if the content has not changed. You can learn more about Etags here.

7. Now refresh the browser page and look at the blob URL request in the Network tab of the developer tools. You will see something similar to the image below.

Points of interest:

1. Response Status Code 304 (Not Modified): The condition specified using HTTP conditional header(s) is not met.

The above statement says pretty much everything about HTTP status code 304: whatever condition was specified in the request was not met. Now let's look at point 2 below to understand exactly what "condition" was specified in the request.

2. If-None-Match request header: in the above screenshot the header is underlined in red. This is the condition specified in the request! It contains the same "Etag" that the server sent in step 6 above.

By including the "Etag" in the "If-None-Match" header, the browser is asking the server: if the "Etag" is still the same as the one in the header, there is no need to download the resource from storage; return Status Code 304 (Not Modified) instead. The browser then gets the file from its own cache.

Note: The Etag on the server changes only when the resource is modified.
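The server-side decision described above can be modeled roughly as follows. This is a hypothetical, simplified C# sketch, not Azure Storage's implementation: it handles only GET requests, uses weak comparison as HTTP specifies for If-None-Match, and assumes Etag values contain no commas.

```csharp
using System;
using System.Linq;

public static class ConditionalGet
{
    // Returns the status a server would send for a GET carrying the given If-None-Match
    // value: 304 (Not Modified) if the client's cached copy is current, otherwise 200.
    public static int StatusFor(string currentEtag, string ifNoneMatch)
    {
        if (string.IsNullOrWhiteSpace(ifNoneMatch))
            return 200;                          // no condition: send the full body

        if (ifNoneMatch.Trim() == "*")
            return 304;                          // "*" matches any current representation

        string current = StripWeak(currentEtag);
        bool matches = ifNoneMatch
            .Split(',')
            .Select(tag => StripWeak(tag.Trim()))
            .Any(tag => tag == current);

        return matches ? 304 : 200;              // unchanged Etag: no need to resend the body
    }

    // If-None-Match uses weak comparison, so W/"x" and "x" are treated as equal.
    private static string StripWeak(string tag) =>
        tag.StartsWith("W/", StringComparison.Ordinal) ? tag.Substring(2) : tag;
}
```

In the scenario from step 7, currentEtag plays the role of the Etag returned in step 6 and ifNoneMatch is that same value echoed back by the browser, so the model answers 304 until the blob is modified.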

In this post, we've seen how the browser does efficient caching using different headers (like the Cache-Control response header and the If-None-Match request header) and the Etag.

In my next post I'll show how we can achieve efficient caching using the same headers and Etag in .Net using C#.