Sunday, 25 September 2016

Hash tagging Redis keys in a clustered environment


Hello folks,

In this post, we'll talk a little bit about

  • Redis cluster.
  • Limitations of Redis cluster.
  • How we can overcome the limitations of redis cluster.

Redis cluster is a distributed implementation of Redis with three major goals:

  • High performance and linear scalability upto 1000 nodes.
  • Acceptable degree of write safety.
  • Availability: Redis cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least one reachable slave for every master node that is no longer reachable.

As an example, let's take a look at the following cluster configuration with six nodes.

  1. Node A (master node)
  2. Node B (master node)
  3. Node C (master node)
  4. Node D (slave of master node A)
  5. Node E (slave of master node B)
  6. Node F (slave of master node C)
Now at this point, a natural question may arise, "When I write to a Redis key, how a node is picked up to store that key or what are the factors that decide which node to store the key in"?

To get answer to this question, let's have a look at Redis Key Distribution Model.
The key space is split into 16384 hash slots, effectively setting an upper limit for the cluster size 16384 master nodes (however, the suggested max size of nodes is in the order of ~1000 nodes).
Each master node in a cluster, handles a subset of 16384 slots.
So, possibly the key slot distribution for our above configuration is as below:
  • Master Node A contains hash slots from 0 to 5500.
  • Master Node B contains hash slots from 5501 to 11000.
  • Master Node C contains hash slots from 11001 to 16383.
Redis cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of a "Hash Slot". Redis uses CRC-16 to map a Key to a Hash Slot. The basic algorithm used to map keys to hash slots is the following:

HASH_SLOT=CRC16 (key) mod 16384

To summarize, a Hash Slot is decided from the key and from Hash Slot a Node is picked up to store the key.

At the same time, we also need to understand that, clustering implies some limitations on the way we use Redis keys (these limitations are very logical though).
  1. Transaction cannot be performed on keys which are part of different range of hash slots.
  2. Multi key operations cannot be performed on keys which are part of different range of hash slots.
For example, suppose there are two keys "key1" and "key2". key1 is mapped to hash slot 5500, thus, is stored in Node A. key2 is mapped to hash slot 5501, thus, is stored in Node B.
So, we cannot perform transaction on those keys. Nor we can perform multi key operations on key1 and key2. Multi key operation like "mget key1 key2" will throw exception.

However, in a practical scenario, we often find ourselves in a need to perform Multi Key operations.
How can we achieve this in a clustered environment?
A simple answer is, by ensuring that the keys on which we perform multi-key operation or transaction, are part of same hash slot range. And we ensure this by "Hash Tagging" Redis keys.
Hash Tags are a way to ensure that multiple keys are allocated in the same hash slot. There is an exception in the computation of hash slots which is used in implementing Hash Tags.

In order to implement hash tags, the hash slot for a key is computed in a slightly different way in certain conditions. If the key contains a "{...}" pattern only the substring between { and } is hashed in order to obtain the hash slot. However since it is possible that there are multiple occurrences of { or }, the algorithm is well specified by the following rules:
  • IF the key contains a { character.
  • AND IF there is a } character to the right of {
  • AND IF there are one or more characters between the first occurrence of { and the first occurrence of }.
Then instead of hashing the key, only what is between the first occurrence of { and the following first occurrence of } is hashed. Let's have a look at the following examples:
  1. The two keys {user1000}.following and {user1000}.followers will hash to the same hash slot since only the substring user1000 will be hashed in order to compute the hash slot.
  2. For the key foo{}{bar} the whole key will be hashed as usually since the first occurrence of { is followed by } on the right without characters in the middle.
  3. For the key foo{{bar}}zap the substring {bar will be hashed, because it is the substring between the first occurrence of { and the first occurrence of } on its right.
  4. For the key foo{bar}{zap} the substring bar will be hashed, since the algorithm stops at the first valid or invalid (without bytes inside) match of { and }.
  5. What follows from the algorithm is that if the key starts with {}, it is guaranteed to be hashed as a whole. This is useful when using binary data as key names.
So far we've seen that, how hash tagging can help us overcome the limitations implied by Redis cluster.

Hope this article comes handy for many of you while working with Redis cluster.

Thank you....