django-cacheops
Feature: Multi-server support
This would also be a very nice thing that I've been seriously looking at. I think some options would be to integrate a smarter client wrapper instead of redis-py, like: https://github.com/gmr/mredis https://github.com/salimane/rediscluster-py
Or just implement it directly, with the minimum required feature set for cacheops. I quite like how rediscluster works. There are some obstacles, however.
- The straightforward solution is to distribute all cache keys evenly based on the modulo of a CRC32 hash. This is handled by the libs above. However, the way cacheops works, I suppose the schemes would need to always be stored/updated on every (master) server.
- Cacheops uses pipelining in some places, which is trickier to update for multi-server. But it should be doable, especially with rediscluster.
Any thoughts on this matter would be appreciated.
There are two different ways to use multiple redises. The first is when there are separate caches on different servers (this is handy when you have geographically distributed application servers); the second is when the cache is distributed between redises. In both cases you need to invalidate on all servers. (In the first case you can probably skip this and invalidate only the local cache when you know that queries on different app servers are different.)
I assume you are talking about the second case - sharding. Regarding schemes, storing all schemes on all instances is a bad idea as far as I can see, cause then you'd need to update them on all instances on every cache miss. I was thinking about making each instance a kind of self-contained cache which holds its own schemes and invalidators. This way you write to a single instance; you'll still need to invalidate on all of them, but I haven't found a general way to avoid this.
Also I should ask: why do you need this? Are you out of CPU on a single redis instance? Out of memory on a single server? Do you need more durability? (In the latter case replication should work.) I ask all these questions because the answers will determine what kind of multi-server support you need.
I myself actually looked into the first variant of multi-server support, but found it's ok to rely on timeout invalidation for data that doesn't need to be highly coherent.
My reasons would be, in order of importance:
- Durability
- Out of memory on single server
And 1. should indeed be the easiest to solve. Set up 1 master with N slaves and use, for example, rediscluster to do all the communication with the cache. It already sends all writes to the master and distributes reads to the slaves by default. We still write to a single server, so that remains simple. One potential problem here is replication lag... but I'm not sure whether that would be noticeable enough to cause visible issues. Probably not.
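A minimal sketch of that setup, assuming plain redis-py clients and a hand-rolled read/write split (rediscluster would do this routing for you); the hostnames and command list here are illustrative assumptions:

import random
import redis

# Assumed topology: one master for writes, replicas for reads.
master = redis.StrictRedis(host='cache-master', port=6379)
slaves = [redis.StrictRedis(host=h, port=6379)
          for h in ('cache-slave-1', 'cache-slave-2')]

READ_COMMANDS = {'GET', 'MGET', 'EXISTS', 'SMEMBERS'}

def run(command, *args):
    # Writes (SET, SADD, EVALSHA, ...) must hit the master so invalidation
    # stays correct; reads may lag slightly behind due to replication.
    client = random.choice(slaves) if command in READ_COMMANDS else master
    return client.execute_command(command, *args)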
And actual sharding would be the ideal solution for the 2nd problem (if implemented fairly efficiently, of course), similar to how you can easily add N memcache servers when using more standard Django caching solutions.
Sharding won't work with redis cluster cause we need to store our keys together with their invalidators on a single instance, and the cluster stores keys as it wants. And you most probably don't need a cluster for 1. So it's out of the question as far as I know.
For 1 the only thing needed from cacheops is failover; redis already handles replication.
2 is not so easy. One needs to map keys to instances and then use a single instance for data, invalidators and schemes in cache_thing(). This means there would be a separate collection of schemes for each redis node. And when you invalidate you'd use all these schemes and all nodes. This is doable, but still messy.
I was thinking of rewriting cacheops with lua scripting, which would allow me to trash the in-memory cache of schemes, and that would make sharding trivial. My very initial hacks are in the scripting branch.
I think I am close to solving this problem. Please check my approach.
https://github.com/Suor/django-cacheops/pull/68
Given a production-ready redis cluster, and given that cacheops now uses scripting instead of transactions, this is relatively easy to implement. I don't need this functionality, but if anyone is willing to make a pull request I'll work with him/her on it.
Here are my preliminary designs. I see 2 approaches.
Shard data logically. By model or by some condition, e.g. user id, geographic region or whatever. This is especially convenient when you already shard your database. In this scenario each of redis cluster's 16384 hash slots acts as an independent cacheops database - each key goes to the same shard as its invalidation set and schemes. This way we invalidate in a single place with one request.
This could be facilitated by the use of hash tags - the parts of keys in {}. The user will need to supply a hash tag function:
from funcy import first              # cacheops already depends on funcy
from myapp.models import City, Country, Category, User   # example models from the app

def hash_tag_func(model, attname, value):
    # Shard these by model
    if model in (City, Country, Category):
        return model.__name__
    # Shard users by id
    elif model is User and attname == 'id':
        return 'u%d' % value
    # Shard the rest by user
    else:
        user_fk = first(f.attname for f in model._meta.fields
                        if f.rel and f.rel.to is User)
        assert user_fk, "Need foreign key to User to shard %s" % model.__name__
        if attname == user_fk:
            return 'u%d' % value
Whatever hash_tag_func() returns, we prepend to all keys supplied to redis commands. This way they will be assigned to the same shard and thus to the same node:
cache_key = '{u42}q:G7A...'
conj_key = '{u42}conj:...'
...
The hash tag function is called each time we make a cache fetch, save or invalidation. When we have a dnf tree (or a queryset from which it is derivable) we pass all elementary conditions from it to this function; in invalidate_obj() we pass all the (key, value) pairs from the object dict. The several results of the hash tag function for a single op should be consistent and defining (at least one non-None). This way all cache operations will be performed against a single redis cluster node, so it's an infinitely scalable approach.
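As a minimal sketch of that consistency rule (hypothetical helper, not actual cacheops code): collect the hash tag for every elementary condition, or for every attribute of an invalidated object, and require exactly one defined result:

def hash_tag_for_op(model, pairs, hash_tag_func):
    # pairs: the elementary (attname, value) conditions of the queryset,
    # or the items of the object dict in invalidate_obj().
    tags = {hash_tag_func(model, attname, value) for attname, value in pairs}
    tags.discard(None)
    assert len(tags) == 1, "Inconsistent or missing hash tag for %s" % model.__name__
    return '{%s}' % tags.pop()

# e.g. hash_tag_for_op(User, [('id', 42)], hash_tag_func) == '{u42}'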
Shard by full cache key, invalidate everywhere. It's actually as simple as it sounds. When we cache_thing(), all the invalidation data is stored on the same node as the cache, so if we later run invalidate.lua against all master nodes everything will work. Invalidation against several nodes could be run in parallel with either threads or async IO, but a simple for loop will also work for a small number of nodes. Additionally, invalidate_obj|dict() calls could be grouped and pipelined.
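A rough sketch of that invalidation loop, assuming a known list of master nodes; the path and argument layout of invalidate.lua below are illustrative assumptions, not the real script interface:

import json
import redis

# Assumed: one plain client per master node.
master_nodes = [redis.StrictRedis(host=h, port=6379)
                for h in ('10.0.0.1', '10.0.0.2', '10.0.0.3')]

invalidate_script = open('cacheops/lua/invalidate.lua').read()   # path is an assumption

def invalidate_everywhere(db_table, obj_dict):
    # A plain for loop is enough for a handful of nodes; threads or async IO
    # could parallelize this, and calls could be grouped and pipelined.
    for node in master_nodes:
        node.eval(invalidate_script, 0, db_table, json.dumps(obj_dict, default=str))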
Both approaches could also be used with client side sharding or twemproxy, so redis cluster is not strictly needed. Also, they are not mutually exclusive, eventually both could be implemented.
@botondus, @ttyS15 are you still interested?
We are using redis cluster together with redis-py-cluster in the rest of our application, so I think I could integrate that into django-cacheops.
On a very first attempt, changing redis.StrictRedis to rediscluster.StrictRedisCluster produces the following errors:
Upon read, it complains about accessing keys from a different node, in the cache_thing function:
ResponseError: Error running script (call to f_c54d7c8aeb17b974055d5eb1f1a08f0dafcaaf81): @user_script:37: @user_script: 37: Lua script attempted to access a non local key in a cluster node
Repeating the read 3 times (the number of master nodes in the cluster) eventually succeeds.
Upon invalidation (e.g. when saving a model), in the invalidate_dict function:
RedisClusterException: EVALSHA - all keys must map to the same key slot
Any suggestions on how to approach this are appreciated.
@anac0nda you need to decide how you shard. Cacheops has keys depending on each other, so you can't just distribute them by a generic key hash as redis cluster does. See my comment with hash_tag_func() above.
Note that to use the second approach from there you need to disable the redis cluster check that a key belongs to a node. Not sure whether this is possible; you may need to use just a bunch of nodes instead of a cluster.
A note: we have 2 approaches here: a hash tag function vs a router function. The second one also solves master/slave.
I went with an approach similar to the second one (shard by full cache key), since in my scenario there is no suitable way to shard logically (load is not distributed evenly amongst users or models).
Before caching with cache_thing.lua, I prepend a hash tag to the key that is guaranteed to go to the same cluster node where the key itself would go if it were hashed (e.g. {3}, {2}, {0} in my 3-node setup). The hash tag is derived from a lookup table, generated upon initialization, which has exactly one hash tag per node.
So, let's say key foo gets converted to {<tag>}foo before being passed to cache_thing.lua. Inside the script, I use the hash tag from the key to generate the {<tag>}schemes:* and {<tag>}conj:* keys for the SADD calls, which are guaranteed to go to the same node as the main key.
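A sketch of how such a lookup table might be built by trial and error, assuming we have a slot -> node mapping from the cluster client and letting the server compute slots via the CLUSTER KEYSLOT command:

def build_hash_tag_table(conn, slot_to_node):
    # conn: a redis-py connection to any cluster node.
    # slot_to_node: slot number -> node name, obtained from the cluster
    # client (exactly how is an assumption here).
    nodes = set(slot_to_node.values())
    table, candidate = {}, 0
    while len(table) < len(nodes):
        # Ask the server which slot the bare tag text hashes to; '{N}' as a
        # key prefix will later hash to the same slot.
        slot = conn.execute_command('CLUSTER KEYSLOT', str(candidate))
        table.setdefault(slot_to_node[slot], '{%d}' % candidate)
        candidate += 1
    return table   # e.g. {'node1': '{0}', 'node2': '{1}', 'node3': '{3}'}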
For the invalidation phase, the models are cleared from all nodes with the following updated wildcard:
conjs_keys = redis_client.keys('{*}conj:%s:*' % model._meta.db_table)
I also added a loop in the invalidate_dict function that iterates over the hash tags for all the nodes and invokes invalidate.lua against each node.
I did some basic testing (with 2 types of clusters: 3 masters only, and 3 masters/3 slaves) with our applications and it seems to be working correctly. I will create a PR when I run the test suite. Meanwhile, please have a look at the attached patch file. Comments on the approach are welcome!
Looked through the patch. Working approach overall, I see 2 major issues though:
- Non-cluster usage is completely broken in it, since SafeRedis and StrictRedis don't have the additional methods of CacheopsRedisCluster.
- If the list of nodes changes on the run then _local_hash_key_cache will become stale and everything will break or will just never use new nodes.
I successfully ran the tests, both against cluster and a single redis instance.
Points you mention are now addressed in the new patch.
- There are checks to see if running in cluster/non-cluster mode, so both cases should work.
- There's a check to see if the redis-py-cluster nodes dict has changed before adding the hash tag. The local cache is rebuilt if needed. redis-py-cluster replaces the dictionary with a new one containing fresh information in case of communication exceptions, so I think that should be sufficient.
Let me know what you think.
The nodes change check as written probably won't work. Here are my considerations:
- id() of a mutable object doesn't change when the object is changed, so there is no point in comparing them.
- self.connection_pool.nodes.nodes is not really documented, so its behaviour is undefined. I suggest contacting the rediscluster author for the best way to react to nodes set changes.
- It is probably a bug that _local_hash_key_cache is not emptied before rebuilding.
- Not obvious what happens when a new node is added. Potentially all cache is lost cause keys get different nodes.
Hi @Suor and @anac0nda. I got an email a day ago from @anac0nda, I think, asking for some input on this question. I answered his email, but I would like to make a shorter information dump here on how redis cluster works.
Redis cluster is based around a hash slots system (16384 slots). Each key that you send to a cluster gets hashed with the CRC16 algorithm, modulo the slot count. Each command then puts its data into a bucket. For example, if you run SET foo bar it will hash foo to slot 12345 (example) and that slot belongs to one node in the cluster. The idea with all cluster clients that support redis cluster is to abstract away all the node handling and node routing and create a client that works almost exactly the same way as redis-py. The goal for my client is to be a drop-in replacement if you already have a working redis-py client implementation. Of course there are some limitations to this, for example with pipelines, but it should break as little as possible.
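As an illustration of the slot hashing described above (not part of any patch here), the computation can be reproduced locally; binascii.crc_hqx with an initial value of 0 implements the same CRC-CCITT/XModem polynomial redis cluster uses, and only the non-empty text inside {} is hashed when a hash tag is present:

import binascii

def key_slot(key):
    # Hash only the part between the first '{' and the following '}' when
    # that part is non-empty; otherwise hash the whole key.
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key.encode(), 0) % 16384

print(key_slot('foo'))                                         # some slot in 0..16383
print(key_slot('{u42}q:G7A...') == key_slot('{u42}conj:...'))  # True: same tag, same slot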
Regarding how nodes work and are handled: when a cluster node event happens, different things happen based on what the event is. If a node is added, e.g. because you want to expand the capacity of the cluster, you migrate some of the hash slots from the existing nodes to the new node. The client and redis itself can already handle this case: data fetching operations are still served from the old hash slot location during the migration, and all writes go to the new one. The transition is seamless, and basically the only thing you will notice at the client layer is a slowdown in how many operations each client can handle, because there is overhead in dealing with the error handling and new routing.
I would argue that there is really never a use case for a user of any redis client to base anything on how the cluster actually looks. The only use case I have found so far is if you want to monitor cluster changes with some monitoring solution, or send notifications in case the cluster does a failover when a master node goes down. But I have never seen any reason to add a feature where a client can track node changes and react to them.
One other thing: I have looked through the patch that was submitted, but I can't really figure out why you need this node tracking in the client layer. If you want to, you can enlighten me on why this is really needed in this context.
If you want to solve the problem of sending all keys that belong to, say, an instance of a model object to the same node, you should really look at how regular SQL works with its auto-increment on the primary key: use that ID inside the key, e.g. {<model-name>-<id>}foo = bar, with an integer value tracked in a separate key and INCR on that key to get unique IDs. This will ensure that keys get routed to the same node and it will also let you query all keys for a specific instance in a pipeline to speed up data fetching.
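A small sketch of that pattern with illustrative key names (not cacheops code): INCR hands out the ID, and the {model-id} hash tag pins every key of the instance to one slot and hence one node:

import redis   # rediscluster.StrictRedisCluster would be a drop-in replacement here

r = redis.StrictRedis()

def create_article(title, body):
    article_id = r.incr('article:next_id')       # unique auto-incremented ID
    tag = '{article-%d}' % article_id            # same tag -> same slot -> same node
    r.set(tag + ':title', title)
    r.set(tag + ':body', body)
    return article_id

def fetch_article(article_id):
    tag = '{article-%d}' % article_id
    with r.pipeline() as pipe:                   # one node, so pipelining is safe
        pipe.get(tag + ':title')
        pipe.get(tag + ':body')
        return pipe.execute()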
If this is not the case, please point me in the right direction of the problem that this node tracking is trying to solve :)
@Grokzen the issue with using redis cluster for cacheops is that it doesn't fit the hash slots model well. The reason for this is that cacheops uses keys referring to other keys: sets whose members are key names. Each such set references some event, and the keys in the set are the cache keys of queries dependent on this event. If all queries in an app could be separated into independent shards then we could use {shard-name} as you suggested, but in the general scenario this is not possible.
The solution in the patch by @anac0nda is to make all keys on a single node have the same {} prefix:
node1 - {0}*
node2 - {1}*
node3 - {3}*
The prefixes are calculated by trial and error here. As you can see, when a new node is added it won't be used unless the node -> prefix table is rebuilt. However, if you rebuild it you may encounter any of the following:
- All cache is lost cause all prefixes have changed.
- Cluster needs to move all data from some node to another.
- Some data becomes inaccessible cause its prefix is not used anymore.
The exact outcome depends on how slots are redistributed. This could theoretically be resolved by binding keys to hash slots rather than nodes, but that would be impractical: the thing is, if you shard by cache key hash then you need to perform the invalidation procedure on each shard, which is too much. Also, theoretically you could shard not to all hash slots but to some smaller number of them, say n * (n+1) for n nodes. This way you can take advantage of redis cluster behaviour instead of fighting it; practicality should be tested though.
Another approach that probably makes more sense is using Just a Bunch Of Redises + client side sharding + (optionally) Sentinel. This way you won't need any prefixes, you just send each request to the appropriate redis. You will still need to run the invalidation procedure on all master nodes.
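A sketch of that routing with assumed hostnames: a stable hash of the full cache key picks the redis for reads and writes, while invalidation still visits every node as in the loop sketched earlier:

import zlib
import redis

# Just a Bunch Of Redises; Sentinel could sit in front of each for failover.
nodes = [redis.StrictRedis(host=h, port=6379)
         for h in ('redis-a', 'redis-b', 'redis-c')]

def node_for_key(cache_key):
    # Stable client side sharding by full cache key.
    return nodes[zlib.crc32(cache_key.encode()) % len(nodes)]

# Reads and writes for one cache key always hit the same redis:
#     node_for_key(cache_key).get(cache_key)
# Invalidation, however, still loops over all of `nodes`.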
Do you currently have a unique ID generation strategy in place if I were to use only 1 server? If I, for example, want to cache a queryset object, how does it know which object to fetch from redis the next time I run the code? I don't have that much time to dig through all the code and understand it all, so it would be great if I could be pointed to the correct place :)
@Grokzen it's queryset._cache_key(). Can't see how this is relevant though.
@Suor what about integrating cacheops with https://github.com/zhihu/redis-shard ?
@genxstylez the sharding lib is a trivial part of making cacheops multi-server; the main part is choosing a sharding or routing strategy. See above.
Happy to be the champion here.
Has anything moved here?
Implementing cache key prefix was a step towards it. The idea is to use {prefix}es with braces and redis-py-cluster for sharding.
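For example, assuming the cache key prefix setting accepts a callable receiving a query object (the attribute names used below are assumptions, so treat this only as a hedged sketch of the idea):

# settings.py - hedged sketch only; returning a braces-wrapped prefix is
# what would let redis cluster route a whole dependent key family to one node.
def cacheops_prefix(query):
    tables = getattr(query, 'tables', ())   # attribute name is an assumption
    if 'auth_user' in tables:
        return '{users}:'
    return '{default}:'

CACHEOPS_PREFIX = cacheops_prefix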
Setting everything up turned out to be bigger than I first expected, so I will appreciate any help, especially with setting up a test env for this.
I see. Thank you.
Any update on this? Opened since 2013 😱
People often implement some custom solution, which is hardly generalizable, or just don't have time to clean it up and contribute it. This issue exists and is open for discussion and reference; there is no active work now.
Hi, I came across this problem and would love to have my own simple patch to make this work. I read and debugged here and there to make redis-py-cluster work with cacheops. Unfortunately all the sample snippets above are gone, so I had to invent my own. So what I've been doing here is creating a simple class to see where the problem is:
from cacheops.redis import CacheopsRedis
from cacheops.redis import redis_client, handle_connection_failure
from rediscluster import StrictRedisCluster

class StrictRedisCluster1(StrictRedisCluster):
    def _determine_slot(self, *args):
        print(args)  # debugging
        return super(StrictRedisCluster1, self)._determine_slot(*args)

# this class is used in the django settings
class CacheopsRedisCluster(StrictRedisCluster1, CacheopsRedis):
    get = handle_connection_failure(StrictRedisCluster.get)
I understand this would not work. The problem, after debugging, is somewhere where the redis cluster client tries to check the hash slot (log file here).
I already set the prefix {pp_data_io} in the hope of getting the keys cached on the same server.
After running the code sequentially, I figured out that precall_key is set to '', whereas if I manually change it to {pp_data_io} then all of the keys are hashed into the same slot.
The precall_key is not public elsewhere, so I can't change it by extending the class. And this key is used for cache invalidation related tasks (which I haven't read into yet).
I know that I don't have enough experience to make this work out of the box. I'm looking for a patch version that can run on my redis cluster (regardless of optimization).
I'm still reading the codebase; I hope you can point me in some direction to achieve that, @Suor. Thanks a lot.
precall_key set to '' means "do not use it"; unfortunately this doesn't play nice with redis cluster semantics. The solution could be one of:
- generate two versions of the cache_thing script, one with two keys - prefix and key - the other one with all three
- simply pass precall_key as a param, not a key (see the sketch below)
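A rough sketch of the second option with an invented call signature (the real KEYS/ARGV layout of cache_thing.lua is not shown here): only real keys are declared as KEYS, so the cluster's slot check never sees an empty precall_key:

def call_cache_thing(client, script_sha, prefix, cache_key,
                     precall_key, data, dnfs_json, timeout):
    # Hypothetical argument layout: 2 KEYS participate in slot routing,
    # everything else (including a possibly empty precall_key) goes as ARGV.
    return client.evalsha(
        script_sha,
        2, prefix, cache_key,
        precall_key, data, dnfs_json, timeout,
    )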
Hello again guys, so I managed to create a small patch to make cacheops work with redis cluster. It's not clean but might help someone in need.
So I ask your permission to share it here, @Suor:
I have a folder, compat_cacheops.zip, that overrides some of cacheops' methods, plus a custom cacheops redis class and redis-cluster init parameters for the django settings file: https://gist.github.com/AndyHoang/954942e4603c85bc3e18878c0bcbdd7c I would love to have your feedback.
Hello @AndyHoang, thanks for your example. I think those things might be incorporated as long as it still works for the single redis scenario.
Hello,
Apparently redis-py has added RedisCluster support in version v4.1.0rc1: https://github.com/redis/redis-py/releases/tag/v4.1.0rc1
Hopefully it will now be easier to also support this in cacheops in a future version...