dragonfly
dragonfly copied to clipboard
implement 3.0 api
if we look at commands.json
jq -c 'to_entries[] | select(.value.since | test("^3\\..*"))' commands.json | grep -v '"cluster"' | grep -v '"geo"' | less
we will see that in besides some debugging/configuration/bitops commands we have only TOUCH command that needs to be implemented. Dragonfly implements its cache a bit different from Redis but it still has notion of "upgrading" the entry to a higher priority.
It could be that in dragonfly, touch and exist are almost identical. need to see if there are any differences in the output or actual behavior.
update: we need to finish 2.x commands, bits commands, watch/unwatch commands.
Lets start with watch & unwatch
How Redis does it
The whole management is really simple:
struct watchedKeystores basic info (key, db, client)- WATCH adds a
watchedKeyto the clients watch listc->watched_keysand a global tablec->db->watched_keys, which stores a list of dependant clients for every key - When a key is updated, all clients in
c->db->watched_keys[key]are marked asCLIENT_DIRTY_CAS, so the following EVAL aborts db->watched_keysis just a regular redis dict, the same that is used for the main storate/expires
Some more thoughts
- Instead of having another map, we can store a incremental version code for each key. The drawback here is that handling expires/deletes is a bit tricky. Besides, redis abort checks are constant O(1) and after the first watched key is updated, all further watches are ignored.
db->watched_keysdict is queried on each update, so 99% are just reads, 99% of which probably return nothing- The watch mechanism is in many ways similar to expiration. So I expect a viable solution to be similar to it (which is in turn similar to the redis approach). However I feel like adding a lot of unwanted complexity and overhead just for a minor feature.
- This would mean adding another DashTable likewise to ExpireTable and updating it on writes.
What do you think?
I do think it's a minor feature. You can see what we currently store inside a db, see DbTable class for that.
Specifically, we maintain trans_locks that we use to track the ongoing transaction locks. In fact, check in DbSlice for CheckLock , Acquire and Release that you could probably use to piggy-back and store some information about watched keys.
The only complexity I see is that storing connections directly inside db->watched_keys is complicated. It's doable (I do it in pubsub I think) but it's complicated. The reason for this is that DbTable can reside in a different thread and then if you store a reference to an object whose life is controlled by another thread, your life becomes complicated.
The simpler solution will be is that WATCH will indeed register some keys for tracking in the appropriate DbTables. And then during EVAL we will do an additional transactional "hop" that queries DbTables regarding the state of those keys.
also look at ConnectionState in server/conn_context.h - it contain transaction related info with ExecState and exec_body variables. I feel it's about time to group those within a dedicated substruct similarly to SubscribeInfo struct. You will add the watched keys into the new struct.
Thinking about it a bit more - you will have to clean up DbTable's watched table upon connection closure - i.e. there will be some interaction between a connection lifetime and DbTable's state. So you can still store a pointer to connections inside a DbTable but you can only reference there thread-safe structures or, alternatively, just use it as a descriptor without dereferencing it.
And then during EVAL we will do an additional transactional "hop" that queries DbTables regarding the state of those keys
The purpose of DbTable's watched table is that EXEC doesn't have to query its own keys. However, it still has to unregister all its keys (it does on the first watch fail). What is more, EXEC directly checks for expirations of all of them 🤔 So in the end it has to query all keys at least once. It seems like with a few watched keys, the db->watch_keys has more overhead than speedup. Do you see the real reason they're using it? Solely for the fail-fast behavior? Redis already stores a last-modified timestamp (OBJECT IDLETIME) so memory savings are probably not the issue
you can only reference there thread-safe structures
Sure, I thought of adding some atomic flag or making exec_state atomic. If we need it at all (see question above)
Lets forget about fast-fail, and complexity of going over all the keys right now. Assume that the watched list is short.
Are you saying that we can store all the info in connection and get done with it?
what will you store in the map? the dirty flag? i.e. key->bool mapping? key->unsigned mapping for versioning?
how another shard will know to update those keys in those connections?
how another shard will know to update those keys in those connections?
There is obviously no way for it
key->unsigned mapping for versioning
Yes. If we use simple version codes, we might miss a DEL-SET sequence. Instead, the last modified time with nanosecond granularity seems like an option. Then, on EXEC, we'll just check the time didn't change (so the key has no newer timestamp)
Besides, Redis doesn't even bother to use a map for the connections data. Watch checks are O(k).
Or is there some issue I'm not seeing through?
So thread T1 holds a connection C that has a watchmap with key K belonging to a shard in thread T2.
Now a SET operation changes K in T2. How does it know to search for C and update K in c->watched?
Forget multi-threading - how a single-threaded system will know to search for C ?
There is no double-sided dependency. The key just cares for itself, but it stores a timestamp with its own last update time (we'll have to add it at some point to support OBJECT IDLETIME).
WATCH just stores the timestamp TK of some key K at the time of call in C, meaning "I still expect K to have timestamp TK when I start EXEC". If TK changed since WATCH, then the key was modified. So the values in C don't change - they serve as kind of a snapshot to compare against on EXEC
So you suggest storing the last modified timestamp for every entry in DF inside DbTable ? or just for watched keys inside DbTable?
Yes, for all. We can't store them selectively without query overhead
So my problem with this is that for a pretty minor feature we need to add space overhead of 8 bytes per key. Even for expiry (which is much more common) we complicated things so that the memory usage with be incremental with the feature. Please note that memory usage is the primary cost factor when using a memory store.
Therefore, I think trading implementation complexity for reducing the overhead in memory is well worth it. Regarding the OBJECT IDLETIME command - it's a debugging command, and we won't support it.
Oh, okay. That answers all of my questions above. If Dragonfly doesn't intend to support IDLETIME, then adding memory overhead just for WATCH isn't worth it.
Closing as done. Will open a dedicated task for TOUCH