dragonfly implement 3.0 api

if we look at commands.json

we will see that in besides some debugging/configuration/bitops commands we have only TOUCH command that needs to be implemented. Dragonfly implements its cache a bit different from Redis but it still has notion of "upgrading" the entry to a higher priority. It could be that in dragonfly, touch and exist are almost identical. need to see if there are any differences in the output or actual behavior.

Jun 04 '22 02:06 romange

update: we need to finish 2.x commands, bits commands, watch/unwatch commands.

Aug 22 '22 12:08 romange

Lets start with watch & unwatch

How Redis does it

The whole management is really simple:

struct watchedKey stores basic info (key, db, client)
WATCH adds a watchedKey to the clients watch list c->watched_keys and a global table c->db->watched_keys, which stores a list of dependant clients for every key
When a key is updated, all clients in c->db->watched_keys[key] are marked as CLIENT_DIRTY_CAS, so the following EVAL aborts
db->watched_keys is just a regular redis dict, the same that is used for the main storate/expires

Some more thoughts

Instead of having another map, we can store a incremental version code for each key. The drawback here is that handling expires/deletes is a bit tricky. Besides, redis abort checks are constant O(1) and after the first watched key is updated, all further watches are ignored.
db->watched_keys dict is queried on each update, so 99% are just reads, 99% of which probably return nothing
The watch mechanism is in many ways similar to expiration. So I expect a viable solution to be similar to it (which is in turn similar to the redis approach). However I feel like adding a lot of unwanted complexity and overhead just for a minor feature.
This would mean adding another DashTable likewise to ExpireTable and updating it on writes.

What do you think?

Sep 08 '22 21:09 dranikpg

I do think it's a minor feature. You can see what we currently store inside a db, see DbTable class for that. Specifically, we maintain trans_locks that we use to track the ongoing transaction locks. In fact, check in DbSlice for CheckLock , Acquire and Release that you could probably use to piggy-back and store some information about watched keys.

The only complexity I see is that storing connections directly inside db->watched_keys is complicated. It's doable (I do it in pubsub I think) but it's complicated. The reason for this is that DbTable can reside in a different thread and then if you store a reference to an object whose life is controlled by another thread, your life becomes complicated.

The simpler solution will be is that WATCH will indeed register some keys for tracking in the appropriate DbTables. And then during EVAL we will do an additional transactional "hop" that queries DbTables regarding the state of those keys. also look at ConnectionState in server/conn_context.h - it contain transaction related info with ExecState and exec_body variables. I feel it's about time to group those within a dedicated substruct similarly to SubscribeInfo struct. You will add the watched keys into the new struct.

Sep 09 '22 05:09 romange

Thinking about it a bit more - you will have to clean up DbTable's watched table upon connection closure - i.e. there will be some interaction between a connection lifetime and DbTable's state. So you can still store a pointer to connections inside a DbTable but you can only reference there thread-safe structures or, alternatively, just use it as a descriptor without dereferencing it.

Sep 09 '22 05:09 romange

And then during EVAL we will do an additional transactional "hop" that queries DbTables regarding the state of those keys

The purpose of DbTable's watched table is that EXEC doesn't have to query its own keys. However, it still has to unregister all its keys (it does on the first watch fail). What is more, EXEC directly checks for expirations of all of them 🤔 So in the end it has to query all keys at least once. It seems like with a few watched keys, the db->watch_keys has more overhead than speedup. Do you see the real reason they're using it? Solely for the fail-fast behavior? Redis already stores a last-modified timestamp (OBJECT IDLETIME) so memory savings are probably not the issue

you can only reference there thread-safe structures

Sure, I thought of adding some atomic flag or making exec_state atomic. If we need it at all (see question above)

Sep 09 '22 07:09 dranikpg

Lets forget about fast-fail, and complexity of going over all the keys right now. Assume that the watched list is short.

Are you saying that we can store all the info in connection and get done with it? what will you store in the map? the dirty flag? i.e. key->bool mapping? key->unsigned mapping for versioning? how another shard will know to update those keys in those connections?

Sep 09 '22 07:09 romange

how another shard will know to update those keys in those connections?

There is obviously no way for it

key->unsigned mapping for versioning

Yes. If we use simple version codes, we might miss a DEL-SET sequence. Instead, the last modified time with nanosecond granularity seems like an option. Then, on EXEC, we'll just check the time didn't change (so the key has no newer timestamp)

Besides, Redis doesn't even bother to use a map for the connections data. Watch checks are O(k).

Or is there some issue I'm not seeing through?

Sep 09 '22 07:09 dranikpg

So thread T1 holds a connection C that has a watchmap with key K belonging to a shard in thread T2. Now a SET operation changes K in T2. How does it know to search for C and update K in c->watched?

Sep 09 '22 07:09 romange

Forget multi-threading - how a single-threaded system will know to search for C ?

Sep 09 '22 07:09 romange

There is no double-sided dependency. The key just cares for itself, but it stores a timestamp with its own last update time (we'll have to add it at some point to support OBJECT IDLETIME).

WATCH just stores the timestamp TK of some key K at the time of call in C, meaning "I still expect K to have timestamp TK when I start EXEC". If TK changed since WATCH, then the key was modified. So the values in C don't change - they serve as kind of a snapshot to compare against on EXEC

Sep 09 '22 07:09 dranikpg

So you suggest storing the last modified timestamp for every entry in DF inside DbTable ? or just for watched keys inside DbTable?

Sep 09 '22 08:09 romange

Yes, for all. We can't store them selectively without query overhead

Sep 09 '22 08:09 dranikpg

So my problem with this is that for a pretty minor feature we need to add space overhead of 8 bytes per key. Even for expiry (which is much more common) we complicated things so that the memory usage with be incremental with the feature. Please note that memory usage is the primary cost factor when using a memory store.

Therefore, I think trading implementation complexity for reducing the overhead in memory is well worth it. Regarding the OBJECT IDLETIME command - it's a debugging command, and we won't support it.

Sep 09 '22 08:09 romange

Oh, okay. That answers all of my questions above. If Dragonfly doesn't intend to support IDLETIME, then adding memory overhead just for WATCH isn't worth it.

Sep 09 '22 09:09 dranikpg

Closing as done. Will open a dedicated task for TOUCH

Oct 19 '22 17:10 romange

dragonfly dragonfly copied to clipboard

implement 3.0 api

How Redis does it

Some more thoughts

dragonfly
dragonfly copied to clipboard