valkey icon indicating copy to clipboard operation
valkey copied to clipboard

[Improvement][Cluster Mode] Remove Unowned Keys After Loading Persistence Files At Server Startup

Open singku opened this issue 1 year ago • 2 comments
trafficstars

Describe the Issue While the server starts with cluster mode and loading persistence files, keys may not be served by this server due to slot coverage of multiple shards. This happens when:

  1. slot migration occurred after persistence files generation, server restarts before new persistence files generated and loads the outdated persistence files.
  2. server startups using pre-generated persistence files for cache warmup, boot up new cluster using existing data, etc. In this case, slot distributions among multiple shards might be different with persistence files.

In above cases, key belongs to a slot might not be served by a server, but that key will be added into this server in current code path. This issue leads to orphaned keys that leaks memory.

We can add a deletion call to clean up unowned keys after the loading, but before the server servers requests.

I can make a PR for this if this issue makes sense.

singku avatar May 23 '24 17:05 singku

This makes sense to me. We talked about fixing this in the past but never did anything about it, so I think a contribution is welcome since this is practically a bugfix.

madolson avatar May 23 '24 17:05 madolson

Sounds good, I can make a PR soon.

singku avatar May 23 '24 18:05 singku

With the latest RDB file, we have the slot op code, we can use that to skip through the slot/keys which isn't owned by the node. This would ideally be faster.

hpatro avatar May 24 '24 16:05 hpatro

Wouldn't this cause some inconsistency if some keys of the rdb file is loaded and some keys are discarded? For a cache it may be fine but for a database it may be a problem? Or do we care about consistency only per slot?

zuiderkwast avatar May 24 '24 22:05 zuiderkwast

Wouldn't this cause some inconsistency if some keys of the rdb file is loaded and some keys are discarded? For a cache it may be fine but for a database it may be a problem? Or do we care about consistency only per slot?

The problem is we end up with keys today that are fully inaccessible. If you wedge a key into a slot you do not own, you can't coherently access it or delete. it.

Or do we care about consistency only per slot?

That has been our historic criteria, we are only consistent within a slot. We could consider changing it.

madolson avatar May 24 '24 23:05 madolson

Fine, I like the idea then.

(I've been annoyed by DEBUG POPULATE too, that it adds keys in non-owned slots. I wish we can fix that too in some other PR

zuiderkwast avatar May 24 '24 23:05 zuiderkwast