KeyDB icon indicating copy to clipboard operation
KeyDB copied to clipboard

[BUG] Keys command get wrong data randomly sometime

Open agosalvez opened this issue 3 years ago • 3 comments

Describe the bug

The KEYS command works perfectly for me but sometime I've detected wrong data at the moment to get values.

It seems that recover data of other key pattern.

It's very strange but in unlike redis, keydb uses multithreading architecture. It's possible that in a rare moment this command can fails?

To reproduce

It's random and does not depends on code.

Additional information

I use KeyDB (eqalpha/keydb:x86_64_v6.3.1) in a k8s cluster with 3 replicas.

Thank you in advance.

agosalvez avatar Jul 04 '22 15:07 agosalvez

@agosalvez Can you be a little more specific? Is the data missing from a very recent write from a different client? You can try setting max-snapshot-slip to 0 to avoid this, it defaults to 400ms.

In 6.3 the keys command operates on a snapshot and runs in the background. If we have a recent snapshot (up to max-snapshot-slip age) we will use it instead of making a fresh one. This is also means other clients can execute writes while the keys command is running, those will not be returned.

JohnSully avatar Jul 06 '22 03:07 JohnSully

Hi @JohnSully,

We are completely sure this is related to keydb as it started just the day after we moved Redis to Keydb, same code, same everything but Redis. We were using and old Redis release, 3.2, so we can not discard this is a Redis issue instead of keydb.

We have tried some debug and have some info so far. It seems to be related to concurrency as two async process are using the same keydb client instance. This is a nodejs application. And this is an extract when this happens.

Process1: 09:08:46.085 KEYS ccare:362:* '[{"type":1,"schedule":" A VERY LONG JSON STRING, about 2k chars "timezone":"America/Mexico_City"}]'
Asking for “KEYS” returns a STRING, this is the content of one key, but none of those supposed keys.

Process?: 09:08:46.085 ERROR getKey OK Same millisecond. Not sure where this OK comes from.

Process2: 09:08:46.086 ERROR getKey ccare:362:463 One millisecond after, a key name is returned. This keys are NEVER got directly, only by a KEYS command to get the list.

Some milliseconds after, trying to reload the keys, everything is right

Process 1 RETRY: 09:08:46.385 KEYS ccare:362:* [ 'ccare:362:463' ]

So, from my point, my guess is that responses are changed. KEYS response is returned a GET answer whereas the GET answer is returning the KEYS response.

Does this have sense?

agosalvez avatar Jul 06 '22 13:07 agosalvez

@JohnSully I also encountered the similar problem.

Our traffic pattern is like:

  • have 4-5 keys like clicks:userid:groupid with the structure: {clickCounter: int, userId: str, groupid: str}
  • userId and groupid values are set with HMSET.
  • clickCounter increased with HINCRBY.
  • the keys updates frequently (150-300 rps)

There is a Redis cluster, one of KeyDB is also set up to replicate newest data from Redis.

In application we iterate over keys by pattern clicks:* with KEYS clicks:* command.

Periodically the returning keys are changing unexpectedly:

  • ["clicks:user:1233", "clicks:user:1245", "clicks:user2341"]
  • ["clicks:user:1233", "clicks:user:1245", "clicks:user2341"]
  • ["clicks:user:1233", "clicks:user:1245", "clicks:user2341"]
  • ["clicks:user:1233", "clicks:user:1245"]
  • ["24124"]
  • ["clicks:user:1233", "clicks:user:1245", "clicks:user2341"]
  • [""]

Effectively, application uses the following sets of commands:

  • HMSET
  • HINCRBY
  • GETALL
  • KEYS
  • DEL

When application connects to Redis, KeyDB nodes receives all data correctly.

It seems KeyDB has dirty read and dirty writes 1. We also have the same application in UAT environment with KeyDB 3-4 rps, it has no such issues.

Kirkirillka avatar Jul 05 '23 14:07 Kirkirillka