[BUG] Forkless BGSAVE is super slow

Open msg7086 opened this issue 2 years ago • 2 comments

Describe the bug

I just upgraded to the latest KeyDB, 6.3.1, from 6.0.x and noticed high CPU usage every minute or so. In my scenario the process runs at 100% CPU for about 40-60 seconds while responding to requests very slowly (around 5 requests per minute).

Suspecting that it had something to do with saving (the default save interval is 60 seconds on a busy server), I ran SAVE while the CPU usage was high. I got a message saying a background save was already in progress.

Once the CPU usage dropped, I ran SAVE and it took 0.5s to finish. The rdb file is about 40MB. Since SAVE is a synchronous save, I also ran BGSAVE, and this time it took a good minute to finish.

Looking through the issues section, I noticed that someone mentioned forkless BGSAVE in the latest versions, along with the use-fork config option. I set use-fork yes in the config and restarted keydb. Now both SAVE and BGSAVE take less than a second to finish.

To reproduce

Get a busy keydb server and observe CPU usage on BGSAVE.
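To put a number on the BGSAVE duration, something like the following could be used. It is only a rough sketch: it assumes a redis-py style client (an object with `bgsave()` and `info("persistence")` methods) and that KeyDB reports the `rdb_bgsave_in_progress` field in INFO the same way Redis does. The function name `time_bgsave` and its parameters are made up for illustration.

```python
import time


def time_bgsave(client, poll_interval=0.1, timeout=300.0,
                clock=time.monotonic, sleep=time.sleep):
    """Trigger BGSAVE and return the seconds until it is reported finished.

    `client` is assumed to behave like a redis-py client. If a background
    save is already running, `client.bgsave()` is expected to raise.
    """
    start = clock()
    client.bgsave()
    while clock() - start < timeout:
        # INFO persistence reports 1 while a background save is running.
        info = client.info("persistence")
        if int(info.get("rdb_bgsave_in_progress", 0)) == 0:
            return clock() - start
        sleep(poll_interval)
    raise TimeoutError("BGSAVE still in progress after %.0f seconds" % timeout)


# Hypothetical usage, assuming redis-py is installed and KeyDB listens locally:
#   import redis
#   r = redis.Redis(host="localhost", port=6379)
#   print("BGSAVE took %.2f s" % time_bgsave(r))
```

Running this once with use-fork yes and once with the default setting should show the difference described above, with the result accurate to roughly the polling interval.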

In my scenario I frequently do RPUSH to one list, at about 500 req/s speed -- I don't know if this matters, just in case you'd like to know.

Expected behavior

Forkless BGSAVE should be about as fast as SAVE or forked BGSAVE.

Additional information

connected_masters:2
connected_slaves:2
db0:keys=126708,expires=126707,avg_ttl=13449378208539,cached_keys=126708

msg7086 avatar Jan 14 '23 09:01 msg7086

Hi @msg7086, can you share a few more details about your use case? We don't normally see this issue, so it is likely specific to your setup. What are your configs, are there other operations running besides the RPUSH, and how large is the list?

msotheeswaran-sc avatar Jan 17 '23 18:01 msotheeswaran-sc

Master-master replication.

One node is doing RPUSH of string(100) to a list at 500 req/s, another node is doing LRANGE and LTRIM every 10-20 seconds to process the list in batches. The LLEN of the key is bouncing around 0-5000.
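The list traffic described above could be approximated with a sketch like this one. It assumes a redis-py style client exposing `rpush`, `lrange` and `ltrim`; the function names `produce` and `drain_batch`, the key name, and the batch size are illustrative and not taken from the actual application. The LRANGE and LTRIM pair is not atomic here, which is fine for load generation but may not match how the real consumer works.

```python
import time


def produce(client, key, rate_per_s=500, duration_s=1.0,
            payload_len=100, sleep=time.sleep):
    """RPUSH fixed-size strings to `key` at roughly `rate_per_s`."""
    payload = "x" * payload_len  # stands in for the 100-character strings
    total = int(rate_per_s * duration_s)
    for _ in range(total):
        client.rpush(key, payload)
        sleep(1.0 / rate_per_s)  # crude pacing; ignores command latency
    return total


def drain_batch(client, key, max_items=5000):
    """Read up to `max_items` from the head of the list, then trim them off."""
    items = client.lrange(key, 0, max_items - 1)
    if items:
        # Keep everything after the items that were just read.
        client.ltrim(key, len(items), -1)
    return items


# Hypothetical usage against two nodes (redis-py assumed to be installed):
#   import redis
#   writer = redis.Redis(host="node-a")   # placeholder host names
#   reader = redis.Redis(host="node-b")
#   produce(writer, "queue", duration_s=15)
#   batch = drain_batch(reader, "queue")
```

Calling `drain_batch` every 10-20 seconds while `produce` runs on the other node should keep the list length in the 0-5000 range mentioned above.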

Other than that, there are about 100000-150000 small KV pairs storing short strings of less than 250 bytes each, static-ish cached data. They didn't change while SAVE/BGSAVE was running.

RDB file on disk is about 40-50MB total.

BGSAVE speed was checked on both nodes; both show high CPU usage and slow forkless saves.

Default debian installation with minimal config changes (multi-master, replica-of, etc).

msg7086 avatar Jan 17 '23 20:01 msg7086