Mempurge instead of flush is initiated even if only one memtable is picked by flush job
This can become an actual bug once we merge #9142, which fixes a legitimate bug causing DB::Open failure. Before that fix lands, this bug is hidden.
In https://github.com/facebook/rocksdb/blob/6.26.fb/db/flush_job.cc#L233, a flush job will initiate a mempurge instead of a flush even when mems_.size() is 1. Consequently, the flush job does not reduce the number of immutable memtables, leading to a higher chance of write stall.
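To make the reported condition concrete, here is a minimal, self-contained sketch of that decision; the names and structure are simplified assumptions for illustration, not the actual FlushJob code:

```cpp
// Simplified model of the mempurge-vs-flush decision described above
// (illustrative only; not the real RocksDB internals).
#include <cstddef>
#include <iostream>
#include <vector>

struct MemTable { size_t num_entries; };

// As reported, mempurge is chosen whenever the feature is enabled,
// without consulting how many memtables were picked.
bool UseMempurge(const std::vector<MemTable*>& mems, bool mempurge_enabled) {
  (void)mems;               // BUG (as reported): mems.size() is not checked
  return mempurge_enabled;
}

int main() {
  MemTable m{100};
  std::vector<MemTable*> picked{&m};  // only one memtable picked by the job
  std::cout << (UseMempurge(picked, /*mempurge_enabled=*/true)
                    ? "mempurge chosen: a new immutable memtable replaces the "
                      "old one, so the count is not reduced\n"
                    : "flush chosen: memtable written to SST, count reduced\n");
  return 0;
}
```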
Expected behavior
When the number of immutable memtables reaches the threshold, a flush is scheduled and executed, reducing the number of immutable memtables. The DB eventually gets out of the write stall, even under heavy write load.
Actual behavior
Currently, when the number of immutable memtables reaches the threshold, a mempurge may be scheduled even if only one memtable is picked. The new memtable is added back, so the write-stall condition is not mitigated. No further flush may be scheduled, because a flush is normally scheduled after an insertion, but insertions are currently stalled.
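For discussion, here is one illustrative way such a guard could look; this is an assumption sketched for clarity, not the actual fix merged into RocksDB:

```cpp
// Illustrative guard (an assumption, not the real fix): prefer a regular
// flush when only one memtable is picked, so the immutable-memtable count
// is guaranteed to drop and the write stall can clear.
#include <cstddef>
#include <vector>

struct MemTable { size_t num_entries; };

bool ShouldMempurge(const std::vector<MemTable*>& picked,
                    bool mempurge_enabled) {
  // Mempurge merges the picked memtables into one new immutable memtable;
  // with a single picked memtable the count stays the same, so fall back
  // to a real flush in that case.
  return mempurge_enabled && picked.size() > 1;
}
```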
Steps to reproduce the behavior
Use #9150, restart the job "build-linux-non-shm-1" with SSH access, and manually run the following:
./db_flush_test --gtest_filter=DBFlushTest.MemPurgeWALSupport
It will hang.
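For anyone without CI access, the following is a rough standalone sketch of a similar scenario. It assumes mempurge is enabled via DBOptions::experimental_mempurge_threshold (the option name in current releases, which may differ on the 6.26.fb branch); the path, buffer sizes, and key counts are arbitrary, so it is not equivalent to the gtest above:

```cpp
// Rough standalone sketch (assumptions noted above): keep writing with the
// WAL enabled while memtables fill quickly, and watch whether writes stall
// indefinitely because mempurge never reduces the immutable-memtable count.
#include <string>
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.write_buffer_size = 64 << 10;           // tiny memtables fill quickly
  options.max_write_buffer_number = 2;            // stall threshold reached fast
  options.experimental_mempurge_threshold = 1.0;  // enable mempurge (assumed option)

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/mempurge_repro", &db);
  if (!s.ok()) return 1;

  // WAL is enabled by default, as in MemPurgeWALSupport. If each background
  // job picks a single memtable and mempurges it instead of flushing, the
  // immutable-memtable count never drops and this loop eventually stalls.
  rocksdb::WriteOptions wo;
  for (int i = 0; i < 1000000; ++i) {
    std::string k = "key" + std::to_string(i);
    std::string v(1024, 'v');
    s = db->Put(wo, k, v);
    if (!s.ok()) break;
  }
  delete db;
  return 0;
}
```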
Should it be assigned or up-for-grabs?
@ajkr I'm happy to get this assigned to me
@ajkr With regard to this bug, I have two questions:
- Why are we limiting the mempurge output to only one memtable?
- I tried following the code and did not see where the old memtables are destroyed after the new memtable is created.
Sorry, I'm not currently familiar with mempurge. @riversand963 are you able to help answer the questions?