Mempurge instead of flush is initiated even if only one memtable is picked by flush job
This can become an actual bug once we merge #9142, which fixes a legitimate bug causing DB::Open failure. Before that fix lands, this bug is hidden.
In https://github.com/facebook/rocksdb/blob/6.26.fb/db/flush_job.cc#L233, a flush job will initiate a mempurge instead of a flush even when mems_.size() is 1. Consequently, the flush job does not reduce the number of immutable memtables, leading to a higher chance of write stall.
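To make the reported condition concrete, here is a minimal, self-contained sketch of that decision; the names and structure are simplified assumptions for illustration, not the actual FlushJob code:

```cpp
// Simplified model of the mempurge-vs-flush decision described above
// (illustrative only; not the real RocksDB internals).
#include <cstddef>
#include <iostream>
#include <vector>

struct MemTable { size_t num_entries; };

// As reported, mempurge is chosen whenever the feature is enabled,
// without consulting how many memtables were picked.
bool UseMempurge(const std::vector<MemTable*>& mems, bool mempurge_enabled) {
  (void)mems;               // BUG (as reported): mems.size() is not checked
  return mempurge_enabled;
}

int main() {
  MemTable m{100};
  std::vector<MemTable*> picked{&m};  // only one memtable picked by the job
  std::cout << (UseMempurge(picked, /*mempurge_enabled=*/true)
                    ? "mempurge chosen: a new immutable memtable replaces the "
                      "old one, so the count is not reduced\n"
                    : "flush chosen: memtable written to SST, count reduced\n");
  return 0;
}
```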
Expected behavior
When the number of immutable memtables reaches the threshold, a flush is scheduled and executed, reducing the number of immutable memtables. The DB eventually gets out of the write stall, even under heavy write load.
Actual behavior
Currently, when the number of immutable memtables reaches the threshold, a mempurge may be scheduled even if only one memtable is picked. The new memtable is added back, so the write-stall condition is not mitigated. No further flush may be scheduled, because a flush is normally scheduled after an insertion, but insertions are currently stalled.
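For discussion, here is one illustrative way such a guard could look; this is an assumption sketched for clarity, not the actual fix merged into RocksDB:

```cpp
// Illustrative guard (an assumption, not the real fix): prefer a regular
// flush when only one memtable is picked, so the immutable-memtable count
// is guaranteed to drop and the write stall can clear.
#include <cstddef>
#include <vector>

struct MemTable { size_t num_entries; };

bool ShouldMempurge(const std::vector<MemTable*>& picked,
                    bool mempurge_enabled) {
  // Mempurge merges the picked memtables into one new immutable memtable;
  // with a single picked memtable the count stays the same, so fall back
  // to a real flush in that case.
  return mempurge_enabled && picked.size() > 1;
}
```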
Steps to reproduce the behavior
Use #9150, restart the job "build-linux-non-shm-1" with SSH access, and manually run the following:
./db_flush_test --gtest_filter=DBFlushTest.MemPurgeWALSupport
It will hang.
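For anyone without CI access, the following is a rough standalone sketch of a similar scenario. It assumes mempurge is enabled via DBOptions::experimental_mempurge_threshold (the option name in current releases, which may differ on the 6.26.fb branch); the path, buffer sizes, and key counts are arbitrary, so it is not equivalent to the gtest above:

```cpp
// Rough standalone sketch (assumptions noted above): keep writing with the
// WAL enabled while memtables fill quickly, and watch whether writes stall
// indefinitely because mempurge never reduces the immutable-memtable count.
#include <string>
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.write_buffer_size = 64 << 10;           // tiny memtables fill quickly
  options.max_write_buffer_number = 2;            // stall threshold reached fast
  options.experimental_mempurge_threshold = 1.0;  // enable mempurge (assumed option)

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/mempurge_repro", &db);
  if (!s.ok()) return 1;

  // WAL is enabled by default, as in MemPurgeWALSupport. If each background
  // job picks a single memtable and mempurges it instead of flushing, the
  // immutable-memtable count never drops and this loop eventually stalls.
  rocksdb::WriteOptions wo;
  for (int i = 0; i < 1000000; ++i) {
    std::string k = "key" + std::to_string(i);
    std::string v(1024, 'v');
    s = db->Put(wo, k, v);
    if (!s.ok()) break;
  }
  delete db;
  return 0;
}
```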
Should it be assigned or up-for-grabs?
@ajkr I'm happy to get this assigned to me
@ajkr With regard to this bug, I have two questions:
- Why are we limiting the mempurge output to only one memtable?
- I tried following the code and did not see where the old memtables are destroyed after the new memtable is created.
Sorry, I'm not currently familiar with mempurge. @riversand963 are you able to help answer the questions?