rocksdb icon indicating copy to clipboard operation
rocksdb copied to clipboard

CancelAllBackgroundWork: Flush of unpersisted data waits for stall conditions to clear and delays db close

Open Yuval-Ariel opened this issue 1 year ago • 2 comments

Hey RocksDB team,

I have a small question regarding the behavior when closing the db. Currently, when CancelAllBackgroundWork is called, a flush is initiated if there is any unpersisted data - here https://github.com/facebook/rocksdb/blob/acf77e1bfee9ebb0867ac277927ed2e37276c493/db/db_impl/db_impl.cc#L496 The above flush request will wait until certain stall conditions are cleared - in detail here: https://github.com/facebook/rocksdb/blob/acf77e1bfee9ebb0867ac277927ed2e37276c493/db/db_impl/db_impl_compaction_flush.cc#L2617
Since the default value for allow_write_stall in FlushOptions is false. This means the DB Close() will have to wait for those stall conditions to clear.

I find this behaviour unexpected since the user is sometimes unaware of this connection and the call to close the db might take much longer than expected. Is there a reason why we should not pass allow_write_stall = true ?

CancelAllBackgroundWork(true) can be passed if waiting for bg work is needed and the problem as i see it currently is that even when calling CancelAllBackgroundWork(false), the call will be delayed in cases of unpersisted data and a stall condition.

Thanks

Yuval-Ariel avatar Feb 12 '24 12:02 Yuval-Ariel

It's most likely because allow_write_stall in FlushOptions was added later (in 2018). The usage of default FlushOptions here existed before 2016.

I personally think it should be okay to proceed with the Flush(), waiting for stall conditions if users want. The next Open() will then stall for writes early. Maybe it can be configurable parameter passing in CancelAllBackgroundWork()

hx235 avatar Feb 27 '24 02:02 hx235

Sure i can do that. Im just worried that this new parameter will be too confusing since it means "flush will wait for some stall conditions to clear if there is unpersisted data". This new parameter will have no effect if theres no unpersisted data. Also, this might conflict with the already existing parameter of CancelAllBackgroundWork() since a user can pass wait = false and allow_write_stall = false which translates into "dont wait for bg work but wait for stall conditions to clear" (meaning it will wait for some bg work even though wait is false). Last point is that we'd always want to flush unpersisted data as fast as possible right? @hx235 what do you think?

Yuval-Ariel avatar Feb 28 '24 08:02 Yuval-Ariel