
Add async or sync mode for entry location index's RocksDB write

Open StevenLuMT opened this issue 3 years ago • 11 comments

Descriptions of the changes in this PR:

Motivation

The bookie's timed flush latency is mainly composed of three pieces:

  1. flush-entrylog: the latency for flushing the entry log
  2. flush-locations-index: the latency for flushing the entry location index, which uses sync mode to flush
  3. flush-ledger-index: the latency for flushing the ledger metadata index (LedgerMetadataIndex), which uses async mode to flush

To reduce the entry location index flush latency (and hence the bookie's overall flush latency), this PR adds an async mode for the entry location index's writes:

  1. sync mode (default): the original logic is unchanged
  2. async mode: must be enabled via configuration; it speeds up writing and works the same way as the bookie's other index, LedgerMetadataIndex

How sync differs from async:

  • sync mode:
    1. create a batch
    2. add entries to the batch
    3. call batch.flush to flush the batch

  • async mode:
    1. just call locationsDb.sync to write the data
    2. RocksDB itself flushes the data on a timer
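The two write paths above can be sketched against a minimal in-memory stand-in. This is an illustrative mock, not the actual BookKeeper `KeyValueStorage`/RocksDB API; the class and method names (`LocationStore`, `putAsync`) are made up for the example:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative mock of a location index store; not the real RocksDB wrapper.
class LocationStore {
    private final Map<String, Long> db = new HashMap<>();

    // Sync mode: stage updates in a batch, then persist them in one flush call.
    class Batch {
        private final List<Map.Entry<String, Long>> staged = new ArrayList<>();

        void put(String key, long location) {
            staged.add(Map.entry(key, location)); // staged only, not yet visible
        }

        void flush() {
            // The durability point in sync mode: all staged entries land at once.
            for (Map.Entry<String, Long> e : staged) {
                db.put(e.getKey(), e.getValue());
            }
            staged.clear();
        }
    }

    Batch newBatch() { return new Batch(); }

    // Async mode: write directly; the store persists in the background
    // on its own schedule, so durability is deferred.
    void putAsync(String key, long location) {
        db.put(key, location);
    }

    Long get(String key) { return db.get(key); }
}
```

The trade-off is visible in the mock: the sync path has a single explicit flush point the bookie can wait on, while the async path returns immediately and relies on the store's background flushing.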

Changes

  1. Add async or sync mode for entry location index RocksDB writes
  2. Add a switch to enable this feature; it is off by default, so the default sync mode is used
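A sketch of what the switch could look like in `conf/bk_server.conf`. The property name below is a placeholder for illustration only, not necessarily the key this PR introduces:

```properties
# Placeholder key; check the PR diff for the actual configuration name.
# sync (default): flush entry locations via a RocksDB batch at checkpoint time.
# async: write directly and let RocksDB persist in the background.
entryLocationIndexWriteMode=sync
```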

StevenLuMT avatar Mar 13 '22 06:03 StevenLuMT

@dlg99 @eolivelli @pkumar-singh @zymap @hangc0276 @lordcheng10 @merlimat If you have time, please help me review it, thank you.

StevenLuMT avatar Mar 13 '22 06:03 StevenLuMT

@hangc0276 @merlimat please take a look

dlg99 avatar Mar 15 '22 23:03 dlg99

You are on your way! I left a suggestion for the configuration entry.

We must also add comprehensive tests

OK, thanks for the review. I will add test cases for this feature.

StevenLuMT avatar Mar 25 '22 07:03 StevenLuMT

You are on your way! I left a suggestion for the configuration entry.

We must also add comprehensive tests

Yeah, I have added two test cases: EntryLocationIndexAsyncTest and EntryLocationIndexSyncTest @eolivelli

StevenLuMT avatar Jun 21 '22 08:06 StevenLuMT

I'm not sure I understand why we need to have 2 different modes here. Do both methods provide the same guarantees or not?

merlimat avatar Jun 22 '22 16:06 merlimat

I'm not sure I understand why we need to have 2 different modes here. Do both methods provide the same guarantees or not?

With multiple replicas, when the user prioritizes write speed and does not need to guarantee that every replica write succeeds, an asynchronous write path like the one LedgerMetadataIndex uses gives the user more choice. @merlimat

StevenLuMT avatar Jun 23 '22 01:06 StevenLuMT

I still cannot see a case where flushing the entry location index (ledgerId, entryId, offset) becomes the bottleneck, compared to:

  1. Journal
  2. Entry log flush (where all the data has to be put on disk)

If it is not the bottleneck, then why reduce the guarantees? (eg: if we miss the async update, we have effectively lost data in the bookie).

merlimat avatar Jun 23 '22 02:06 merlimat

I still cannot see a case where flushing the entry location index (ledgerId, entryId, offset) becomes the bottleneck, compared to:

  1. Journal
  2. Entry log flush (where all the data has to be put on disk)

If it is not the bottleneck, then why reduce the guarantees? (eg: if we miss the async update, we have effectively lost data in the bookie).

Yes, in the entry log flush step. To speed up the flush, we only need the EntryLocationIndex write to succeed under normal operation; machine crashes are not covered. LedgerMetadataIndex likewise writes to RocksDB asynchronously. @merlimat

StevenLuMT avatar Jun 23 '22 07:06 StevenLuMT

In order to speed up the flush, we only need to ensure that the EntryLocationIndex is successfully written under normal circumstances, regardless of machine downtime.

I'm not sure I follow here.

My point is that if flushing entry logs takes 90% and RocksDB takes 10% of the time (I made up these numbers), then adding the risk of losing data to shave time off the 10% portion doesn't make much sense.
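merlimat's argument can be put in Amdahl's-law terms. A minimal sketch using the illustrative 90/10 split from the comment above (made-up numbers, not measurements; the class and method names are mine):

```java
// Amdahl-style bound: even eliminating the index-flush portion entirely
// caps the overall flush speedup by the fraction of time it occupies.
public class FlushSpeedup {
    // indexFraction: share of total flush time spent on the locations index.
    // Returns the maximum overall flush speedup if that share drops to zero.
    public static double maxSpeedup(double indexFraction) {
        return 1.0 / (1.0 - indexFraction);
    }

    public static void main(String[] args) {
        // With a 10% index share, the best case is ~1.11x overall.
        System.out.printf("max flush speedup = %.2fx%n", maxSpeedup(0.10));
    }
}
```

Under these assumed proportions, the upside is bounded at roughly 11%, which is the core of the objection to trading durability for it.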

merlimat avatar Jun 23 '22 18:06 merlimat

OK, I understand what you mean. Let me measure the time taken by, and the proportion of, each of the three parts of the flush. @merlimat

StevenLuMT avatar Jun 24 '22 02:06 StevenLuMT

rerun failure checks

StevenLuMT avatar Sep 25 '22 10:09 StevenLuMT