bookkeeper
bookkeeper copied to clipboard
add async or sync for :entry location index's rocksdb write
Descriptions of the changes in this PR:
Motivation
bookie timed to flush latency mainly composed of three pieces: 1. flush-entrylog: it's the latency for flushing entrylog 2. flush-locations-index: it's the latency for flushing entrly location index, use sync mode to flush 3. flush-ledger-index: it's the latency for flushing ledger metadata index, this index(LedgerMetadataIndex) use async mode to flush
reduce entry location index flush latency to reduce bookie's flush latency, so add async mode for entry location index's write: 1. default sync mode: not change the original logic 2. async mode : need update config to open, this mode is to speed up writing, this mode is the same as bookie's another index: LedgerMetadataIndex
sync is different from async:
-
sync mode:
1. create a batch; 4. add msg to the batch 5. call method(batch.flush) to flush the batch -
sync mode:
1. just call method(locationsDb.sync) to write the data 2. the rocksdb server will be timed to flush the data
Changes
- add async or sync for :location rocksdb write
- add switch to open this feature, default not open, use default sync
@dlg99 @eolivelli @pkumar-singh @zymap @hangc0276 @lordcheng10 @merlimat If you have time, please help me review it, thank you.
@hangc0276 @merlimat please take a look
You are on your way! I left a suggestion for the configuration entry.
We must also add comprehensive tests
ok,thanks for reviewed, I will add testcases for this function
You are on your way! I left a suggestion for the configuration entry.
We must also add comprehensive tests
yeah, I have added two testcase: EntryLocationIndexAsyncTest/EntryLocationIndexSyncTest @eolivelli
I'm not sure I understand why we need to have 2 different modes here. Do both methods provide the same guarantees or not?
I'm not sure I understand why we need to have 2 different modes here. Do both methods provide the same guarantees or not?
In the case of multiple replications, when the user pursues the writing speed and does not need to guarantee the success rate of each replication, the asynchronous writing method like LedgerMetadataIndex gives the user more choices @merlimat
I still cannot see a case where flushing the entry location index (ledgerId, entryId, offset) becomes the bottleneck, compared to:
- Journal
- Entry log flush (where all the data has to be put on disk)
If it is not the bottleneck, then why reduce the guarantees? (eg: if we miss the async update, we have effectively lost data in the bookie).
I still cannot see a case where flushing the entry location index (ledgerId, entryId, offset) becomes the bottleneck, compared to:
- Journal
- Entry log flush (where all the data has to be put on disk)
If it is not the bottleneck, then why reduce the guarantees? (eg: if we miss the async update, we have effectively lost data in the bookie).
yes, in the step : entrylog flush In order to speed up the flush, we only need to ensure that the EntryLocationIndex is successfully written under normal circumstances, regardless of machine downtime. Like LedgerMetadataIndex, it also uses an asynchronous method to write rocksdb @merlimat
In order to speed up the flush, we only need to ensure that the EntryLocationIndex is successfully written under normal circumstances, regardless of machine downtime.
I'm not sure I follow here.
My point is that if flushing entry logs takes 90% and RocksDB takes 10% of time (I made up these number here), then adding the risk of losing data to shave time on the 10% portion doesn't make much sense.
ok,I understand what you mean, let me collect the time and proportion of the three parts of flush @merlimat
rerun failure checks