[ISSUE #8765] fix low performance of delay message when enable rocksdb consume queue
Which Issue(s) This PR Fixes
Fixes #8765
Brief Description
How Did You Test This Change?
Codecov Report
Attention: Patch coverage is 53.57143% with 13 lines in your changes missing coverage. Please review.
Project coverage is 47.35%. Comparing base (daf3d1a) to head (8bd91e3). Report is 86 commits behind head on develop.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ache/rocketmq/store/queue/RocksDBConsumeQueue.java | 53.57% | 10 Missing and 3 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## develop #8766 +/- ##
=============================================
- Coverage 47.52% 47.35% -0.17%
+ Complexity 11592 11560 -32
=============================================
Files 1282 1282
Lines 89848 89882 +34
Branches 11557 11565 +8
=============================================
- Hits 42697 42567 -130
- Misses 41927 42056 +129
- Partials 5224 5259 +35
| Flag | Coverage Δ | |
|---|---|---|
| | 47.35% <53.57%> (-0.17%) | :arrow_down: |
@yuz10 Are there profiling metrics verifying that prefetch actually improves performance?
The performance loss is not related to prefetching. Scheduled messages are delivered at only 160/s because the iterator returns at most 16 messages each time, and the deliver thread sleeps 100 ms after each iteration finishes (16 messages per 100 ms). See org.apache.rocketmq.broker.schedule.ScheduleMessageService.DeliverDelayedMessageTimerTask#executeOnTimeUp
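The 160/s figure follows directly from the batch size and the sleep interval. A minimal sketch of that arithmetic, assuming iteration time itself is negligible; `DeliveryRateSketch` and `maxPerSecond` are hypothetical names for illustration and not part of RocketMQ:

```java
// Hypothetical helper for illustration only; not RocketMQ code.
final class DeliveryRateSketch {

    /**
     * Upper bound on messages delivered per second when each pass handles at most
     * batchSize messages and is followed by a fixed sleep. The time spent iterating
     * is ignored, so the real rate can only be lower.
     */
    static double maxPerSecond(int batchSize, long sleepMillis) {
        return batchSize * 1000.0 / sleepMillis;
    }

    public static void main(String[] args) {
        // 16 messages per pass, 100 ms sleep after each pass
        System.out.println(maxPerSecond(16, 100)); // prints 160.0
    }
}
```

Under this model the ceiling rises either by returning more items per pass or by shortening the sleep.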
@lizhanhui I found no difference between batch get and single get of keys from RocksDB, so I will remove the prefetch code.
Batch:
- QueryCQ iter 10489877 cost 20527
- QueryCQ iter 10489877 cost 19496
- QueryCQ iter 10489877 cost 19395

Single:
- QueryCQ iter 10489877 cost 20313
- QueryCQ iter 10489877 cost 19196
- QueryCQ iter 10489877 cost 18945
@yuz10 Got your update; I will review it tomorrow.
- The original implementation uses a one-shot multi-get (at most 16 results) to simulate an iterator. The resulting iterator fails to return all results and thus does not fit the mentioned use case well.
- Your change uses lazy single gets to iterate, with potential prefetch to accelerate.
- A third option is to wrap RocksIterator directly with a prefix.
It would be best to compare these options further in terms of performance (why is multi-get used at present?), code maintenance, and so on. Once all pros and cons are clarified, we can finalize this pull request.
Another issue is that option 2, i.e. this pull request, changes the original behavior. We need to verify that the change does not affect the semantics of upper-layer code.
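To make the behavioral difference between the first two options concrete, here is a rough sketch that uses an in-memory `TreeMap` as a stand-in for the RocksDB-backed consume queue. All names (`CqIteratorSketch`, `oneShot`, `lazy`, `BATCH_LIMIT`) are hypothetical; this is not the actual `RocksDBConsumeQueue` code, only an illustration of how many items each style of iterator yields:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Illustration only: the TreeMap stands in for the RocksDB consume-queue store.
final class CqIteratorSketch {
    static final int BATCH_LIMIT = 16; // mirrors the "at most 16 results" limit

    /** Option 1: fetch at most BATCH_LIMIT consecutive offsets up front, then iterate the batch. */
    static Iterator<String> oneShot(TreeMap<Long, String> store, long startOffset) {
        List<String> batch = new ArrayList<>();
        for (long off = startOffset; off < startOffset + BATCH_LIMIT; off++) {
            String unit = store.get(off);
            if (unit == null) {
                break; // first missing offset marks the end of the queue
            }
            batch.add(unit);
        }
        return batch.iterator();
    }

    /** Option 2: one lookup per step, continuing until an offset is missing. */
    static Iterator<String> lazy(TreeMap<Long, String> store, long startOffset) {
        return new Iterator<String>() {
            private long next = startOffset;

            @Override
            public boolean hasNext() {
                return store.containsKey(next);
            }

            @Override
            public String next() {
                String unit = store.get(next);
                if (unit == null) {
                    throw new NoSuchElementException();
                }
                next++;
                return unit;
            }
        };
    }

    /** Drains an iterator and returns how many items it produced. */
    static int count(Iterator<String> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> store = new TreeMap<>();
        for (long i = 0; i < 100; i++) {
            store.put(i, "cq-unit-" + i);
        }
        System.out.println(count(oneShot(store, 0))); // 16: capped by the batch limit
        System.out.println(count(lazy(store, 0)));    // 100: runs to the end of the queue
    }
}
```

A caller that loops until `hasNext()` is false sees different totals under the two options, which is the semantic change that needs checking in upper-layer code.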
I did not compare the performance of RocksIterator with the current solution. That can be optimized later; the current solution only addresses the delay message issue. Another solution is to not sleep 100 ms after each iteration. As for behavior, the default ConsumeQueue iterates over only one file, and RocksDBConsumeQueue iterates over at most 16 items, so I think the number of items an iteration returns is not defined, and the change will not impact upper-layer code.
Take a look at this blog post: MultiGet has quite a few optimizations available. We have two options to investigate: 1. a simulated manual iterator with multi-get in the background, prefetching accordingly; 2. the RocksDB embedded iterator. https://rocksdb.org/blog/2022/10/07/asynchronous-io-in-rocksdb.html
The provided benchmark may be too simple. For example, the dataset may be small enough that all SSTs are cached in memory, and an oversimplified benchmark may not reveal the difference.
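As a starting point for investigating option 1, here is a rough sketch of an iterator that refills its buffer with one batched lookup at a time, so it returns all results while still issuing multi-get style requests. It is an assumption-laden illustration: the `multiGet` function is a stand-in that returns `null` for missing keys, the refill is synchronous rather than in the background, and `BatchedRefillIterator` and `overMap` are hypothetical names, not RocketMQ or RocksDB APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Illustration only: multiGet is a stand-in for a batched store lookup.
final class BatchedRefillIterator implements Iterator<String> {
    private final Function<List<Long>, List<String>> multiGet; // null entry = key not found
    private final int batchSize;
    private final Deque<String> buffer = new ArrayDeque<>();
    private long nextOffset;
    private boolean exhausted;
    int multiGetCalls; // exposed so the number of batched lookups can be observed

    BatchedRefillIterator(Function<List<Long>, List<String>> multiGet, long startOffset, int batchSize) {
        this.multiGet = multiGet;
        this.nextOffset = startOffset;
        this.batchSize = batchSize;
    }

    /** Convenience factory backed by an in-memory map (assumption: stands in for the store). */
    static BatchedRefillIterator overMap(Map<Long, String> store, long startOffset, int batchSize) {
        return new BatchedRefillIterator(keys -> {
            List<String> out = new ArrayList<>(keys.size());
            for (Long k : keys) {
                out.add(store.get(k));
            }
            return out;
        }, startOffset, batchSize);
    }

    /** Issues one batched lookup when the buffer is empty and the queue end was not yet seen. */
    private void refill() {
        if (!buffer.isEmpty() || exhausted) {
            return;
        }
        List<Long> keys = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            keys.add(nextOffset + i);
        }
        multiGetCalls++;
        for (String v : multiGet.apply(keys)) {
            if (v == null) {
                break; // first missing offset marks the end of the queue
            }
            buffer.add(v);
        }
        nextOffset += buffer.size();
        if (buffer.size() < batchSize) {
            exhausted = true; // a short batch means nothing more to fetch
        }
    }

    @Override
    public boolean hasNext() {
        refill();
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.poll();
    }
}
```

Whether this beats single gets or a native RocksIterator depends on the dataset and cache state, so it would need a benchmark on data larger than the block cache.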