[ISSUE #8765] Fix low performance of delay messages when the RocksDB consume queue is enabled

Open • yuz10 opened this pull request • 4 comments

Which Issue(s) This PR Fixes

Fixes #8765

Brief Description

How Did You Test This Change?

yuz10, Sep 27 '24

Codecov Report

Attention: Patch coverage is 53.57143% with 13 lines in your changes missing coverage. Please review.

Project coverage is 47.35%. Comparing base (daf3d1a) to head (8bd91e3). Report is 86 commits behind head on develop.

Files with missing lines:
...ache/rocketmq/store/queue/RocksDBConsumeQueue.java: patch coverage 53.57%, 10 lines missing and 3 partials
Additional details and impacted files
@@              Coverage Diff              @@
##             develop    #8766      +/-   ##
=============================================
- Coverage      47.52%   47.35%   -0.17%     
+ Complexity     11592    11560      -32     
=============================================
  Files           1282     1282              
  Lines          89848    89882      +34     
  Branches       11557    11565       +8     
=============================================
- Hits           42697    42567     -130     
- Misses         41927    42056     +129     
- Partials        5224     5259      +35     
Flag coverage: 47.35% (patch 53.57%), Δ -0.17%

Flags with carried forward coverage won't be shown.

codecov-commenter, Sep 27 '24

@yuz10 Are there profiling metrics verifying that prefetching actually improves performance?

lizhanhui, Sep 29 '24

The performance loss is not related to prefetching. Scheduled-message delivery runs at about 160 messages/s because each iteration returns only 16 messages and the delivery thread sleeps 100 ms after each iteration finishes (16 messages per 100 ms = 160 messages/s). See org.apache.rocketmq.broker.schedule.ScheduleMessageService.DeliverDelayedMessageTimerTask#executeOnTimeUp
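The 160/s figure follows from the batch-then-sleep loop described above. The sketch below simulates that loop on a virtual clock, assuming iteration itself takes no time; `DeliveryLoopModel` and its methods are hypothetical and are not the actual ScheduleMessageService code.

```java
/**
 * Hypothetical model of a batch-then-sleep delivery loop (not RocketMQ code).
 * Each pass delivers at most batchSize messages, then waits sleepMs on a
 * virtual clock. Time spent iterating is assumed to be zero.
 */
final class DeliveryLoopModel {

    /** Virtual milliseconds needed to drain a backlog of the given size. */
    static long drainTimeMs(long backlog, int batchSize, long sleepMs) {
        long elapsedMs = 0;
        while (backlog > 0) {
            long delivered = Math.min(batchSize, backlog); // one iterator pass
            backlog -= delivered;
            elapsedMs += sleepMs; // the thread pauses after every pass
        }
        return elapsedMs;
    }

    /** Steady-state upper bound on delivered messages per second. */
    static double maxRatePerSecond(int batchSize, long sleepMs) {
        return batchSize * 1000.0 / sleepMs;
    }

    public static void main(String[] args) {
        System.out.println(maxRatePerSecond(16, 100));      // 160.0
        System.out.println(drainTimeMs(1_600, 16, 100));    // 10000 (100 passes)
        System.out.println(drainTimeMs(1_600, 1_600, 100)); // 100 (a single pass)
    }
}
```

In this model, returning more entries per pass or shortening the pause both raise the bound; with 16 entries and a 100 ms pause, a backlog of 1,600 messages takes 10 virtual seconds to drain.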

yuz10, Sep 29 '24

@lizhanhui I found no difference between batch and single-key gets from RocksDB, so I will remove the prefetch code.

Batch:
QueryCQ iter 10489877 cost 20527
QueryCQ iter 10489877 cost 19496
QueryCQ iter 10489877 cost 19395

Single:
QueryCQ iter 10489877 cost 20313
QueryCQ iter 10489877 cost 19196
QueryCQ iter 10489877 cost 18945
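Averaging the three runs of each mode makes the "no difference" conclusion concrete. The unit of `cost` is not stated in the thread, so this small sketch treats the values as opaque numbers and only computes the means and the relative gap.

```java
import java.util.Arrays;

/** Averages the QueryCQ "cost" samples quoted above (unit unspecified). */
final class CostComparison {

    static double mean(long... samples) {
        return Arrays.stream(samples).average().orElseThrow();
    }

    public static void main(String[] args) {
        double batch = mean(20527, 19496, 19395);  // 19806.0
        double single = mean(20313, 19196, 18945); // about 19484.67
        double gapPercent = (batch - single) / batch * 100.0;
        // prints: batch=19806.00 single=19484.67 gap=1.62%
        System.out.printf("batch=%.2f single=%.2f gap=%.2f%%%n", batch, single, gapPercent);
    }
}
```

The gap of about 1.6% (single-key gets slightly faster) is smaller than the spread between runs of the same mode (over 1,100 in both cases), which is consistent with reading the two modes as equivalent on this benchmark.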

yuz10, Sep 29 '24

@yuz10 Got your update; I will review it tomorrow.

lizhanhui, Oct 23 '24

  1. The original implementation simulates an iterator with a one-shot multi-get (at most 16 results). The resulting iterator does not return all results and therefore does not fit the use case above well;
  2. Your change uses lazy single-key gets to iterate, with potential prefetching to speed it up;
  3. A third option is to wrap RocksIterator directly, scoped by key prefix;
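To make options 1 and 2 concrete, the sketch below models a consume queue as a hypothetical `LongFunction<String>` lookup from queue offset to entry (standing in for a single-key RocksDB get) and contrasts a one-shot batch capped at 16 entries with a lazy iterator that looks up one offset per `next()` call. Both iterators stop at `maxOffset` or at the first missing offset; that stopping rule is an assumption, and neither class is the actual RocksDBConsumeQueue code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.LongFunction;

final class CqIterators {

    /** Option 1 style: read up to maxBatch offsets once; the iterator ends there. */
    static Iterator<String> oneShot(LongFunction<String> lookup, long start, long maxOffset, int maxBatch) {
        List<String> batch = new ArrayList<>();
        for (long off = start; off < maxOffset && batch.size() < maxBatch; off++) {
            String entry = lookup.apply(off);
            if (entry == null) {
                break; // missing offset ends the batch
            }
            batch.add(entry);
        }
        return batch.iterator();
    }

    /** Option 2 style: look up one offset at a time until maxOffset or a gap. */
    static Iterator<String> lazy(LongFunction<String> lookup, long start, long maxOffset) {
        return new Iterator<>() {
            private long cursor = start;
            private String buffered = fetch(); // entry at cursor, or null when done

            private String fetch() {
                return cursor < maxOffset ? lookup.apply(cursor) : null;
            }

            @Override
            public boolean hasNext() {
                return buffered != null;
            }

            @Override
            public String next() {
                if (buffered == null) {
                    throw new NoSuchElementException();
                }
                String result = buffered;
                cursor++;
                buffered = fetch(); // one single-key lookup per step
                return result;
            }
        };
    }

    static int count(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        LongFunction<String> lookup = off -> "msg-" + off; // every offset present
        System.out.println(count(oneShot(lookup, 0, 1_000, 16))); // 16
        System.out.println(count(lazy(lookup, 0, 1_000)));         // 1000
    }
}
```

The one-shot variant reads a bounded batch up front and never returns more than 16 entries per call; the lazy variant is not capped but issues one lookup per entry. Option 3 would instead position a single RocksIterator at the queue's key prefix and advance it, avoiding a separate lookup per offset.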

It would be best to compare the options further in terms of performance (why multi-get is used at present), code maintenance, and so on. Once all pros and cons are clarified, we can finalize this pull request.

Another concern is that option 2, i.e. this pull request, changes the original behavior. We need to verify that the change does not affect the semantics of upper-layer code.

lizhanhui, Oct 24 '24

I have not compared the performance of RocksIterator with the current solution; that can be optimized later. The current solution only addresses the delay-message issue. Another solution would be to stop sleeping 100 ms after each iteration. As for behavior: the default ConsumeQueue iterates over only one file, and RocksDBConsumeQueue iterates over at most 16 items, so how many items an iteration returns is not defined, and I think the change will not affect upper-layer code.
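The argument that the number of items per iteration is unspecified can be illustrated with a caller that always resumes from the offset after the last entry it consumed. In this hypothetical sketch each pass is modeled as yielding up to `perPass` consecutive offsets; the caller ends up with the same entries whether a pass yields 1, 16 or all remaining items, and only the number of passes changes, which is what matters when every pass is followed by a 100 ms sleep.

```java
import java.util.ArrayList;
import java.util.List;

final class ResumingConsumer {

    /** Every consumed offset, plus how many passes it took. */
    record Drain(List<Long> offsets, int passes) { }

    /**
     * Drains offsets [start, end) when each pass yields at most perPass
     * consecutive offsets. The caller resumes right after the last one consumed.
     */
    static Drain drain(long start, long end, int perPass) {
        List<Long> consumed = new ArrayList<>();
        long cursor = start;
        int passes = 0;
        while (cursor < end) {
            passes++;
            long passEnd = Math.min(end, cursor + perPass); // what one iteration yields
            for (long off = cursor; off < passEnd; off++) {
                consumed.add(off);
            }
            cursor = passEnd; // resume after the last consumed entry
        }
        return new Drain(consumed, passes);
    }

    public static void main(String[] args) {
        for (int perPass : new int[] {1, 16, 1_000}) {
            Drain d = drain(0, 1_000, perPass);
            // entries is always 1000; passes is 1000, 63 and 1 respectively
            System.out.println("perPass=" + perPass + " entries=" + d.offsets().size()
                    + " passes=" + d.passes());
        }
    }
}
```

This only holds for callers that track their own resume offset; a caller that assumed a fixed batch size would be the kind of upper-layer code worth checking.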

yuz10, Oct 28 '24

Take a look at this blog post: MultiGet has quite a few optimizations available. We have two options to investigate:

  1. A simulated manual iterator that issues multi-gets in the background to prefetch accordingly;
  2. The RocksDB embedded iterator.

https://rocksdb.org/blog/2022/10/07/asynchronous-io-in-rocksdb.html
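The first option mentioned above, an iterator that fetches ahead in batches, could look roughly like the sketch below. `batchGet` is a hypothetical stand-in for a multi-get (in RocksJava, something like `RocksDB.multiGetAsList` would fill that role); here it is a plain function so the example runs without RocksDB, and the refill is synchronous. Doing it in the background is where the asynchronous I/O discussed in the linked post could come in.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Iterator over consecutive offsets that refills a local buffer with one
 * batch lookup of up to batchSize keys whenever the buffer runs dry.
 * batchGet (hypothetical) returns one value per key, or null if the key is missing.
 */
final class PrefetchingIterator implements Iterator<String> {
    private final Function<List<Long>, List<String>> batchGet;
    private final long maxOffset;
    private final int batchSize;
    private final Deque<String> buffer = new ArrayDeque<>();
    private long nextToFetch;
    private boolean exhausted;
    int batchCalls; // number of batch lookups issued, exposed for the example

    PrefetchingIterator(Function<List<Long>, List<String>> batchGet, long start, long maxOffset, int batchSize) {
        this.batchGet = batchGet;
        this.nextToFetch = start;
        this.maxOffset = maxOffset;
        this.batchSize = batchSize;
    }

    private void refill() {
        List<Long> keys = new ArrayList<>();
        for (long off = nextToFetch; off < maxOffset && keys.size() < batchSize; off++) {
            keys.add(off);
        }
        if (keys.isEmpty()) {
            exhausted = true; // reached maxOffset
            return;
        }
        batchCalls++;
        for (String v : batchGet.apply(keys)) {
            if (v == null) { // first missing offset ends the iteration
                exhausted = true;
                break;
            }
            buffer.addLast(v);
        }
        nextToFetch += keys.size();
    }

    @Override
    public boolean hasNext() {
        if (buffer.isEmpty() && !exhausted) {
            refill();
        }
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.pollFirst();
    }

    public static void main(String[] args) {
        Function<List<Long>, List<String>> batchGet = keys -> {
            List<String> out = new ArrayList<>();
            for (long k : keys) {
                out.add(k < 100 ? "msg-" + k : null); // offsets >= 100 are missing
            }
            return out;
        };
        PrefetchingIterator it = new PrefetchingIterator(batchGet, 0, 1_000, 16);
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        System.out.println(n + " entries in " + it.batchCalls + " batch calls"); // 100 entries in 7 batch calls
    }
}
```

Unlike the one-shot multi-get, this iterator keeps going past the first 16 entries while still amortizing lookups across a batch; the batch size is a tuning knob that a benchmark on a realistic dataset would have to settle.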

The provided benchmark may be too simple to reveal a difference; for example, the dataset may be small enough that all SST files are cached in memory.

lizhanhui, Nov 20 '24