risingwave Bug(compaction): Unable to trigger split in time, when barrier latency is high

Open · Li0k opened this issue 1 year ago · 2 comments

Describe the bug

In Hummock, the decision to split a compaction group is based on tracking the flush throughput of each table: https://github.com/risingwavelabs/risingwave/blob/41f4ad55c636836fc9c7f7860ada535e26dbd6ca/src/meta/src/hummock/manager/mod.rs#L2597

To reduce the effect of jitter, the statistics are kept over a window of window_size samples, and a new sample is appended to the window at each commit_epoch: https://github.com/risingwavelabs/risingwave/blob/41f4ad55c636836fc9c7f7860ada535e26dbd6ca/src/meta/src/hummock/manager/mod.rs#L1779
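A minimal sketch of the windowed statistics described above, assuming one throughput sample per table per commit_epoch and a split trigger when the window average exceeds a threshold. `ThroughputStats`, `on_commit_epoch`, `avg_throughput`, `should_split` and the threshold are hypothetical; the actual logic in the linked `mod.rs` may aggregate or compare differently.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical per-table throughput tracker (not RisingWave's actual type).
/// One sample is appended per table at each commit_epoch; only the most recent
/// `window_size` samples are kept, which smooths out single-epoch jitter.
struct ThroughputStats {
    window_size: usize,
    samples: HashMap<u32, VecDeque<u64>>, // table_id -> recent flushed bytes per epoch
}

impl ThroughputStats {
    fn new(window_size: usize) -> Self {
        assert!(window_size > 0, "window_size must be at least 1");
        Self { window_size, samples: HashMap::new() }
    }

    /// Called once per committed epoch with the bytes flushed for `table_id`.
    fn on_commit_epoch(&mut self, table_id: u32, flushed_bytes: u64) {
        let window = self.samples.entry(table_id).or_default();
        window.push_back(flushed_bytes);
        if window.len() > self.window_size {
            let _ = window.pop_front(); // drop the oldest sample
        }
    }

    /// Average of the window, or None until `window_size` samples have arrived.
    fn avg_throughput(&self, table_id: u32) -> Option<u64> {
        let window = self.samples.get(&table_id)?;
        if window.len() < self.window_size {
            return None;
        }
        Some(window.iter().sum::<u64>() / window.len() as u64)
    }

    /// A table becomes a split candidate once its windowed average exceeds `threshold`.
    fn should_split(&self, table_id: u32, threshold: u64) -> bool {
        matches!(self.avg_throughput(table_id), Some(avg) if avg > threshold)
    }
}

fn main() {
    let mut stats = ThroughputStats::new(3);
    // A single spike between two quiet epochs.
    for bytes in [10u64, 500, 10] {
        stats.on_commit_epoch(1, bytes);
    }
    println!("after spike: avg={:?} split={}", stats.avg_throughput(1), stats.should_split(1, 200));
    // Sustained high throughput.
    for bytes in [500u64, 500] {
        stats.on_commit_epoch(1, bytes);
    }
    println!("after burst: avg={:?} split={}", stats.avg_throughput(1), stats.should_split(1, 200));
}
```

With a threshold of 200 and a window of three, the lone 500-byte spike averages out to 173 and does not trigger a split, while two further high samples push the average to 336 and do.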

Recently, we found that when a barrier carries a large amount of data, the statistics cannot be updated in time: new samples only arrive when an epoch is committed, so a high barrier latency delays them, and as a result the split is not triggered in time.
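A rough model of why barrier latency matters here, under two assumptions that are not taken from the code: epochs commit serially, one every `barrier_latency_secs`, and each commit appends exactly one sample equal to the table's average write rate over that epoch. `seconds_until_split` is a hypothetical helper. Under this model the number of epochs needed before the window average crosses the threshold does not depend on barrier latency, but the wall-clock delay grows linearly with it.

```rust
use std::collections::VecDeque;

/// Hypothetical model: the window starts full of `baseline` samples; from t = 0 the
/// table writes at `rate`. Each commit (every `barrier_latency_secs`) appends one
/// sample. Returns the wall-clock second at which the window average first exceeds
/// `threshold`, or None if it never does within `max_epochs`.
fn seconds_until_split(
    window_size: usize,
    baseline: u64,
    rate: u64,
    threshold: u64,
    barrier_latency_secs: u64,
    max_epochs: u64,
) -> Option<u64> {
    assert!(window_size > 0, "window_size must be at least 1");
    let mut window: VecDeque<u64> = std::iter::repeat(baseline).take(window_size).collect();
    for epoch in 1..=max_epochs {
        // Statistics only move when an epoch commits.
        window.push_back(rate);
        let _ = window.pop_front();
        let avg = window.iter().sum::<u64>() / window_size as u64;
        if avg > threshold {
            return Some(epoch * barrier_latency_secs);
        }
    }
    None
}

fn main() {
    for latency in [1u64, 10, 60] {
        match seconds_until_split(5, 0, 100, 50, latency, 100) {
            Some(t) => println!("barrier latency {}s -> split triggered after {}s", latency, t),
            None => println!("barrier latency {}s -> no split", latency),
        }
    }
}
```

With `window_size = 5` and a burst at twice the threshold, the split fires on the third commit in every case: after 3 s at 1 s latency, but only after 180 s at 60 s latency, during which all of the burst keeps landing in the original group.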

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

Feb 27 '24 08:02 Li0k

I suspect that the write amplification within cg2 / cg3 is still caused by data misalignment. It doesn't seem reasonable to split a group directly during the new-table creation or recovery phase, especially since we don't support merging groups at the moment.

I would prefer to analyze the data during the flush phase and split the SSTs there to improve boundary alignment.

@Little-Wallace @zwang28 @hzxa21
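One possible reading of the flush-time proposal above (and of hzxa21's interpretation in the next comment) is sketched below: cut the sorted flush output into separate SSTs at table-ID boundaries for a chosen set of tables, so those tables never share an SST with their neighbours. `Key`, `split_at_table_boundaries` and the `isolate` set are hypothetical; real Hummock full keys also carry an epoch, and this sketch ignores SST size limits.

```rust
use std::collections::HashSet;

/// Hypothetical full key: (table_id, user_key), already sorted.
type Key = (u32, Vec<u8>);

/// Split one sorted flush batch into SSTs, cutting at every table boundary that
/// touches a table in `isolate`, so those tables end up in SSTs of their own.
fn split_at_table_boundaries(sorted_keys: &[Key], isolate: &HashSet<u32>) -> Vec<Vec<Key>> {
    let mut ssts: Vec<Vec<Key>> = Vec::new();
    let mut current: Vec<Key> = Vec::new();
    for key in sorted_keys {
        let prev_table = current.last().map(|k| k.0);
        if let Some(prev) = prev_table {
            let crosses_isolated = isolate.contains(&prev) || isolate.contains(&key.0);
            if prev != key.0 && crosses_isolated {
                // Close the current SST at this table boundary.
                ssts.push(std::mem::take(&mut current));
            }
        }
        current.push(key.clone());
    }
    if !current.is_empty() {
        ssts.push(current);
    }
    ssts
}

fn main() {
    let keys: Vec<Key> = vec![
        (1, b"a".to_vec()),
        (1, b"b".to_vec()),
        (2, b"a".to_vec()),
        (2, b"c".to_vec()),
        (3, b"a".to_vec()),
    ];
    let isolate: HashSet<u32> = std::iter::once(2).collect();
    for (i, sst) in split_at_table_boundaries(&keys, &isolate).iter().enumerate() {
        let tables: Vec<u32> = sst.iter().map(|k| k.0).collect();
        println!("SST {}: tables {:?}", i, tables);
    }
}
```

With `isolate = {2}`, table 2 lands in its own SST, and tables 1 and 3 are separated only because they border table 2; with an empty set the whole batch stays in one SST. How the `isolate` set is chosen, permanently or only for a period, is the open question in the next comment.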

Feb 27 '24 08:02 Li0k

> I prefer to do some data analysis in the flush phase and perform a split on the SST to promote boundary alignment.

By "split", do you mean putting data related to specific table IDs into separate SSTs, rather than splitting the compaction group?

If so, is this a permanent change (applied to all future data of these tables) or a temporary one (applied only to data of these tables written during a certain period)?

Feb 27 '24 08:02 hzxa21

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

Jun 12 '24 08:06 github-actions[bot]
