
Backoff strategy when channel is not writable

Open · guyinyou opened this issue 2 years ago · 9 comments


FEATURE REQUEST

  1. Please describe the feature you are requesting. #5066

  2. Provide any additional detail on your proposed use case for this feature.

  3. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

  4. If there are some sub-tasks involved, use -[] for each sub-task and create a corresponding issue to map to the sub-task:

  • sub-task1-issue-number: sub-task1 description here,
  • sub-task2-issue-number: sub-task2 description here,
  • ...

guyinyou · Sep 16 '22 02:09

now:

[screenshot not reproduced]

new:

[screenshot not reproduced]

guyinyou · Sep 16 '22 02:09
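The screenshots above are not reproduced in this dump. As a rough illustration only, not the actual patch shown in the images, the "new" idea amounts to re-scheduling the write with a short delay while the channel is not writable, instead of dropping the response outright. A minimal Netty-based sketch, with all class names, retry counts, and delays hypothetical:

```java
import java.util.concurrent.TimeUnit;
import io.netty.channel.Channel;

/**
 * Hypothetical sketch of the proposed backoff: if the channel is not
 * writable, re-schedule the write a few times instead of discarding the
 * response. Names and limits are illustrative, not the actual patch.
 */
public final class BackoffWriter {

    private static final int MAX_RETRIES = 3;
    private static final long BACKOFF_MILLIS = 50;

    public static void writeWithBackoff(Channel channel, Object response, int attempt) {
        if (channel.isWritable() || attempt >= MAX_RETRIES) {
            // Either the outbound buffer has drained, or we give up backing off
            // and let Netty queue the final attempt.
            channel.writeAndFlush(response);
            return;
        }
        // Back off and try again on the channel's own event loop.
        channel.eventLoop().schedule(
            () -> writeWithBackoff(channel, response, attempt + 1),
            BACKOFF_MILLIS, TimeUnit.MILLISECONDS);
    }
}
```

The retry runs on the channel's event loop, so no extra threads are needed; only the response reference is held between attempts.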

@fuyou001 @RongtongJin @zhouxinyu

This is my current idea; let's see if there is a better solution.

guyinyou · Sep 16 '22 02:09

It seems like using the @Retryable annotation in the Spring framework. I wonder in which cases the channel will not be writable 🤔

Knowden · Sep 16 '22 09:09
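For readers unfamiliar with the analogy: Spring Retry's @Retryable re-invokes a failing method with a configurable backoff. A minimal example of that pattern (assumes spring-retry on the classpath; not RocketMQ code):

```java
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Retryable;

public class PullService {

    // Retry the call up to 3 times, waiting 100 ms between attempts.
    @Retryable(maxAttempts = 3, backoff = @Backoff(delay = 100))
    public byte[] pullMessages(String queue) {
        // ... remote call that may fail transiently ...
        return new byte[0];
    }
}
```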

From the point of view of stability, I suggest keeping the logic as it is. In the benchmark case, the consumer is not a single instance; if we change this, the code will become more complex.

fuyou001 · Sep 16 '22 11:09

It seems like using the @Retryable annotation in the Spring framework. I wonder in which cases the channel will not be writable 🤔

The amount of data written to a single channel is large, and the amount of buffered data exceeds the threshold.

guyinyou · Sep 19 '22 06:09
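In Netty terms, a channel stops reporting isWritable() once the bytes queued in its outbound buffer exceed the configured high water mark (64 KB by default in Netty 4.x). An illustrative check, not RocketMQ code:

```java
import io.netty.channel.Channel;

/**
 * Illustrative probe: a channel stops reporting isWritable() once the bytes
 * queued in its outbound buffer exceed the configured high water mark.
 */
public final class WritabilityProbe {

    public static boolean canWrite(Channel channel) {
        if (channel.isWritable()) {
            // Bytes we can still queue before crossing the high water mark.
            System.out.println("headroom: " + channel.bytesBeforeUnwritable() + " bytes");
            return true;
        }
        // Bytes that must drain before the channel becomes writable again,
        // i.e. before the buffered amount falls under the low water mark.
        System.out.println("backlog: " + channel.bytesBeforeWritable() + " bytes");
        return false;
    }
}
```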

From the point of view of stability, I suggest keeping the logic as it is. In the benchmark case, the consumer is not a single instance; if we change this, the code will become more complex.

During testing, this happens whenever the TPS of a single consumer reaches the threshold.

In my last benchmark, the production rate eventually stabilized at 400k-500k TPS. A single consumer reaching 60k-70k TPS will trigger the throttling, and even when I use 4 consumers there are still some queues that cannot be consumed and keep accumulating.

guyinyou · Sep 19 '22 06:09

From the point of view of stability, I suggest keeping the logic as it is. In the benchmark case, the consumer is not a single instance; if we change this, the code will become more complex.

During testing, this happens whenever the TPS of a single consumer reaches the threshold.

In my last benchmark, the production rate eventually stabilized at 400k-500k TPS. A single consumer reaching 60k-70k TPS will trigger the throttling, and even when I use 4 consumers there are still some queues that cannot be consumed and keep accumulating.

As a comparison, you could set the Netty high water mark to a higher value in the benchmark, for example 4 MB.

fuyou001 · Sep 20 '22 01:09
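For reference, the threshold fuyou001 mentions maps to Netty's write buffer water marks. A sketch of setting a 4 MB high water mark on a server bootstrap; the exact RocketMQ configuration knob may differ by version, so treat the wiring below as an assumption:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.WriteBufferWaterMark;

public final class WaterMarkConfig {

    public static void apply(ServerBootstrap bootstrap) {
        // Low water mark 1 MB, high water mark 4 MB: the channel turns
        // unwritable above 4 MB of pending data and writable again below 1 MB.
        bootstrap.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK,
                new WriteBufferWaterMark(1024 * 1024, 4 * 1024 * 1024));
    }
}
```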

The throttling mechanism provided by Netty should suffice. I do not see a reason to complicate it. Further, caching data in RetryTaskQueue instead of the application/network buffer is not a good idea.

lizhanhui · Sep 20 '22 02:09

IMO, the key is not how to adjust the high water mark; rather, we should not directly return null when the channel is not writable, because that makes the client's pull request time out (the default timeout is 30s).

RongtongJin · Sep 20 '22 06:09

Maybe it is not a good idea; the backoff strategy will occupy more heap memory.

lizhimins · Sep 23 '22 08:09

IMO, the reason for dropping responses when the channel is not writable is that caching too many responses may cause OOM. Therefore, we shouldn't introduce a retry mechanism. The alternative is to drop the response immediately and return a system-busy response instead when the channel is not writable. In this way the client could request again instead of waiting for the long-polling timeout.

ShadowySpirits · Sep 23 '22 08:09
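A minimal sketch of this fail-fast alternative, expressed as plain Netty code; the response objects and the notion of a pre-built busy reply are placeholders, not RocketMQ's actual RemotingCommand API:

```java
import io.netty.channel.Channel;

/**
 * Hypothetical sketch of the "fail fast" alternative: when the channel is
 * not writable, send a small pre-built "system busy" reply instead of
 * silently dropping the response, so the client can retry immediately
 * rather than waiting for the long-polling / pull timeout.
 */
public final class BusyFastFail {

    public static void reply(Channel channel, Object fullResponse, Object busyResponse) {
        if (channel.isWritable()) {
            channel.writeAndFlush(fullResponse);
        } else {
            // The busy reply is tiny, so queueing it is cheap even when the
            // outbound buffer is already above the high water mark.
            channel.writeAndFlush(busyResponse);
        }
    }
}
```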

This issue is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs.

github-actions[bot] · Sep 24 '23 00:09

This issue was closed because it has been inactive for 3 days since being marked as stale.

github-actions[bot] · Sep 28 '23 00:09