rocketmq
Backoff strategy when channel is not writable
The issue tracker is used for bug reporting purposes ONLY, whereas feature requests need to follow the RIP process. To avoid unnecessary duplication, please check whether there is an existing issue before filing a new one.
It is recommended to start a discussion thread on the mailing lists when discussing your deployment plan, seeking API clarification, or raising other non-bug issues. We welcome any friendly suggestions, bug fixes, collaboration, and other improvements.
Please ensure that your bug report is clear and self-contained. Otherwise, it would take additional rounds of communication, thus more time, to understand the problem itself.
Generally, fixing an issue goes through the following steps:
- Understand the issue reported;
- Reproduce the unexpected behavior locally;
- Perform root cause analysis to identify the underlying problem;
- Create test cases to cover the identified problem;
- Work out a solution to rectify the behavior and make the newly created test cases pass;
- Make a pull request and go through peer review;
Therefore, it would be very helpful, though challenging, if you could provide an isolated project reproducing your reported issue. In any case, please ensure your issue report is informative enough for the community to pick up. At a minimum, include the following hints:
FEATURE REQUEST
- Please describe the feature you are requesting. #5066
- Provide any additional detail on your proposed use case for this feature.
- Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?
- If there are some sub-tasks involved, use - [ ] for each sub-task and create a corresponding issue to map to the sub-task:
  - sub-task1-issue-number: sub-task1 description here,
  - sub-task2-issue-number: sub-task2 description here,
  - ...
now: (current code)
new: (proposed change)
@fuyou001 @RongtongJin @zhouxinyu
This is my current idea; let's see if there is a better solution.
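For readers without the original "now"/"new" snippets, here is a rough, hypothetical sketch of the kind of backoff being proposed: instead of returning null (dropping the response) when the channel is not writable, the write is re-scheduled a few times with an increasing delay. The class, method names, retry count, and delays below are assumptions for illustration, not the actual patch.

```java
// Hypothetical sketch of a backoff strategy for a non-writable channel.
import io.netty.channel.Channel;
import java.util.concurrent.TimeUnit;

public class BackoffWriter {

    private static final int MAX_RETRIES = 3;      // assumed retry limit
    private static final long BASE_DELAY_MS = 50;  // assumed base delay

    public static void writeWithBackoff(Channel channel, Object response, int attempt) {
        if (channel.isWritable()) {
            channel.writeAndFlush(response);
            return;
        }
        if (attempt >= MAX_RETRIES) {
            // Give up, as the current code effectively does, to avoid unbounded buffering.
            return;
        }
        long delay = BASE_DELAY_MS << attempt;     // exponential backoff: 50ms, 100ms, 200ms
        channel.eventLoop().schedule(
            () -> writeWithBackoff(channel, response, attempt + 1),
            delay, TimeUnit.MILLISECONDS);
    }
}
```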
It seems like using the @Retry annotation in the Spring framework. I wonder in which case the channel will not be writable 🤔
From the point of view of stability, I suggest keeping the logic as it is. In the benchmark case the consumer is not a single instance; if we change this, the code will become complex.
> It seems like using the @Retry annotation in the Spring framework. I wonder in which case the channel will not be writable 🤔
The amount of data written to a single channel is large, and the amount of buffered data rises above the threshold.
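Concretely, Netty flips Channel.isWritable() to false once the bytes queued in the channel's outbound buffer exceed the configured high water mark, which is the "buffered data above the threshold" situation described above. A minimal sketch (the helper name is illustrative):

```java
// Illustrative only: inspect a channel's writability relative to its water marks.
import io.netty.channel.Channel;

public class WritabilityCheck {

    static boolean canWriteNow(Channel channel) {
        // How many more bytes can be queued before the channel becomes unwritable,
        // and how many must drain before it becomes writable again.
        long bytesBeforeUnwritable = channel.bytesBeforeUnwritable();
        long bytesBeforeWritable = channel.bytesBeforeWritable();
        System.out.printf("beforeUnwritable=%d, beforeWritable=%d%n",
            bytesBeforeUnwritable, bytesBeforeWritable);
        return channel.isWritable();
    }
}
```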
> From the point of view of stability, I suggest keeping the logic as it is. In the benchmark case the consumer is not a single instance; if we change this, the code will become complex.
During testing, it shows up whenever the TPS of a single consumer reaches the threshold.
In my last benchmark, the production rate finally stabilized at 400k–500k TPS; a single consumer reaching 60k–70k TPS triggers the throttling, and even with 4 consumers (roughly 4 × 70k ≈ 280k TPS at best, still below the production rate) some queues cannot be consumed fast enough and messages accumulate.
> From the point of view of stability, I suggest keeping the logic as it is. In the benchmark case the consumer is not a single instance; if we change this, the code will become complex.
> During testing, it shows up whenever the TPS of a single consumer reaches the threshold.
> In my last benchmark, the production rate finally stabilized at 400k–500k TPS; a single consumer reaching 60k–70k TPS triggers the throttling, and even with 4 consumers some queues cannot be consumed fast enough and messages accumulate.
As a comparison, you can set the Netty high water mark to a higher value in the benchmark, e.g. 4 MB.
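For illustration, a minimal sketch of raising Netty's write-buffer water marks on a server bootstrap, using the 4 MB figure suggested above; whether and how the broker exposes this as a configuration option is not shown here, and the low-water-mark value is an assumption.

```java
// Illustrative only: raise Netty's write-buffer water marks so a channel
// stays writable until roughly 4 MB of responses are queued.
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.WriteBufferWaterMark;

public class WaterMarkConfig {

    static void apply(ServerBootstrap bootstrap) {
        // low = 1 MB, high = 4 MB: isWritable() turns false above the high mark
        // and true again once the buffered bytes drain below the low mark.
        bootstrap.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK,
            new WriteBufferWaterMark(1024 * 1024, 4 * 1024 * 1024));
    }
}
```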
The throttling mechanism provided by Netty should suffice. I do not see a reason to complicate it. Further, caching data in RetryTaskQueue instead of the application/network buffer is not a good idea.
IMO, the key is not how to adjust the high water mark; rather, we should not directly return null when the channel is not writable, because that leaves the client's pull request to time out (the default timeout is 30s).
Maybe it is not a good idea; the backoff strategy will occupy more heap memory.
IMO, the reason for dropping responses when the channel is not writable is that caching too many responses may cause OOM. Therefore, we shouldn't introduce a retry mechanism. The alternative is to drop the response immediately and return SYSTEM_BUSY when the channel is not writable. In this way the client could request again instead of waiting for the long-polling timeout.
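A minimal sketch of that alternative, assuming RocketMQ's RemotingCommand response type; the method and the `systemBusyCode` parameter are illustrative stand-ins (the broker would use its SYSTEM_BUSY response code), not the actual broker code.

```java
// Hypothetical sketch of the "reply busy instead of silently dropping" alternative.
import io.netty.channel.Channel;
import org.apache.rocketmq.remoting.protocol.RemotingCommand;

public class BusyResponder {

    static void respond(Channel channel, RemotingCommand pullResponse, int systemBusyCode) {
        if (channel.isWritable()) {
            channel.writeAndFlush(pullResponse);
        } else {
            // Drop the (potentially large) pull response, but still tell the
            // client why, so it can retry right away instead of waiting out
            // the pull timeout. The busy reply itself is small.
            RemotingCommand busy = RemotingCommand.createResponseCommand(
                systemBusyCode, "channel not writable, try again later");
            channel.writeAndFlush(busy);
        }
    }
}
```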
This issue is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs.
This issue was closed because it has been inactive for 3 days since being marked as stale.