Fix client backoff
The code here (https://github.com/jpillora/chisel/blob/ab8f06a83048dca0c24dc0b06932dc98df54e8b1/client/client_connect.go#L22C1-L22C55) fails to set a minimum value for the backoff based on the https://github.com/jpillora/backoff project.
(screenshot: https://github.com/user-attachments/assets/36f8c2cf-3dbc-4b07-9126-ec61cfdb37f7)
This means that if the server is down, the backoff minimum for every chisel user defaults to 100ms (per the screenshot above), which results in extremely short backoffs (default behavior shown below).
(screenshot: https://github.com/user-attachments/assets/953b22cc-3f60-440a-b6a6-25c0eba6b60c)
Specifically, the sequence will always be 100ms, 200ms, 400ms, 800ms, and 1.6s while the server is down. That gives the user only about 2 seconds to get the server back up before the client exits entirely, and sending this many packets this quickly is noisy over the wire. Depending on the use case, this can be problematic. The updated behavior with this commit is shown below and gives users more time to get the server back up and running:
(screenshot: https://github.com/user-attachments/assets/e555a341-765e-4078-b472-7c375db2c57f)
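To make the comparison concrete, here is a minimal runnable sketch using the jpillora/backoff package directly; the 1s minimum is an illustrative assumption mirroring the --min-retry-interval flag added in this PR, not an excerpt of the chisel code:

package main

import (
	"fmt"
	"time"

	"github.com/jpillora/backoff"
)

func main() {
	// Current behavior: only Max is set, so Min falls back to the
	// library default of 100ms and Factor to 2.
	current := &backoff.Backoff{Max: 10 * time.Second}

	// Proposed behavior: let the user raise the floor, e.g. with
	// --min-retry-interval=1s.
	proposed := &backoff.Backoff{Min: time.Second, Max: 10 * time.Second}

	for i := 1; i <= 5; i++ {
		fmt.Printf("retry %d: current %v, proposed %v\n",
			i, current.Duration(), proposed.Duration())
	}
	// current:  100ms, 200ms, 400ms, 800ms, 1.6s
	// proposed: 1s, 2s, 4s, 8s, 10s (capped at Max)
}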
That’s strange, chisel client should retry forever…
Respectfully, I never said retrying was the issue or that the client would stop retrying unintentionally. The PR is titled Fix client backoff because of the way you have configured the backoff for the client; it is about backoff time. There should be more time between the first 5 retries, and the user should be able to control that time with a new flag. The first 5 retries all take place within roughly 2 seconds, which doesn't make sense from my standpoint: you are not giving the user enough time between retries to get the server back up if it goes down. With your current logic the backoff will eventually climb above 2 seconds once you go past 5 retries, but it would be better to raise the minimum value of the backoff and let the user control it. That way fewer retries are required and we still get the desired wait time.
The Backoff's Min is always left at its 100ms default here, with no option for the user to set it, so the first 5 backoffs will always be 100ms, 200ms, 400ms, 800ms, and 1.6s if the server goes down. This also means that if the user sets --max-retry-count to a value between 0 and 5 and the server goes down, the client will retry for roughly 2 seconds before exiting, and the user will lose their chisel connection because of the extremely short wait times. Original:
After fix (and --min-retry-interval set to 1m):
You can do your own testing to verify what I am reporting. Start a server, connect a client, kill the server while the client is still connected, and you'll see that the first 5 backoffs are always 100ms, 200ms, 400ms, 800ms, 1.6s, etc., with no ability to customize them.
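For anyone who wants to reproduce the timing without a full chisel setup, a rough sketch of that test is below, assuming a simplified stand-in for the client's connection loop (connectOnce and the 127.0.0.1:8080 address are hypothetical placeholders, not chisel internals):

package main

import (
	"fmt"
	"net"
	"time"

	"github.com/jpillora/backoff"
)

// connectOnce is a hypothetical stand-in for a single dial attempt
// against a chisel server that has just been killed.
func connectOnce(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	// Same shape as the client's loop: Min is never set, so it is 100ms.
	b := &backoff.Backoff{Max: 10 * time.Second}
	for attempt := 1; attempt <= 5; attempt++ {
		if connectOnce("127.0.0.1:8080") == nil {
			fmt.Println("connected")
			return
		}
		d := b.Duration()
		fmt.Printf("attempt %d failed, sleeping %v\n", attempt, d) // 100ms, 200ms, 400ms, 800ms, 1.6s
		time.Sleep(d)
	}
	fmt.Println("giving up")
}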
I would classify this as a bug because the behavior is unintentional. It isn't documented anywhere, and I'm fairly sure most people are unaware of it unless they read the code around the backoff logic. If you don't plan on fixing it, you should at least document it as a known issue so people have a heads up. The logic might be fine for a development branch or for testing, but in the production version of the code the user should be able to customize the backoff behavior.
If not exposed as a client flag, I believe enabling Jitter by default would be a good thing as well:
b := &backoff.Backoff{Jitter: true, Max: c.config.MaxRetryInterval, Min: c.config.MinRetryInterval}
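For illustration, here is a self-contained sketch of what Jitter changes; the 1s/10s bounds are arbitrary values for the example, not chisel defaults:

package main

import (
	"fmt"
	"time"

	"github.com/jpillora/backoff"
)

func main() {
	// With Jitter enabled each delay is randomized between Min and the
	// current exponential step, so a fleet of reconnecting clients does
	// not hammer the server in lockstep.
	b := &backoff.Backoff{Jitter: true, Min: time.Second, Max: 10 * time.Second}
	for i := 0; i < 4; i++ {
		fmt.Println(b.Duration()) // 1s, then random values in 1-2s, 1-4s, 1-8s
	}
}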
I'll add this to another commit @liojacqs soon 😄
@liojacqs added the jitter option as a bool flag. @jpillora any updated thoughts so far on this PR?
Just one thing on the first commit, for the default minimum retry interval:
This code makes it impossible to have a retry interval under 1s other than 100ms, and I would personally add more flexibility, like:
if c.MinRetryInterval <= 0 {
	c.MinRetryInterval = 100 * time.Millisecond
}
That way the default will remain 100ms, but we can also set something like 1ms/250ms/600ms....
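To show how that check plays together with a flag, here is a self-contained sketch using the standard flag package; the Config type, field names, and default values are assumptions for illustration and may not match chisel's actual main.go:

package main

import (
	"flag"
	"fmt"
	"time"
)

// Config mirrors the shape of the client settings discussed above;
// the field names are assumptions for illustration.
type Config struct {
	MinRetryInterval time.Duration
	MaxRetryInterval time.Duration
}

func main() {
	c := &Config{}
	flag.DurationVar(&c.MinRetryInterval, "min-retry-interval", 0, "minimum wait before retrying")
	flag.DurationVar(&c.MaxRetryInterval, "max-retry-interval", 10*time.Second, "maximum wait before retrying")
	flag.Parse()

	// Only fall back to 100ms when the flag was left unset or negative,
	// so explicit values like 1ms, 250ms or 600ms pass through untouched.
	if c.MinRetryInterval <= 0 {
		c.MinRetryInterval = 100 * time.Millisecond
	}
	fmt.Println("min retry interval:", c.MinRetryInterval)
}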