Configurable retry delay and jitter
Allow the retry delay to be configured rather than hard-coded to 10 seconds, and allow the fixed delay to be combined with jitter to help avoid thundering herds.
@rahulsom I seem to recall you mentioning this problem before; would this remove your need for any custom wrappers?
Thanks! This is nice. It should allow me to rely on this instead of custom logic.
@rahulsom Are you interested in ripping out your custom logic in favor of the incremental build from this PR to validate that it works?
I'm on a brief break but will test it out sometime next week. Is that cool?
Sure, thanks!
@rahulsom Are you still interested in kicking the tires on this?
Sorry about the delay! I tried a few things - I either ended up with a condition where the connection never crashes or it crashes so badly that the process terminates. Do you have hints on how to cause this particular kind of reconnect?
@rahulsom Reconnect can be triggered easily by restarting the controller while inbound agent(s) are connected. For example, create an agent in the UI with a Launch method of "Launch agent by connecting it to the controller", then download the incremental build of Remoting from this PR and run java -jar /path/to/remoting.jar -url https://${JENKINS_URL} -secret ${SECRET} -name ${AGENT_NAME}. When you restart the controller, you should see "Terminated" in the agent logs, followed by "Performing onReconnect operation" 10 seconds later, since the default value of -retryDelay is 10 seconds.
When multiple agents are launched at the same time and the controller subsequently restarts, all agents should notice and start reconnecting at the same time, creating a thundering herd. The jitter functionality introduced in this PR can then be used to solve the thundering herd problem via the new -retryJitter or -retryJitterFactor options. The idea is that anyone who uses a custom wrapper to introduce jitter should be able to remove the wrapper, observe the thundering herd problem, and then observe that the problem goes away once the -retryJitter or -retryJitterFactor options from this PR are used instead.
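To illustrate why jitter spreads out reconnects, here is a minimal sketch of two common ways to randomize a fixed retry delay. The class and method names are hypothetical, and the semantics (an additive random wait, and a proportional random scale) are assumptions for illustration; they are not taken from this PR's implementation, and -retryJitter and -retryJitterFactor may behave differently.

```java
import java.time.Duration;
import java.util.Random;

// Hypothetical sketch; not the Remoting implementation.
public class RetryDelaySketch {

    /** Fixed delay plus a uniformly random extra wait in [0, maxJitter] (assumed semantics). */
    public static Duration withAdditiveJitter(Duration delay, Duration maxJitter, Random random) {
        // nextDouble() is in [0, 1), so the truncated result is in [0, maxJitter.toMillis()].
        long extraMillis = (long) (random.nextDouble() * (maxJitter.toMillis() + 1));
        return delay.plusMillis(extraMillis);
    }

    /** Delay scaled by a random factor of roughly 1 - factor to 1 + factor (assumed semantics). */
    public static Duration withProportionalJitter(Duration delay, double factor, Random random) {
        double scale = 1.0 + factor * (2.0 * random.nextDouble() - 1.0);
        return Duration.ofMillis(Math.round(delay.toMillis() * scale));
    }

    public static void main(String[] args) {
        Random random = new Random();
        Duration delay = Duration.ofSeconds(10); // matches the 10-second default mentioned above

        // Without jitter, every agent would retry after exactly 10 seconds.
        // With jitter, each simulated agent picks a different moment to reconnect.
        for (int agent = 1; agent <= 5; agent++) {
            Duration additive = withAdditiveJitter(delay, Duration.ofSeconds(5), random);
            Duration proportional = withProportionalJitter(delay, 0.5, random);
            System.out.printf("agent %d: additive=%d ms, proportional=%d ms%n",
                    agent, additive.toMillis(), proportional.toMillis());
        }
    }
}
```

Running this a few times shows the simulated agents reconnecting at different offsets instead of all at the 10-second mark, which is the effect the jitter options are meant to provide.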