ec2-plugin
Add re-submission of tasks during spot interruption disconnects
This PR adds a new feature: re-submission of tasks for agents that are disconnected due to a spot interruption event in AWS. Whenever an agent is disconnected, the plugin checks whether the disconnect was unexpected and whether it was caused by a spot interruption event. If both are true, the tasks that were running on the agent are re-submitted to the queue.
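For reviewers, the decision flow described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this PR: `Agent`, `onDisconnect`, the boolean flags, and the string-based queue are all hypothetical stand-ins rather than Jenkins or ec2-plugin types.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class SpotResubmitSketch {

    /** Hypothetical stand-in for a build agent; not a Jenkins or ec2-plugin class. */
    static final class Agent {
        final boolean offlineByUser;      // true if someone took the agent offline on purpose
        final boolean spotInterrupted;    // true if the instance was reclaimed by a spot interruption
        final List<String> runningTasks;  // names of the tasks that were executing at disconnect time

        Agent(boolean offlineByUser, boolean spotInterrupted, List<String> runningTasks) {
            this.offlineByUser = offlineByUser;
            this.spotInterrupted = spotInterrupted;
            this.runningTasks = runningTasks;
        }
    }

    /**
     * Re-queues the agent's running tasks only when the disconnect was
     * unexpected AND caused by a spot interruption.
     * Returns the number of tasks that were re-submitted.
     */
    static int onDisconnect(Agent agent, Queue<String> buildQueue) {
        boolean unexpected = !agent.offlineByUser;
        if (!unexpected || !agent.spotInterrupted) {
            return 0; // one of the two checks failed, so nothing is re-submitted
        }
        for (String task : agent.runningTasks) {
            buildQueue.add(task);
        }
        return agent.runningTasks.size();
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        Agent interrupted = new Agent(false, true, List.of("job-1", "job-2"));
        int resubmitted = onDisconnect(interrupted, queue);
        System.out.println(resubmitted + " task(s) re-submitted: " + queue);
    }
}
```

Requiring both checks means an agent that was deliberately taken offline, or that dropped for some reason other than a spot interruption, does not get its tasks re-queued.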
Motivation
Builds may fail due to spot instances being terminated. This PR can help to reduce the number of build failures for spot interruption events.
Notes
This may or may not prevent build failures. There doesn't seem to be any documentation on how tasks can be resubmitted. This PR is inspired by another Jenkins plugin that has the suggested behaviour implemented - https://github.com/jenkinsci/ec2-fleet-plugin/blob/master/src/main/java/com/amazon/jenkins/ec2fleet/EC2FleetAutoResubmitComputerLauncher.java
Can someone help review this PR to see if it's OK? It's actually identical to #485, but I opened a new PR so that it's eligible for Hacktoberfest 😅
Yeah, I'll have a think on how to mock a spot interruption event and see if it's possible using the AWS SDK. If anyone has any idea how to do so, that'll be super helpful!
This seems to have been approved in October 2020. Is this going to be merged soon? This would be really helpful for us
Yes, this would be really awesome to add - any plans?
Hello, we're also looking forward to this feature.
AFAICT this seems to do what it says, but it is a hard-to-test feature.
I'll have a think on how to mock a spot interruption event and see if it's possible using the AWS SDK. If anyone has any idea how to do so, that'll be super helpful!
It should be possible to test it now with this new-ish AWS feature: AWS Fault Injection Simulator now injects Spot Instance Interruptions