ec2-plugin

JENKINS-66373: idle timeout should not fight with min spare instances setting

Open sparrowt opened this issue 11 months ago • 2 comments

The problem

Currently if you use the "Minimum number of spare instances" setting as well as "Idle termination time", then there is a constant battle going on between:

  • MinimumInstanceChecker.checkForMinimumInstances provisioning new agents to try and maintain the minimum number of spare instances as configured
  • the idle timeout checks in the retention strategy killing off the 'spare' instances once they reach the idle termination time

This means that the spare instances are repeatedly killed and recreated, as described in https://issues.jenkins.io/browse/JENKINS-66373. That is wasteful, and it means that for some fraction of the time there are fewer spare instances than configured, because they are being booted (again).
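To make the churn concrete, here is a toy simulation of the two competing checks. It is only a sketch, not the plugin's code: the class `SpareChurnSimulation`, its parameters, and the one-tick-per-minute model are all assumptions made for illustration. Every agent is assumed to stay idle for the whole run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names exist in ec2-plugin.
final class SpareChurnSimulation {

    /** Outcome of one simulated run. */
    static final class Result {
        final int terminations;
        final int minutesBelowMinimum;

        Result(int terminations, int minutesBelowMinimum) {
            this.terminations = terminations;
            this.minutesBelowMinimum = minutesBelowMinimum;
        }
    }

    /**
     * Simulates totalMinutes of an idle Jenkins with one tick per minute.
     * If respectMinSpare is false, the idle check ignores the spare setting
     * (the behaviour described above); if true, it refuses to terminate an
     * agent that is still needed to satisfy the spare setting.
     */
    static Result simulate(int minSpare, int idleTimeoutMinutes, int bootMinutes,
                           int totalMinutes, boolean respectMinSpare) {
        List<Integer> ages = new ArrayList<>(); // minutes since each agent was launched
        int terminations = 0;
        int minutesBelow = 0;

        for (int minute = 0; minute < totalMinutes; minute++) {
            // Provisioning side: top up to the configured number of spares.
            while (ages.size() < minSpare) {
                ages.add(0);
            }

            // Only agents that have finished booting count as usable spares.
            int ready = 0;
            for (int age : ages) {
                if (age >= bootMinutes) {
                    ready++;
                }
            }
            if (ready < minSpare) {
                minutesBelow++;
            }

            // Idle-timeout side: terminate agents that have idled too long.
            for (int i = ages.size() - 1; i >= 0; i--) {
                boolean idleTooLong = ages.get(i) - bootMinutes >= idleTimeoutMinutes;
                boolean neededAsSpare = respectMinSpare && ages.size() <= minSpare;
                if (idleTooLong && !neededAsSpare) {
                    ages.remove(i);
                    terminations++;
                }
            }

            // Advance time by one minute for the surviving agents.
            for (int i = 0; i < ages.size(); i++) {
                ages.set(i, ages.get(i) + 1);
            }
        }
        return new Result(terminations, minutesBelow);
    }
}
```

Calling `simulate(2, 30, 5, 240, false)` and `simulate(2, 30, 5, 240, true)` and comparing the two results shows the difference: without the guard the spares are terminated and relaunched in a loop, and each relaunch adds boot time during which fewer than two spares are ready.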

Solution

This PR makes a simple change to the idle termination logic so that it takes account of "Minimum number of spare instances" in the same way that it already accounts for the main "Minimum number of instances" setting.
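The shape of such a check could look roughly like the sketch below. This is not the code in the PR: `IdleTerminationGuard`, `mayTerminateIdleAgent`, and all parameter names are hypothetical, and the real plugin works with its own template and agent objects rather than plain counts.

```java
// Hypothetical sketch; names and signature are assumptions, not ec2-plugin API.
final class IdleTerminationGuard {

    /**
     * Decides whether an idle agent may be terminated.
     *
     * @param totalAgents        agents currently running for the template
     * @param idleAgents         how many of those are idle, including this one
     * @param minInstances       "Minimum number of instances" setting
     * @param minSpareInstances  "Minimum number of spare instances" setting
     * @param idleMillis         how long this agent has been idle
     * @param idleTimeoutMillis  "Idle termination time" converted to milliseconds
     */
    static boolean mayTerminateIdleAgent(int totalAgents, int idleAgents,
                                         int minInstances, int minSpareInstances,
                                         long idleMillis, long idleTimeoutMillis) {
        if (idleMillis < idleTimeoutMillis) {
            return false; // not idle long enough yet
        }
        if (totalAgents <= minInstances) {
            return false; // would drop below the minimum number of instances
        }
        if (idleAgents <= minSpareInstances) {
            return false; // would drop below the minimum number of spare instances
        }
        return true;
    }
}
```

The third condition is the new part: it mirrors the existing minimum-instances guard but counts only idle agents, so an agent is kept alive whenever terminating it would leave fewer spares than configured. The same scenarios (agent past the timeout with exactly the minimum number of spares, and with one more than the minimum) would be natural cases for a unit test.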

Testing done

None yet - EC2RetentionStrategyTest.java seems like the right place to add a test case for this specific scenario.

Right now I've got very limited time and wanted to at least get the proposed fix up for discussion to start with.

### Submitter checklist
- [x] Make sure you are opening from a **topic/feature/bugfix branch** (right side) and not your main branch!
- [x] Ensure that the pull request title represents the desired changelog entry
- [x] Please describe what you did
- [x] Link to relevant issues in GitHub or Jira
- [ ] Link to relevant pull requests, esp. upstream and downstream changes
- [ ] Ensure you have provided tests that demonstrate the feature works or the issue is fixed

sparrowt, Mar 06 '24 16:03

@res0nance who would be good to have an initial review of this?

sparrowt, Mar 28 '24 13:03

> @res0nance who would be good to have an initial review of this?

Could you add some unit tests for this change?

res0nance, Mar 30 '24 14:03