MaxSpotInstanceCountExceeded in GL tests due to GPU
We keep hitting MaxSpotInstanceCountExceeded errors, which is a difficult scenario to deal with. Enabling --reuse will probably solve this.
If we enable the --reuse option, unit tests won't [always] cover the runner creation process. Do we want that?
@shcheklein found that there are multiple instances running for about a week. We need:
- [ ] automatic warning (email etc.) from AWS if instances run for more than e.g. 30 min
- [ ] automatic shutdown of instances (timeout) by AWS
- [ ] figure out why CML didn't cleanly terminate the instances
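As a rough illustration of the first item, here is a minimal sketch of how long-running instances could be detected. The function name `find_stale_instances` and the 30-minute default are assumptions for illustration, not part of CML; the input is assumed to have the shape of the `Reservations` list returned by EC2's `describe_instances`.

```python
from datetime import datetime, timedelta, timezone


def find_stale_instances(reservations, now=None, max_age=timedelta(minutes=30)):
    """Return (instance_id, age) pairs for running instances older than max_age.

    `reservations` is assumed to look like the "Reservations" list from
    EC2 describe_instances: each entry has an "Instances" list whose items
    carry "InstanceId", "State" -> "Name" and a timezone-aware "LaunchTime".
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            # Only instances that are still running are of interest.
            if instance["State"]["Name"] != "running":
                continue
            age = now - instance["LaunchTime"]
            if age > max_age:
                stale.append((instance["InstanceId"], age))
    return stale
```

With boto3, the input could come from `boto3.client("ec2").describe_instances()["Reservations"]`; the result could then feed an email alert or a termination step. How to tell CML runners apart from other instances (e.g. by tag) is left open here.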
--reuse will only hide the problem without solving it
related to #680
This might be related to #678: after seeing the logs, it seems that the chrono is not working properly.
Also worth a look: https://github.com/cloud-custodian/cloud-custodian (@dberenbaum's suggestion)
We haven't seen this in a while, closing for now.