autoscaling
agent/core: Treat failed requests as potentially successful
Fixes #680; see there for details on the motivation. tl;dr: this fixes a known category of bugs, and AFAICT is a prerequisite for using the VM spec as a source of truth.
Brief summary of changes:
- Introduce a new `resourceBounds` struct in `pkg/agent/core` that handles the uncertainty of requests that may or may not have succeeded.
- Switch internal usage so that plugin permit, vm-monitor approved, and VM spec resources are all represented by `resourceBounds`.
- Add a new test that exercises this extensively (`TestFailuresNotAssumedSuccessful`).
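To illustrate the idea behind `resourceBounds`, here is a minimal sketch. It is not the code in this PR. The `Resources` type, the `Lower`/`Upper` fields, and the `requestSent`/`requestSucceeded`/`requestFailed` methods are all hypothetical names chosen for illustration. The key point is that sending a request widens the known range to include the requested value. A failed request leaves that range wide instead of assuming nothing changed, and only a confirmed response collapses it back to an exact value.

```go
package main

import "fmt"

// Resources is a hypothetical stand-in for the agent's resource type:
// CPU in milli-vCPUs and memory in bytes.
type Resources struct {
	VCPU int64
	Mem  int64
}

func min64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

func max64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// minRes and maxRes take the field-wise minimum/maximum of two Resources.
func minRes(a, b Resources) Resources {
	return Resources{VCPU: min64(a.VCPU, b.VCPU), Mem: min64(a.Mem, b.Mem)}
}

func maxRes(a, b Resources) Resources {
	return Resources{VCPU: max64(a.VCPU, b.VCPU), Mem: max64(a.Mem, b.Mem)}
}

// resourceBounds (hypothetical sketch) records that the true value lies
// somewhere in [Lower, Upper], field by field.
type resourceBounds struct {
	Lower Resources
	Upper Resources
}

// exactly returns bounds for a value we know precisely.
func exactly(r Resources) resourceBounds {
	return resourceBounds{Lower: r, Upper: r}
}

// requestSent widens the bounds to include target: once the request is
// sent, the other side may or may not apply it.
func (b resourceBounds) requestSent(target Resources) resourceBounds {
	return resourceBounds{Lower: minRes(b.Lower, target), Upper: maxRes(b.Upper, target)}
}

// requestSucceeded collapses the bounds: the response confirms the value.
func (b resourceBounds) requestSucceeded(confirmed Resources) resourceBounds {
	return exactly(confirmed)
}

// requestFailed deliberately keeps the widened bounds: a failed request
// (e.g. a timeout) might still have been applied, so it is NOT treated
// as a no-op.
func (b resourceBounds) requestFailed() resourceBounds {
	return b
}

// isExact reports whether the value is known precisely.
func (b resourceBounds) isExact() bool {
	return b.Lower == b.Upper
}

func main() {
	b := exactly(Resources{VCPU: 1000, Mem: 4 << 30})
	b = b.requestSent(Resources{VCPU: 2000, Mem: 8 << 30})
	b = b.requestFailed()
	fmt.Println(b.isExact(), b.Lower.VCPU, b.Upper.VCPU) // prints "false 1000 2000"
}
```

In this sketch, a caller that has to stay safe under uncertainty can act on `Upper` when reserving capacity and on `Lower` when it needs a guaranteed minimum. The bounds only become exact again after a successful response.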
I expect we'll find bugs with this in production. Most of those should be fine: the worst case is restarting the `pkg/agent.Runner` and retrying with a fresh slate.
Possible liveness issues (e.g., getting into a state where we stop communicating with other components) would be more concerning. Hopefully the new test catches those.
Notes for review: keeping this marked as a draft for now; I want to first validate that this is a workable strategy for building towards #350.