autoscaler
autoscaler copied to clipboard
Extract criteria for removing unneded nodes to a separate package
Which component this PR applies to?
cluster-autoscaler
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
One more refactor of existing scale down code in order to make parallel scale down (https://github.com/kubernetes/autoscaler/issues/5079) possible.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This is a follow-up to https://github.com/kubernetes/autoscaler/pull/5133. Only last 2 commits should be reviewed as a part of this PR. It will be on hold until the previous one is merged.
Does this PR introduce a user-facing change?
NONE
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/hold /assign @MaciekPytel /assign @yaroslava-serdiuk
/hold cancel
I think this will work and I'm going to block it anymore, but I don't really like this refactor for the reasons I listed in comments above. I'm leaving hold so you can have a chance to read my comments and decide if you want to address them or keep the current version. Please feel free to remove the hold.
/lgtm /approve /hold
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MaciekPytel, x13n
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~cluster-autoscaler/OWNERS~~ [MaciekPytel]
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
I understand where the concern is coming from, but I disagree caching information is wrong in this case. Update
/ RemovableAt
semantics are well defined: Update
updates Nodes
object with the last known state of the world, while RemovableAt
returns the list of nodes that are removable at a certain timestamp, based on past Update
calls. This is going to hold in any implementation that understands such semantics. I don't think Nodes
object should be aware of autoscaler loops, pods or nodes being filtered out or anything that happens beyond this object. It's the other way around: if someone is trying to use Nodes
anywhere in the code, they need to make sure they are using it properly. Omitting an Update
call is not a problem of the object: it is a problem of the caller: if they didn't provide fresh information, the list of removable nodes may be inaccurate. Note that it will carries the risk of being inaccurate regardless of when the call happened: CA is making in-memory decisions based on snapshotted state of a distributed system. For its decisions to be accurate, we would have to freeze all changes in the system during the decision making, which is infeasible, for obvious reasons. That being said, I agree the code could be improved (it always can), but I'd like to merge it as is: it is a strict improvement over the previous state, it already enables further changes to implement parallel drain and it has been hanging for over 1.5 month now.
/hold cancel