Allow nodes that are not managed by the CPEM / CCM to work with the autoscaler node pools in Equinix Metal
Which component are you using?: Cluster Autoscaler for Equinix Metal.
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Currently, Equinix Metal nodes that are NOT managed by the CCM / CPEM cannot work with the cluster autoscaler or be part of a node pool.
Describe the solution you'd like.:
Allow nodes that are not managed by the CCM / CPEM to work with the autoscaler node pools. From my testing, the autoscaler looks for the node's spec.ProviderID string, which is added by the CCM / CPEM. If it is missing for any single node, the autoscaler throws errors and cannot proceed with its logic even for the nodes that are managed by the CCM / CPEM. In my opinion this should be handled more gracefully, allowing the autoscaler to keep working even if some nodes do not qualify.
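To illustrate the failure mode, here is a minimal sketch (not the actual autoscaler or CPEM code, and the equinixmetal:// prefix is my assumption) of how a cloud provider typically resolves the device behind a Node from spec.ProviderID alone, so an empty value turns into a hard error:

```go
package main

import (
	"fmt"
	"strings"

	apiv1 "k8s.io/api/core/v1"
)

// deviceIDFromNode extracts an Equinix Metal device UUID from the node's
// providerID, assuming the "equinixmetal://<uuid>" scheme written by CPEM.
func deviceIDFromNode(node *apiv1.Node) (string, error) {
	id := node.Spec.ProviderID
	if id == "" {
		// A node without a providerID (e.g. one not managed by the CCM / CPEM)
		// fails here, which is the error path described above.
		return "", fmt.Errorf("node %s has no spec.providerID set", node.Name)
	}
	return strings.TrimPrefix(id, "equinixmetal://"), nil
}

func main() {
	unmanaged := &apiv1.Node{}
	unmanaged.Name = "unmanaged-worker" // hypothetical node not managed by CPEM
	if _, err := deviceIDFromNode(unmanaged); err != nil {
		fmt.Println("lookup failed:", err)
	}
}
```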
Describe any alternative solutions you've considered.:
I have discovered that manually adding the spec.ProviderID string to nodes that are NOT managed by the CCM / CPEM makes them work with the autoscaler.
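As a rough sketch of that workaround (the node name and device UUID are placeholders, and I'm assuming the equinixmetal://<device-uuid> format that CPEM writes; the same JSON patch can also be applied with kubectl patch):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	nodeName := "unmanaged-worker"                     // placeholder node name
	deviceID := "00000000-0000-0000-0000-000000000000" // placeholder Equinix Metal device UUID

	// Assumes the equinixmetal://<device-uuid> format; verify against the CCM / CPEM version in use.
	// Note: spec.providerID can only be set while it is still empty; the API rejects later changes.
	patch := []byte(fmt.Sprintf(`{"spec":{"providerID":"equinixmetal://%s"}}`, deviceID))
	_, err = client.CoreV1().Nodes().Patch(context.TODO(), nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("patched providerID on", nodeName)
}
```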
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
@enkelprifti98 As far as I can tell, after reviewing other Cloud Provider implementations, spec.ProviderID is a requirement for a Node resource to be used by the Autoscaler. The Cloud Provider is expected to supply this value, and its format is defined by that cloud provider.
Here's an example from the EC2 provider: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go#L112-L153
(I also checked DO, Linode, and Hetzner's providers and found that they only consult spec.ProviderID and don't attempt other forms of detection.)
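For reference, a rough sketch of the providerID schemes these providers appear to use; the exact strings are from memory and should be verified against each provider's code or docs:

```go
package main

import "fmt"

// Assumed providerID schemes; each cloud provider defines and parses its own format.
const (
	awsProviderID          = "aws:///us-east-1a/i-0123456789abcdef0"               // availability zone + EC2 instance ID
	digitaloceanProviderID = "digitalocean://123456789"                            // droplet ID
	linodeProviderID       = "linode://12345678"                                   // Linode instance ID
	hcloudProviderID       = "hcloud://12345678"                                   // Hetzner Cloud server ID
	equinixProviderID      = "equinixmetal://00000000-0000-0000-0000-000000000000" // device UUID
)

func main() {
	fmt.Println(awsProviderID, digitaloceanProviderID, linodeProviderID, hcloudProviderID, equinixProviderID)
}
```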
When you manually set the ProviderID, I would assume you have to do so for each node that is added through scaling.
The Equinix provider (like the others) could do a better job of surfacing this as an error. This could also be documented as an installation requirement (either running a CCM, or manually defining and maintaining the ProviderIDs). https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/equinixmetal#ccm-and-controller-node-labels
I don't think it would be Autoscaler's place to assume it knows which Nodes relate to which EM Instance IDs. Autoscaler doesn't necessarily run on each EM Instance, so metadata lookup is not an option for discovery. Mapping Node IPs to EM Instance IDs could be an approach, but Node IP addresses are not necessarily public and even when they are, they may be floating and temporarily assigned or issued through a gateway+DHCP device (and the InstanceID may map to that gateway device).
(I see Auto-Discovery options for Azure, AWS, and GCP, but I'm not sure what this feature does or whether it means you don't need spec.ProviderID: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup)
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.