The script to update AWS EC2 instances fails with MissingRegion error
Which component are you using?: cluster-autoscaler
What version of the component are you using?:
The version from latest master.
What environment is this in?: AWS
What did you expect to happen?:
Running go run ec2_instance_types/gen.go without any further flags should update the instances for all AWS regions. The code indicates "It will populate list from all regions if region is not specified.":
var region = flag.String("region", "", "aws region you'd like to generate instances from."+
"It will populate list from all regions if region is not specified.")
What happened instead?:
The script can be executed without any flags for any release up to and including cluster-autoscaler-release-1.22. For any newer branch, i.e., from cluster-autoscaler-release-1.23 up to master, it fails with:
root@777c5c9c2e4c:/autoscaler/cluster-autoscaler/cloudprovider/aws# go run ec2_instance_types/gen.go
F0505 12:24:53.578362 4531 gen.go:88] MissingRegion: could not find region configuration
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:860 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x326bd20, 0x3, 0x0, 0xc000702000, 0x1, {0x271efa5, 0x10}, 0x326c640, 0x0)
/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:825 +0x686
k8s.io/klog/v2.(*loggingT).printDepth(0x326bd20, 0x0, 0x0, {0x0, 0x0}, 0x1f154bb, {0xc00016c140, 0x1, 0x1})
/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:608 +0x1c7
k8s.io/klog/v2.(*loggingT).print(...)
/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:590
k8s.io/klog/v2.Fatal(...)
/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:1490
main.main()
/autoscaler/cluster-autoscaler/cloudprovider/aws/ec2_instance_types/gen.go:88 +0x15f
exit status 255
How to reproduce it (as minimally and precisely as possible):
Install Go 1.17.5 (as specified in builder/Dockerfile) and run go run ec2_instance_types/gen.go inside the autoscaler/cluster-autoscaler/cloudprovider/aws directory.
Anything else we need to know?:
The GenerateEC2InstanceTypes() method has been rewritten, see commit a1faeddbe70146385a8afd4fdcd2cefc8d6fbca7:
commit a1faeddbe70146385a8afd4fdcd2cefc8d6fbca7
Author: Austin Siu <[email protected]>
Date: Wed Nov 10 01:14:32 2021 -0600
Use DescribeInstanceTypes API to get EC2 instance type details
The AWS SDK seems to expect a region, and it raises an error if the region string is empty.
Note: The official AWS documentation (https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTypes.html) states that DescribeInstanceTypes describes the details of the instance types that are offered in a single location only.
I think this was introduced by @AustinSiu in this PR: https://github.com/kubernetes/autoscaler/issues/4436.
@AustinSiu, could you let me know if there is any piece of information I'm missing?
@gregth you're right in that my change only supports generating the instance list when a region is specified, compared to prior to my change, when it made separate requests for all regions if one wasn't provided.
"It will populate list from all regions if region is not specified."
Is this still a case CA wants to support? If not, it would be best to remove this line to avoid confusion.
@AustinSiu we, at Arrikto, have been relying on this script to generate a complete list of instances for all the AWS regions. This option was very useful. I believe it is worth investigating how we can keep the same functionality while using the AWS SDK.
What comes to mind: does the SDK offer a way to list all AWS regions? If so, we could fetch the instances for every region.
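The idea above can be sketched as follows. This is a minimal, self-contained illustration, not the actual gen.go code: in a real implementation the `regionClient` methods would presumably wrap the EC2 DescribeRegions and DescribeInstanceTypes API calls from the AWS SDK, while here a fake client stands in so the aggregation logic is runnable on its own.

```go
package main

import (
	"fmt"
	"sort"
)

// regionClient abstracts the two AWS calls the script would need.
// In a real implementation these would wrap ec2.DescribeRegions and
// ec2.DescribeInstanceTypes; the interface is hypothetical.
type regionClient interface {
	ListRegions() ([]string, error)
	InstanceTypes(region string) ([]string, error)
}

// allInstanceTypes fetches instance types from every region and returns
// a deduplicated, sorted list, restoring the old "all regions" behavior.
func allInstanceTypes(c regionClient) ([]string, error) {
	regions, err := c.ListRegions()
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	for _, r := range regions {
		types, err := c.InstanceTypes(r)
		if err != nil {
			return nil, fmt.Errorf("region %s: %w", r, err)
		}
		for _, t := range types {
			seen[t] = true
		}
	}
	out := make([]string, 0, len(seen))
	for t := range seen {
		out = append(out, t)
	}
	sort.Strings(out)
	return out, nil
}

// fakeClient stands in for the AWS SDK in this sketch.
type fakeClient struct{}

func (fakeClient) ListRegions() ([]string, error) {
	return []string{"us-east-1", "eu-west-1"}, nil
}

func (fakeClient) InstanceTypes(region string) ([]string, error) {
	if region == "us-east-1" {
		return []string{"m5.large", "c5.large"}, nil
	}
	return []string{"m5.large", "t3.micro"}, nil
}

func main() {
	types, _ := allInstanceTypes(fakeClient{})
	fmt.Println(types) // [c5.large m5.large t3.micro]
}
```

The per-region fetch could also be parallelized and made tolerant of regions that are disabled for the account, but the core point is that the SDK's region-listing call would remove the need for a hard-coded region list.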
I believe the team maintaining the cluster autoscaler has been using the script to periodically regenerate the static instance list across all regions and ensure that everything is up to date.
Maybe @gjtempleton, who recently updated the static instance list, can comment further.
My general view is that the current code using the AWS APIs is a significant improvement on the previous setup, and that it provides the functionality we need: AWS announces which regions new instances are available in at launch, so we can simply run the script in that region when refreshing the static list. That said, as Austin says, we should remove the confusing line.
I'd rather not maintain the old version of the script as well as the new version.
/area provider/aws
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.