cluster-api-provider-azure icon indicating copy to clipboard operation
cluster-api-provider-azure copied to clipboard

add support for spot instances in AKS

Open zmalik opened this issue 4 years ago • 36 comments

What type of PR is this?

/kind feature

What this PR does / why we need it:

  • Adds support for spot instances node pool for AKS.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes part of #1701

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • [ ] squashed commits
  • [ ] includes documentation
  • [ ] adds unit tests

Release note:

add support for spot node pool in AKS

zmalik avatar Dec 15 '21 11:12 zmalik

Hi @zmalik. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 15 '21 11:12 k8s-ci-robot

/ok-to-test

CecileRobertMichon avatar Dec 15 '21 17:12 CecileRobertMichon

CLA Signed

The committers are authorized under a signed CLA.

  • :white_check_mark: Zain Malik (388fa2c4928b725fa2f802c009175fd9cadce3af)

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Dec 16 '21 12:12 k8s-ci-robot

/retest

zmalik avatar Dec 16 '21 15:12 zmalik

@CecileRobertMichon: GitHub didn't allow me to request PR reviews from the following users: meixingdb.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/lgtm /cc @meixingdb

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 16 '21 21:12 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: To complete the pull request process, please ask for approval from cecilerobertmichon after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Dec 20 '21 09:12 k8s-ci-robot

rebased and squashed

zmalik avatar Dec 20 '21 09:12 zmalik

/area managedclusters

CecileRobertMichon avatar Jan 05 '22 22:01 CecileRobertMichon

@zmalik can you please check why test is failing on this one when you get a chance?

CecileRobertMichon avatar Jan 07 '22 20:01 CecileRobertMichon

CLA Signed

The committers are authorized under a signed CLA.

  • :white_check_mark: Zain Malik (388fa2c4928b725fa2f802c009175fd9cadce3af, 757870fc2cfdf81e45222e1a75e518d12d07f1da)

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Jan 10 '22 20:01 k8s-ci-robot

rebased to bring all fixes from main

zmalik avatar Jan 10 '22 20:01 zmalik

/retest

zmalik avatar Jan 11 '22 17:01 zmalik

will rebase after https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1927 lands as it will require another rebase after that

zmalik avatar Jan 17 '22 08:01 zmalik

will pick this PR now, that taints and label PRs have been merged

zmalik avatar Feb 16 '22 14:02 zmalik

@zmalik is on 🔥

jackfrancis avatar Feb 16 '22 19:02 jackfrancis

failed with No AKS versions found for v1.19 will try to bump in this PR just to test the fix

zmalik avatar Feb 22 '22 08:02 zmalik

that fixed it

zmalik avatar Feb 22 '22 11:02 zmalik

New changes are detected. LGTM label has been removed.

k8s-ci-robot avatar Feb 22 '22 20:02 k8s-ci-robot

removed the AKS version bump as https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2108 merged

zmalik avatar Feb 22 '22 20:02 zmalik

added to e2e tests a spot agentpool

zmalik avatar Feb 23 '22 10:02 zmalik

/test pull-cluster-api-provider-azure-e2e-windows-dockershim

zmalik avatar Feb 23 '22 12:02 zmalik

/test pull-cluster-api-provider-azure-e2e

zmalik avatar Feb 23 '22 18:02 zmalik

@zmalik: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-e2e-windows-dockershim ef1563786fd90e4c5fa15fbfcf7178166eea3680 link true /test pull-cluster-api-provider-azure-e2e-windows-dockershim
pull-cluster-api-provider-azure-ci-entrypoint ef1563786fd90e4c5fa15fbfcf7178166eea3680 link false /test pull-cluster-api-provider-azure-ci-entrypoint
pull-cluster-api-provider-azure-e2e ef1563786fd90e4c5fa15fbfcf7178166eea3680 link true /test pull-cluster-api-provider-azure-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Feb 24 '22 13:02 k8s-ci-robot

after doing more experiments looks like AKS returns errors when setting from empty scalesetpriority to Regular

Changing property 'properties.ScaleSetPriority' is not allowed

This is an issue for all existing nodepools that were created through old versions of CAPZ. which means we will need to make it by default empty also.

zmalik avatar Feb 24 '22 19:02 zmalik

@zmalik can we use *string to our advantage anywhere to be able to differentiate between empty string and actually non-existent?

jackfrancis avatar Feb 24 '22 19:02 jackfrancis

@zmalik can we use *string to our advantage anywhere to be able to differentiate between empty string and actually non-existent?

yes, we can do that. But can that be confusing for users?

if it's nil --> we don't set any default value? (as that would be case for users upgrading from old version of CAPZ) if it's not nil but empty --> we set a default value of Regular as by design in CAPZ we don't allow empty string for scalesetpriority?

was that the proposal or reverse?

zmalik avatar Feb 24 '22 19:02 zmalik

@zmalik Yeah that's my thinking, though I'm just driving by and haven't looked at all the codepaths yet!

The reason to think hard about figuring out some solution of setting a capz default value is to avoid unexpected changes in the future when AKS decides to change its opinion of what should be default.

jackfrancis avatar Feb 24 '22 20:02 jackfrancis

yeah, I agree.

Mostly I'm thinking in case we introduce some default and user has to set empty string to trigger the default it might not be intuitive. As with nil value the old and new will still bypass any defaulting.

Can we ask AKS team if changing from "" to Regular in Azure API should be allowed as effectively "" means Regular and the error message Changing property 'properties.ScaleSetPriority' is not allowed is purely an API error as user is not changing the instance type but setting the ScaleSetPriority which it already has.

zmalik avatar Feb 25 '22 07:02 zmalik