cluster-api-provider-openstack icon indicating copy to clipboard operation
cluster-api-provider-openstack copied to clipboard

WIP: ✨ Flexible Nova microversions

Open lentzi90 opened this issue 1 year ago • 30 comments

This is related to the proposal in #1565 and is just meant as a discussion starter for now. If you have any comments, ideas or suggestions, please feel free to comment here or in #1565.

What this PR does / why we need it:

Use a list of supported versions instead of a single hard coded one. The list can be filtered based on specific feature requirements (e.g. usage of server tags). The version to use is then picked from the intersection of this list and what the server supports.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #1448

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • [ ] squashed commits
  • if necessary:
    • [ ] includes documentation
    • [ ] adds unit tests

/hold

lentzi90 avatar May 29 '23 08:05 lentzi90

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

k8s-ci-robot avatar May 29 '23 08:05 k8s-ci-robot

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
Latest commit b5847526a94860fbc82819b5649fcc5eef4454d6
Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/65b8c1484d468a0008ee1400
Deploy Preview https://deploy-preview-1567--kubernetes-sigs-cluster-api-openstack.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar May 29 '23 08:05 netlify[bot]

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lentzi90

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar May 29 '23 08:05 k8s-ci-robot

Updated! I added 2 more commits: one to "undo" the previous changes so it is easier to compare different approaches, and one with new changes. What I have done is basically this:

  1. Use integers instead of strings and floats when comparing versions
  2. Use a separate RequireMicroversion method instead of passing the requirements to getComputeClient. I'm a bit torn on this. It is nice to not have to pass the required version around, but it also makes it easier to forget to set any requirements. I still added a version negotiation without specific requirements to the NewComputeClient function to avoid this "no version set" issue.
  3. Set a higher lowest version. As suggested I put the bar at 2.38 now instead of 2.1. I still think there is a valid reason to go with 2.1 (since that is the default when nothing is specified), but I also think it is so old that we should not have to support it.

lentzi90 avatar Jun 14 '23 09:06 lentzi90

/test pull-cluster-api-provider-openstack-e2e-test

lentzi90 avatar Sep 29 '23 10:09 lentzi90

/test pull-cluster-api-provider-openstack-e2e-test

lentzi90 avatar Sep 29 '23 10:09 lentzi90

/test pull-cluster-api-provider-openstack-e2e-test

lentzi90 avatar Sep 29 '23 12:09 lentzi90

/test pull-cluster-api-provider-openstack-e2e-test /test pull-cluster-api-provider-openstack-test /test pull-cluster-api-provider-openstack-build

lentzi90 avatar Oct 04 '23 11:10 lentzi90

It probably makes sense to move some of this to gophercloud. See https://github.com/gophercloud/gophercloud/pull/2791

lentzi90 avatar Oct 04 '23 12:10 lentzi90

/test pull-cluster-api-provider-openstack-build /test pull-cluster-api-provider-openstack-test

lentzi90 avatar Oct 04 '23 12:10 lentzi90

/test pull-cluster-api-provider-openstack-test

lentzi90 avatar Oct 05 '23 06:10 lentzi90

/test pull-cluster-api-provider-openstack-build /test pull-cluster-api-provider-openstack-e2e-test

lentzi90 avatar Oct 05 '23 06:10 lentzi90

/test pull-cluster-api-provider-openstack-test

lentzi90 avatar Oct 11 '23 06:10 lentzi90

/test pull-cluster-api-provider-openstack-build /test pull-cluster-api-provider-openstack-e2e-test

lentzi90 avatar Oct 11 '23 06:10 lentzi90

E1011 07:53:34.531883       1 controller.go:324] "Reconciler error" err="create OpenStack instance: error creating Openstack instance: Bad request with: [POST http://10.0.3.15/compute/v2.1/servers], error message: {\"badRequest\": {\"code\": 400, \"message\": \"Invalid input for field/attribute 1. Value: {'boot_index': -1, 'delete_on_termination': True, 'destination_type': 'volume', 'source_type': 'volume', 'tag': 'extravol', 'uuid': '270c1689-b622-4f38-8635-2bc9992e73c9'}. Additional properties are not allowed ('tag' was unexpected)\"}}" controller="openstackmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackMachine" OpenStackMachine="e2e-y1cygi/cluster-e2e-y1cygi-control-plane-svfm7" namespace="e2e-y1cygi" name="cluster-e2e-y1cygi-control-plane-svfm7" reconcileID=eebc8754-c59b-4e56-9f1c-e35f0df73a90

The test is failing on the multi-AZ test because it does not see any tag in the instance spec, but a tag is still in the request. I'm a bit lost. Where is the tag coming from? Any ideas @mdbooth ?

lentzi90 avatar Oct 11 '23 13:10 lentzi90

I think it is because of the tag in the additional block volume! https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/ce90495b0d975772f3b16d17c96536447bae6232/pkg/cloud/services/compute/instance.go#L503

lentzi90 avatar Oct 11 '23 14:10 lentzi90

/test pull-cluster-api-provider-openstack-test

lentzi90 avatar Oct 12 '23 07:10 lentzi90

/test pull-cluster-api-provider-openstack-build /test pull-cluster-api-provider-openstack-e2e-test

lentzi90 avatar Oct 12 '23 08:10 lentzi90

Alright, it does work! The failed tests before confirm that it is actually doing something also. It failed because there were tags in the AdditionalBlockDevices and I was only checking the instance spec. This is encouraging! I will now try to shuffle things around so that I use the gophercloud code from CAPO and see if I can get something useful from it.

lentzi90 avatar Oct 12 '23 10:10 lentzi90

this is great PR, seems ok now ,we can follow up on other parts later on just work on tags in this one still WIP now?

jichenjc avatar Oct 13 '23 04:10 jichenjc

/test pull-cluster-api-provider-openstack-build /test pull-cluster-api-provider-openstack-test /test pull-cluster-api-provider-openstack-e2e-test

lentzi90 avatar Oct 24 '23 11:10 lentzi90

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 26 '24 04:01 k8s-triage-robot

/test pull-cluster-api-provider-openstack-build /test pull-cluster-api-provider-openstack-test /test pull-cluster-api-provider-openstack-e2e-test

lentzi90 avatar Jan 30 '24 12:01 lentzi90

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 28 '24 08:02 k8s-ci-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 29 '24 09:03 k8s-triage-robot

/remove-lifecycle rotten

lentzi90 avatar Apr 02 '24 06:04 lentzi90