gcp-compute-persistent-disk-csi-driver
gcp-compute-persistent-disk-csi-driver copied to clipboard
Migrate to csi proxy v2 alpha
What type of PR is this?
Uncomment only one
/kind <>line, hit enter to put that in a new line, and remove leading whitespaces from that line:/kind api-change /kind bug /kind cleanup /kind design /kind documentation /kind failing-test /kind feature /kind flake
What this PR does / why we need it:
Migrates GCP Compute PD CSI Driver to use CSI Proxy v2. Besides updating the Go package code to use the new library, we also update the deployment files to run the driver as HostProcess containers. See the linked issue.
Which issue(s) this PR fixes:
Fixes #1070
Special notes for your reviewer:
This is a work in progress. We still need to update documentation to reflect the increased minimal Kubernetes version.
Does this PR introduce a user-facing change?:
Migrated Windows CSI node plugin to HostProcess containers
Action required: Ensure that the k8s version is 1.23+ and set the security policies
in the cluster to enable the HostProcess container feature only to system workloads.
Hi @alexander-ding. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @mauriciopoppe
/ok-to-test /test pull-gcp-compute-persistent-disk-csi-driver-e2e-win2019
There's an optional presubmit job for Windows, please look for it in the test-infra codebase
It was not /test pull-gcp-compute-persistent-disk-csi-driver-e2e-win2019 :(
/test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019
I haven't run /test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019 in a while so it might have drifted from the CI one, I'd take that one and compare it with the presubmit
/retest
The last run https://testgrid.k8s.io/provider-gcp-compute-persistent-disk-csi-driver#windows-2019-presubmit-gcp-compute-persistent-disk-csi-driver is red but here it's green, probably you can check again after removing the TODO
One more thing, I created a few tracking issues a while ago in this repo too, I think we should use those
https://github.com/kubernetes-csi/csi-proxy/issues/217 tracks the issues in this repo
New changes are detected. LGTM label has been removed.
/test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019
Thanks for the PR @alexander-ding !
/approve
I'll let @mauriciopoppe give the final lgtm when all the comments are fixed.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alexander-ding, mattcary
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [mattcary]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/cc @mattcary requesting a re-review for the latest commit. I switched over to the new hpc image that @mauriciopoppe suggested and had to change a few things in the Makefile and Dockerfile to tag the image. Could you take a look and see if there's anything we would need to fix in automated testing/release pipelines?
Thanks for the ping. I think this should be fine. @mauriciopoppe do you agree this will work with our internal louhi pipeline? The BASE_IMAGE will be ignored but we will still tag the manifest. I guess we might end up building identical images multiple times (ie a couple of ltsc, 20h2) where really we should have one image with multiple tags?
/test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019
Thanks for the ping. I think this should be fine. @mauriciopoppe do you agree this will work with our internal louhi pipeline? The BASE_IMAGE will be ignored but we will still tag the manifest. I guess we might end up building identical images multiple times (ie a couple of ltsc, 20h2) where really we should have one image with multiple tags?
I took a look. I think the pipeline will do redundant work but should be fine otherwise. We can refactor the Makefile here and the Louhi pipelines together when the upstream issue is actually fixed.
Awesome, thanks for following through.
I'll let Mauricio give the final lgtm, my approve should still be good.
On Wed, Nov 2, 2022 at 11:19 AM Alexander Ding @.***> wrote:
Thanks for the ping. I think this should be fine. @mauriciopoppe https://github.com/mauriciopoppe do you agree this will work with our internal louhi pipeline? The BASE_IMAGE will be ignored but we will still tag the manifest. I guess we might end up building identical images multiple times (ie a couple of ltsc, 20h2) where really we should have one image with multiple tags?
I took a look. I think the pipeline will do redundant work but should be fine otherwise. We can refactor the Makefile here and the Louhi pipelines together when the upstream issue https://github.com/containerd/containerd/issues/7431 is actually fixed.
— Reply to this email directly, view it on GitHub https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1071#issuecomment-1301041583, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIJCBAHXVNUTWMUCVSHC46DWGKWBNANCNFSM6AAAAAARTNJCOA . You are receiving this because you were mentioned.Message ID: <kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1071/c1301041583 @github.com>
/retest
Test failed but it seemed like a flake. The test was waiting for the external injector pod to come online, but once it did it complains "container not found". I don't see how this is related to my changes. Also, for some reason the optional windows test has no test history.
I had to run retest earlier for the regular e2e test as well due to a flake. Might be something worth looking into if this keeps happening
/test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019
/test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019
@mattcary we're trying to make changes in the base manifests but are unsure about modifying the overlays here and also the changes in the internal manifests, I thought about a couple of options and wanted to know your thoughts
- modify the base manifests with v2 and add overlays for low k8s versions that add back the v1 config
- modify the overlays with v2 versions in the k8s version that support HPC
Is there another option that you suggest?
I'd prefer not to minimize touching the old version manifests, because we don't have much in the way of testing around them I think to confirm that they actually work. So I think it's safer to add overlays for dev, prow, stable-master for v2.
Then maybe when we start rolling off old versions (eg 1.21 and 1.22 should drop soon as they are EOL, 1.23 is going EOL early next year) we can switch and have an overlay for the v1 and have v2 in the base.
I think this also means you should copy stable-master to a new 1.25 version that uses v1, then have stable-master with an overlay for v2.
Does that all make sense?
@mattcary we're trying to make changes in the base manifests but are unsure about modifying the overlays here and also the changes in the internal manifests, I thought about a couple of options and wanted to know your thoughts
- modify the base manifests with v2 and add overlays for low k8s versions that add back the v1 config
- modify the overlays with v2 versions in the k8s version that support HPC
Is there another option that you suggest?
I'd prefer not to minimize touching the old version manifests, because we don't have much in the way of testing around them I think to confirm that they actually work. So I think it's safer to add overlays for dev, prow, stable-master for v2.
Then maybe when we start rolling off old versions (eg 1.21 and 1.22 should drop soon as they are EOL, 1.23 is going EOL early next year) we can switch and have an overlay for the v1 and have v2 in the base.
I think this also means you should copy stable-master to a new 1.25 version that uses v1, then have stable-master with an overlay for v2.
Does that all make sense?
@mattcary to make sure we're on the same page, you're suggesting that we
- Keep the base manifests as is
- Rename stable-master to stable-1.25
- Create a new stable-master that overlays v2 stuff on the base manifest
- Modify dev, noauth, and prow overlays to use v2 as well
Is that right?
Note that the windows node overlay for v2 would be pretty involved, and we have to copy that for every overlay that needs v2 (and remove it when we move the base manifest to use v2). If that's okay, I can go ahead with that
thanks for the feedback @mattcary
@alexander-ding I think that the steps you wrote look good to me, I think that it's easy to add things because kustomize will just merge the manifests, it's a little bit difficult to delete things though (e.g. the sections in the Spec that mount and use the named pipes) but it's also doable.
- Keep the base manifests as is
- Rename stable-master to stable-1.25
- Create a new stable-master that overlays v2 stuff on the base manifest
- Modify dev, noauth, and prow overlays to use v2 as well
Oops, sorry for the lag in reply.
Yes, those 4 steps are what I had in mind.
Is that right?
Note that the windows node overlay for v2 would be pretty involved, and we have to copy that for every overlay that needs v2 (and remove it when we move the base manifest to use v2). If that's okay, I can go ahead with that
Yeah, understood. I think it's important to keep the base simple while we're tweaking the v2 overlay ---we need to be able to easily make changes to the production overlays which mean 1.23-1.25 at this point.
When v2 development gets stable then we can switch over.
/test pull-gcp-compute-persistent-disk-csi-driver-e2e-win2019
@alexander-ding: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:
/test pull-gcp-compute-persistent-disk-csi-driver-e2e/test pull-gcp-compute-persistent-disk-csi-driver-kubernetes-integration/test pull-gcp-compute-persistent-disk-csi-driver-sanity/test pull-gcp-compute-persistent-disk-csi-driver-unit/test pull-gcp-compute-persistent-disk-csi-driver-verify
The following commands are available to trigger optional jobs:
/test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019
Use /test all to run the following jobs that were automatically triggered:
pull-gcp-compute-persistent-disk-csi-driver-e2epull-gcp-compute-persistent-disk-csi-driver-kubernetes-integrationpull-gcp-compute-persistent-disk-csi-driver-unitpull-gcp-compute-persistent-disk-csi-driver-verify
In response to this:
/test pull-gcp-compute-persistent-disk-csi-driver-e2e-win2019
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019
@mattcary @mauriciopoppe I pushed some changes with the overlays. However, I noticed that the stable overlays all point to specific image tags, but we don't have a fixed version tag for the latest development image.
I did some more readings on the release process for this repository, namely the overlay strategies and go/pdcsi-oss-release-process (linked to by the repository README). I think we should take a step back and talk through the release pipeline.
Strictly speaking, this PR doesn't introduce new functionalities, so it can be considered a patch, not a new minor release. In this case, this PR should only update the development overlays (dev and noauth-debug) and nothing else. Then, we need to merge this PR to master and cherry pick the change into the release 1.8 branch, which is currently in prerelease, and release the updated driver as v1.8.0-rc2.
Once the new prerelease is generated, we can create a separate PR to update the prow-stable-sidecar-rc-master overlay to use the rc2 prerelease and ensure that tests are green.
When ready to release, we can update all the relevant production overlays (stable-1-23, stable-1-24, and stable-master) to be the same as the prow-stable-sidecar-rc-master staging overlay.
Releasing stable-1-25 should be a separate process.
If we decide that this is a big enough change to merit its own minor version, then this release needs to be v1.9.0. We would have to go through the entire pre-release/release process. We should probably wait until v1.8 is stable before merging to master.
Let me know what you think. I'll go ahead and revert to the overlay changes I outlined (only dev and noauth-debug).