
Volume provisioning fails with "invalid metadata" when using a native key provider

nick-oconnor opened this issue on May 01 '22

Is this a BUG REPORT or FEATURE REQUEST?: /kind bug

What happened: PV provisioning fails when using a native key provider. Specifically, I'm seeing the following messages on the ESXi host (my provider is named provider-0, my cluster is named cl-0):

warning kmxa[2097952] [Originator@6876 sub=Libs opID=resolveKey-52a1d087-b13e-71e3-dd80-4aca154a5817-48] Failed to resolve key /provider-0: invalid metadata.

and

2022-04-30T23:48:58.034Z info vpxa[2100223] [Originator@6876 sub=Default opID=0fea314b-ea6c-4664-b319-34d81a6f7b69-362537-12-e] [VpxLRO] -- ERROR task-72673 -- vstorageObjectManager -- vim.vslm.host.VStorageObjectManager.createDisk: vmodl.fault.SystemError:
--> Result:
--> (vmodl.fault.SystemError) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    reason = "Key locator error: api = DiskLib_Create, path = /vmfs/volumes/{redacted}/fcd/{redacted}.vmdk"
-->    msg = "A general system error occurred: Key locator error: api = DiskLib_Create, path = /vmfs/volumes/{redacted}/fcd/{redacted}.vmdk"
--> }
--> Args:
-->
--> Arg spec:
--> (vim.vslm.CreateSpec) {
-->    name = "pvc-{redacted}",
-->    keepAfterDeleteVm = true,
-->    backingSpec = (vim.vslm.CreateSpec.DiskFileBackingSpec) {
-->       datastore = 'vim.Datastore:{redacted}',
-->       path = <unset>,
-->       virtualDiskFormat = <unset>,
-->       provisioningType = <unset>
-->    },
-->    capacityInMB = 1024,
-->    profile = (vim.vm.ProfileSpec) [
-->       (vim.vm.DefinedProfileSpec) {
-->          profileId = "{redacted}",
-->          replicationSpec = (vim.vm.replication.ReplicationSpec) null,
-->          profileData = (vim.vm.ProfileRawData) {
-->             extensionKey = "com.vmware.vim.sps",
-->             objectData = "<ns1:storageProfile xmlns:ns1="http://profile.policy.data.vasa.vim.vmware.com/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns1:StorageProfile"><ns1:constraints><ns1:subProfiles><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>vmwarevmcrypt@ENCRYPTION</id><namespace>IOFILTERS</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>AllowCleartextFilters</id><value xmlns:s90="http://www.w3.org/2001/XMLSchema" xsi:type="s90:string">False</value></propertyInstance></constraint></ns1:capability><ns1:name>Host based services</ns1:name></ns1:subProfiles></ns1:constraints><ns1:createdBy>Temporary user handle</ns1:createdBy><ns1:creationTime>2022-03-12T06:39:17.817+00:00</ns1:creationTime><ns1:description>Sample storage policy for VMware's VM and virtual disk encryption</ns1:description><ns1:generationId>2</ns1:generationId><ns1:lastUpdatedBy>Temporary user handle</ns1:lastUpdatedBy><ns1:lastUpdatedTime>2022-04-29T16:11:20.912-07:00</ns1:lastUpdatedTime><ns1:name>VM Encryption Policy</ns1:name><ns1:profileId>{redacted}</ns1:profileId></ns1:storageProfile>"
-->          },
-->          profileParams = <unset>
-->       }
-->    ],
-->    crypto = (vim.encryption.CryptoSpecEncrypt) {
-->       cryptoKeyId = (vim.encryption.CryptoKeyId) {
-->          keyId = "",
-->          providerId = (vim.encryption.KeyProviderId) {
-->             id = "provider-0"
-->          }
-->       },
-->       inPlace = <unset>
-->    },
-->    metadata = (vim.KeyValue) [
-->       (vim.KeyValue) {
-->          key = "cns.tag",
-->          value = "true"
-->       },
-->       (vim.KeyValue) {
-->          key = "cns.version",
-->          value = "3"
-->       },
-->       (vim.KeyValue) {
-->          key = "cns.containerCluster.clusterId",
-->          value = "cl-0"
-->       },
-->       (vim.KeyValue) {
-->          key = "cns.containerCluster.clusterType",
-->          value = "KUBERNETES"
-->       },
-->       (vim.KeyValue) {
-->          key = "cns.containerCluster.vSphereUser",
-->          value = "{redacted}"
-->       },
-->       (vim.KeyValue) {
-->          key = "cns.containerCluster.clusterFlavor",
-->          value = "VANILLA"
-->       }
-->    ]
--> }

I have the storagepolicyname parameter set to VM Encryption Policy. All node VMs and all of their disks are already encrypted using provider-0. The provisioner successfully creates unencrypted PVs when using the default storage policy. provider-0 has a Key ID in vSphere, but keyId is clearly empty in the task spec above. Am I missing part of the setup, or is the csi-provisioner not setting a required parameter?
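For reference, here's a minimal sketch of the kind of StorageClass I mean. The class name is illustrative; the provisioner name and the storagepolicyname parameter are the ones the vSphere CSI driver documents:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-sc                            # illustrative name
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "VM Encryption Policy"     # the encryption policy described above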

What you expected to happen: The csi-provisioner to successfully create an encrypted FCD.

How to reproduce it (as minimally and precisely as possible): Create a Native Key Provider. Create a storage policy that uses the VMware VM Encryption provider. Create a storage class that uses the policy. Create a PVC using the storage class.
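A minimal PVC sketch for the last step, using the StorageClass above (the name and size are illustrative; 1Gi matches the capacityInMB = 1024 in the failing CreateSpec):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-pvc                # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: encrypted-sc     # StorageClass sketched above
  resources:
    requests:
      storage: 1Gi                   # matches capacityInMB = 1024 from the failing task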

Anything else we need to know?: DRS and Storage DRS are enabled.

Environment:

  • csi-vsphere version: v2.5.1
  • vsphere-cloud-controller-manager version: v1.22.6
  • Kubernetes version: v1.22.9
  • vSphere version: 7.0.3 (build 19480866)
  • OS (e.g. from /etc/os-release): Ubuntu 22.04 LTS
  • Kernel (e.g. uname -a): 5.15.0-27-generic
  • Install tools: kubeadm + kubectl
  • ESXi version: 7.0.3 (build 19482537)

nick-oconnor, May 01 '22

@nick-oconnor Can you file an SR and upload a vSphere support bundle? https://kb.vmware.com/s/article/83329?lang=en_US

divyenpatel, May 19 '22

@divyenpatel on it.

nick-oconnor, May 19 '22

@divyenpatel my VMUG account doesn't have permissions to open a technical support request. Guess this is going to stay broken :-(

nick-oconnor, May 19 '22

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Aug 17 '22

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot, Sep 16 '22

/remove-lifecycle rotten

I can still see this error. It results in the following failure:

2022-10-05T17:19:38.949683736Z vanilla/controller.go:567 failed to create volume. Error: failed to create volume with fault: "(*types.LocalizedMethodFault)(0xc0005d9820)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (types.CnsFault) {\n  BaseMethodFault: (types.BaseMethodFault) <nil>,\n  Reason: (string) (len=16) \"VSLM task failed\"\n },\n LocalizedMessage: (string) (len=32) \"CnsFault error: VSLM task failed\"\n})\n"
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).createBlockVolume
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/vanilla/controller.go:567
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume.func1
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/vanilla/controller.go:854
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/vanilla/controller.go:856
github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler.func1
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5596
github.com/rexray/gocsi/middleware/serialvolume.(*interceptor).createVolume
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/serialvolume/serial_volume_locker.go:162
github.com/rexray/gocsi/middleware/serialvolume.(*interceptor).handle
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/serialvolume/serial_volume_locker.go:90
github.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/utils/utils_middleware.go:99
github.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer.func1
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/specvalidator/spec_validator.go:178
github.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handle
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/specvalidator/spec_validator.go:218
github.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/specvalidator/spec_validator.go:177
github.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/utils/utils_middleware.go:99
github.com/rexray/gocsi.(*StoragePlugin).injectContext
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware.go:231
github.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/utils/utils_middleware.go:99
github.com/rexray/gocsi/utils.ChainUnaryServer.func2
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/utils/utils_middleware.go:106
github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5598
google.golang.org/grpc.(*Server).processUnaryRPC
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1024
google.golang.org/grpc.(*Server).handleStream
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1313
google.golang.org/grpc.(*Server).serveStreams.func1.1
        /go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:722
  

gnufied, Oct 05 '22

Hello, exactly the same problem on my side. Did you find a workaround or solution?

OlivierJavaux, Oct 25 '22

/assign @divyenpatel

gohilankit, Nov 03 '22

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Feb 01 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot, Mar 03 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot, Apr 02 '23

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Apr 02 '23

Hello, same problem here. Did you find a workaround or solution?

ccleouf66, Sep 04 '23

/reopen

ccleouf66, Sep 04 '23

@ccleouf66: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Sep 04 '23

/reopen

nick-oconnor, Sep 05 '23

@nick-oconnor: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Sep 05 '23

@ccleouf66 unfortunately not. I bailed and set up encryption at the block level (beneath VMFS). I'd wager that this is still a problem.

nick-oconnor, Sep 05 '23

Ok, thanks for the info @nick-oconnor. Do you know if VMware or anyone is trying to solve it?

ccleouf66, Sep 15 '23

@ccleouf66 nope. I do not.

nick-oconnor, Sep 20 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot, Jan 20 '24

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Jan 20 '24