vsphere-csi-driver
Volume provisioning fails with "invalid metadata" when using a native key provider
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened:
PV provisioning fails when using a native key provider. Specifically, I'm seeing the following messages on the ESXi host (my provider is named `provider-0`, my cluster is named `cl-0`):
warning kmxa[2097952] [Originator@6876 sub=Libs opID=resolveKey-52a1d087-b13e-71e3-dd80-4aca154a5817-48] Failed to resolve key /provider-0: invalid metadata.
and
2022-04-30T23:48:58.034Z info vpxa[2100223] [Originator@6876 sub=Default opID=0fea314b-ea6c-4664-b319-34d81a6f7b69-362537-12-e] [VpxLRO] -- ERROR task-72673 -- vstorageObjectManager -- vim.vslm.host.VStorageObjectManager.createDisk: vmodl.fault.SystemError:
--> Result:
--> (vmodl.fault.SystemError) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> reason = "Key locator error: api = DiskLib_Create, path = /vmfs/volumes/{redacted}/fcd/{redacted}.vmdk"
--> msg = "A general system error occurred: Key locator error: api = DiskLib_Create, path = /vmfs/volumes/{redacted}/fcd/{redacted}.vmdk"
--> }
--> Args:
-->
--> Arg spec:
--> (vim.vslm.CreateSpec) {
--> name = "pvc-{redacted}",
--> keepAfterDeleteVm = true,
--> backingSpec = (vim.vslm.CreateSpec.DiskFileBackingSpec) {
--> datastore = 'vim.Datastore:{redacted}',
--> path = <unset>,
--> virtualDiskFormat = <unset>,
--> provisioningType = <unset>
--> },
--> capacityInMB = 1024,
--> profile = (vim.vm.ProfileSpec) [
--> (vim.vm.DefinedProfileSpec) {
--> profileId = "{redacted}",
--> replicationSpec = (vim.vm.replication.ReplicationSpec) null,
--> profileData = (vim.vm.ProfileRawData) {
--> extensionKey = "com.vmware.vim.sps",
--> objectData = "<ns1:storageProfile xmlns:ns1="http://profile.policy.data.vasa.vim.vmware.com/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns1:StorageProfile"><ns1:constraints><ns1:subProfiles><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>vmwarevmcrypt@ENCRYPTION</id><namespace>IOFILTERS</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>AllowCleartextFilters</id><value xmlns:s90="http://www.w3.org/2001/XMLSchema" xsi:type="s90:string">False</value></propertyInstance></constraint></ns1:capability><ns1:name>Host based services</ns1:name></ns1:subProfiles></ns1:constraints><ns1:createdBy>Temporary user handle</ns1:createdBy><ns1:creationTime>2022-03-12T06:39:17.817+00:00</ns1:creationTime><ns1:description>Sample storage policy for VMware's VM and virtual disk encryption</ns1:description><ns1:generationId>2</ns1:gen
erationId><ns1:lastUpdatedBy>Temporary user handle</ns1:lastUpdatedBy><ns1:lastUpdatedTime>2022-04-29T16:11:20.912-07:00</ns1:lastUpdatedTime><ns1:name>VM Encryption Policy</ns1:name><ns1:profileId>{redacted}</ns1:profileId></ns1:storageProfile>"
--> },
--> profileParams = <unset>
--> }
--> ],
--> crypto = (vim.encryption.CryptoSpecEncrypt) {
--> cryptoKeyId = (vim.encryption.CryptoKeyId) {
--> keyId = "",
--> providerId = (vim.encryption.KeyProviderId) {
--> id = "provider-0"
--> }
--> },
--> inPlace = <unset>
--> },
--> metadata = (vim.KeyValue) [
--> (vim.KeyValue) {
--> key = "cns.tag",
--> value = "true"
--> },
--> (vim.KeyValue) {
--> key = "cns.version",
--> value = "3"
--> },
--> (vim.KeyValue) {
--> key = "cns.containerCluster.clusterId",
--> value = "cl-0"
--> },
--> (vim.KeyValue) {
--> key = "cns.containerCluster.clusterType",
--> value = "KUBERNETES"
--> },
--> (vim.KeyValue) {
--> key = "cns.containerCluster.vSphereUser",
--> value = "{redacted}"
--> },
--> (vim.KeyValue) {
--> key = "cns.containerCluster.clusterFlavor",
--> value = "VANILLA"
--> }
--> ]
--> }
I have the `storagepolicyname` parameter set to `VM Encryption Policy`. All node VMs and all disks are already encrypted using `provider-0`. The provisioner successfully creates unencrypted PVs when using the default storage policy. `provider-0` has a Key ID in vSphere, but the keyId in the log message above is clearly empty. Am I missing part of the setup, or is the `csi-provisioner` not setting a required parameter?
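For reference, the storage class is along these lines (a minimal sketch; the object name here is a placeholder, not my actual class name):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-sc                            # placeholder name
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "VM Encryption Policy"     # the VM encryption storage policy created in vCenter
```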
What you expected to happen:
The `csi-provisioner` to successfully create an encrypted FCD.
How to reproduce it (as minimally and precisely as possible):
1. Create a Native Key Provider.
2. Create a storage policy that uses the VMware VM Encryption provider.
3. Create a storage class that uses the policy.
4. Create a PVC using the storage class (see the sketch below).
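A minimal PVC sketch against the storage class above (again, names are placeholders; 1Gi matches the capacityInMB = 1024 seen in the fault):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-pvc              # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: encrypted-sc   # the class sketched above
  resources:
    requests:
      storage: 1Gi                 # corresponds to capacityInMB = 1024 in the fault
```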
Anything else we need to know?: DRS and Storage DRS are enabled.
Environment:
- csi-vsphere version: v2.5.1
- vsphere-cloud-controller-manager version: v1.22.6
- Kubernetes version: v1.22.9
- vSphere version: 7.0.3 (build 19480866)
- OS (e.g. from /etc/os-release): Ubuntu 22.04 LTS
- Kernel (e.g. `uname -a`): 5.15.0-27-generic
- Install tools: kubeadm + kubectl
- ESXi version: 7.0.3 (build 19482537)
@nick-oconnor Can you file an SR and upload a vSphere support bundle? https://kb.vmware.com/s/article/83329?lang=en_US
@divyenpatel on it.
@divyenpatel my VMUG account doesn't have permission to open a technical support request. I guess this is going to stay broken :-(
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
I can still see this error. It results in the following error:
2022-10-05T17:19:38.949683736Z vanilla/controller.go:567 failed to create volume. Error: failed to create volume with fault: "(*types.LocalizedMethodFault)(0xc0005d9820)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (types.CnsFault) {\n BaseMethodFault: (types.BaseMethodFault) <nil>,\n Reason: (string) (len=16) \"VSLM task failed\"\n },\n LocalizedMessage: (string) (len=32) \"CnsFault error: VSLM task failed\"\n})\n"
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).createBlockVolume
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/vanilla/controller.go:567
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume.func1
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/vanilla/controller.go:854
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/vanilla/controller.go:856
github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler.func1
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5596
github.com/rexray/gocsi/middleware/serialvolume.(*interceptor).createVolume
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/serialvolume/serial_volume_locker.go:162
github.com/rexray/gocsi/middleware/serialvolume.(*interceptor).handle
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/serialvolume/serial_volume_locker.go:90
github.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/utils/utils_middleware.go:99
github.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer.func1
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/specvalidator/spec_validator.go:178
github.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handle
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/specvalidator/spec_validator.go:218
github.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware/specvalidator/spec_validator.go:177
github.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/utils/utils_middleware.go:99
github.com/rexray/gocsi.(*StoragePlugin).injectContext
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/middleware.go:231
github.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/utils/utils_middleware.go:99
github.com/rexray/gocsi/utils.ChainUnaryServer.func2
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/rexray/gocsi/utils/utils_middleware.go:106
github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5598
google.golang.org/grpc.(*Server).processUnaryRPC
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1024
google.golang.org/grpc.(*Server).handleStream
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1313
google.golang.org/grpc.(*Server).serveStreams.func1.1
/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:722
Hello, exactly the same problem on my side. Did you find a workaround or solution?
/assign @divyenpatel
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hello, same problem here. Did you find a workaround or solution?
/reopen
@ccleouf66: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@nick-oconnor: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@ccleouf66 unfortunately no. I bailed and set up encryption at the block level (beneath VMFS). I'd wager that this is still a problem.
OK, thanks for the info @nick-oconnor. Do you know if VMware or anyone else is trying to solve it?
@ccleouf66 nope. I do not.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.