vsphere-csi-driver
Sometimes vSphere syncer fails to sync metadata with an "unable to acquire file lock" error
It looks like some syncs fail with the following error:
{"level":"error","time":"2022-03-14T22:49:04.266550924Z","caller":"volume/manager.go:1103","msg":"failed to update volume.
updateSpec: \"(*types.CnsVolumeMetadataUpdateSpec)(0xc00107b5f0)({\\n DynamicData: (types.DynamicData) {\\n },\\n
VolumeId: (types.CnsVolumeId) {\\n DynamicData: (types.DynamicData) {\\n },\\n Id: (string) (len=36) \\\"ba096274-d1ce-41c1-953b-6bda7b74945b\\\"\\n },\\n Metadata: (types.CnsVolumeMetadata) {\\n DynamicData: (types.DynamicData) {\\n
},\\n ContainerCluster: (types.CnsContainerCluster) {\\n DynamicData: (types.DynamicData) {\\n },\\n ClusterType: (string) (len=10) \\\"KUBERNETES\\\",\\n ClusterId: (string) (len=26) \\\"ci-op-s63trmc3-55b1b-sdvq4\\\",\\n VSphereUser: (string)
(len=22) \\\"VSPHERE.LOCAL\\\\\\\\ci_user4\\\",\\n ClusterFlavor: (string) (len=7) \\\"VANILLA\\\",\\n ClusterDistribution: (string) \\\"\\\"\\n },\\n EntityMetadata: ([]types.BaseCnsEntityMetadata) (len=1 cap=1) {\\n (*types.CnsKubernetesEntityMetadata)
(0xc0002c7380)({\\n CnsEntityMetadata: (types.CnsEntityMetadata) {\\n DynamicData: (types.DynamicData) {\\n },\\n EntityName: (string) (len=40) \\\"pvc-5f4aff1b-9af5-418b-9031-9816ab8acb2f\\\",\\n Labels: ([]types.KeyValue) <nil>,\\n
Delete: (bool) false,\\n ClusterID: (string) (len=26) \\\"ci-op-s63trmc3-55b1b-sdvq4\\\"\\n },\\n EntityType: (string) (len=17) \\\"PERSISTENT_VOLUME\\\",\\n Namespace: (string) \\\"\\\",\\n ReferredEntity: ([]types.CnsKubernetesEntityReference)
<nil>\\n })\\n },\\n ContainerClusterArray: ([]types.CnsContainerCluster) (len=1 cap=1) {\\n (types.CnsContainerCluster) {\\n DynamicData: (types.DynamicData) {\\n },\\n ClusterType: (string) (len=10) \\\"KUBERNETES\\\",\\n ClusterId: (string)
(len=26) \\\"ci-op-s63trmc3-55b1b-sdvq4\\\",\\n VSphereUser: (string) (len=22) \\\"[email protected]\\\",\\n ClusterFlavor: (string) (len=7) \\\"VANILLA\\\",\\n ClusterDistribution: (string) \\\"\\\"\\n }\\n }\\n }\\n})\\n\", fault:
\"(*types.LocalizedMethodFault)(0xc001c3a060)({\\n DynamicData: (types.DynamicData) {\\n },\\n Fault: (types.CnsFault) {\\n BaseMethodFault: (types.BaseMethodFault) <nil>,\\n Reason: (string) (len=560) \\\"(vmodl.fault.SystemError) {\\\\n faultCause
= (vmodl.MethodFault) null, \\\\n faultMessage = <unset>, \\\\n reason = \\\\\\\"Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c/bd294161-20a1-00f7-fd05-3cecef1b8ff6
/_0090/e4daa20ac7fa496b833954ba2d923d3c.vmdk\\\\\\\"\\\\n msg = \\\\\\\"A general system error occurred: Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c
/bd294161-20a1-00f7-fd05-3cecef1b8ff6/_0090/e4daa20ac7fa496b833954ba2d923d3c.vmdk\\\\\\\"\\\\n}\\\"\\n },\\n LocalizedMessage: (string) (len=576) \\\"CnsFault error: (vmodl.fault.SystemError) {\\\\n faultCause = (vmodl.MethodFault) null,
\\\\n faultMessage = <unset>, \\\\n reason = \\\\\\\"Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c/bd294161-20a1-00f7-fd05-3cecef1b8ff6/_0090
/e4daa20ac7fa496b833954ba2d923d3c.vmdk\\\\\\\"\\\\n msg = \\\\\\\"A general system error occurred: Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c/bd294161-20a1-00f7-
fd05-3cecef1b8ff6/_0090/e4daa20ac7fa496b833954ba2d923d3c.vmdk\\\\\\\"\\\\n}\\\"\\n})\\n\", opID: \"c8645a92\"","TraceId":"70c5efe3-23ee-40ea-9872-96d62cb707de","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-
lib/volume.(*defaultManager).UpdateVolumeMetadata.func1\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/common/cns-lib/volume/manager.go:1103\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-lib/volume.
(*defaultManager).UpdateVolumeMetadata\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/common/cns-lib/volume/manager.go:1111\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/syncer.csiPVUpdated\n\t/go/src/git
The key message is:
Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c
/bd294161-20a1-00f7-fd05-3cecef1b8ff6/_0090/e4daa20ac7fa496b833954ba2d923d3c.vmdk
Is the vSphere syncer racy? Is this because of concurrent actions happening against the same volume?
cc @RaunakShah @divyenpatel
@gnufied what is the vSphere version you are using?
It appears to be 7.0.2, build 17920168.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@gnufied: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
/assign
@gnufied Apologies for the delayed response. From the log, it is a CnsFault. You can also see an opID associated with the CNS task. To investigate, CNS usually requires a VC support bundle. Is that available? Additionally, is this a recurring condition?
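In case it helps with that correlation, here is a minimal, standalone sketch (not part of the driver; the embedded log line is shortened from the one quoted above) showing how the opID and TraceId can be pulled out of the syncer's structured log entry so they can be matched against the CNS task in a VC support bundle:

```go
// opid.go: extract the opID and TraceId from a vSphere CSI syncer log line.
// Standalone illustration only; in practice the line would come from the
// syncer container's logs rather than a hard-coded string.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"regexp"
)

// logEntry models just the fields of the syncer's JSON log line we care about.
type logEntry struct {
	Msg     string `json:"msg"`
	TraceID string `json:"TraceId"`
}

func main() {
	// Shortened copy of the failing log line quoted earlier in this issue.
	raw := `{"level":"error","time":"2022-03-14T22:49:04.266550924Z","caller":"volume/manager.go:1103","msg":"failed to update volume. ... opID: \"c8645a92\"","TraceId":"70c5efe3-23ee-40ea-9872-96d62cb707de"}`

	var entry logEntry
	if err := json.Unmarshal([]byte(raw), &entry); err != nil {
		log.Fatalf("cannot parse log line: %v", err)
	}

	// The CNS task's opID is embedded inside the free-form message text.
	opIDRe := regexp.MustCompile(`opID: "([0-9a-f]+)"`)
	if m := opIDRe.FindStringSubmatch(entry.Msg); m != nil {
		fmt.Printf("opID:    %s\n", m[1])
	}
	fmt.Printf("TraceId: %s\n", entry.TraceID)
}
```

The opID printed here (c8645a92 in this case) is the identifier to search for on the vCenter/CNS side; the TraceId identifies the corresponding operation in the CSI driver's own logs.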
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.