vsphere-csi-driver

vSphere syncer sometimes fails to sync metadata: unable to acquire file lock

Open gnufied opened this issue 2 years ago • 9 comments

It looks like some syncs fail with the following error:

{"level":"error","time":"2022-03-14T22:49:04.266550924Z","caller":"volume/manager.go:1103","msg":"failed to update volume. 
updateSpec: \"(*types.CnsVolumeMetadataUpdateSpec)(0xc00107b5f0)({\\n DynamicData: (types.DynamicData) {\\n },\\n 
VolumeId: (types.CnsVolumeId) {\\n  DynamicData: (types.DynamicData) {\\n  },\\n  Id: (string) (len=36) \\\"ba096274-d1ce-41c1-953b-6bda7b74945b\\\"\\n },\\n Metadata: (types.CnsVolumeMetadata) {\\n  DynamicData: (types.DynamicData) {\\n  
},\\n  ContainerCluster: (types.CnsContainerCluster) {\\n   DynamicData: (types.DynamicData) {\\n   },\\n   ClusterType: (string) (len=10) \\\"KUBERNETES\\\",\\n   ClusterId: (string) (len=26) \\\"ci-op-s63trmc3-55b1b-sdvq4\\\",\\n   VSphereUser: (string) 
(len=22) \\\"VSPHERE.LOCAL\\\\\\\\ci_user4\\\",\\n   ClusterFlavor: (string) (len=7) \\\"VANILLA\\\",\\n   ClusterDistribution: (string) \\\"\\\"\\n  },\\n  EntityMetadata: ([]types.BaseCnsEntityMetadata) (len=1 cap=1) {\\n   (*types.CnsKubernetesEntityMetadata)
(0xc0002c7380)({\\n    CnsEntityMetadata: (types.CnsEntityMetadata) {\\n     DynamicData: (types.DynamicData) {\\n     },\\n     EntityName: (string) (len=40) \\\"pvc-5f4aff1b-9af5-418b-9031-9816ab8acb2f\\\",\\n     Labels: ([]types.KeyValue) <nil>,\\n     
Delete: (bool) false,\\n     ClusterID: (string) (len=26) \\\"ci-op-s63trmc3-55b1b-sdvq4\\\"\\n    },\\n    EntityType: (string) (len=17) \\\"PERSISTENT_VOLUME\\\",\\n    Namespace: (string) \\\"\\\",\\n    ReferredEntity: ([]types.CnsKubernetesEntityReference) 
<nil>\\n   })\\n  },\\n  ContainerClusterArray: ([]types.CnsContainerCluster) (len=1 cap=1) {\\n   (types.CnsContainerCluster) {\\n    DynamicData: (types.DynamicData) {\\n    },\\n    ClusterType: (string) (len=10) \\\"KUBERNETES\\\",\\n    ClusterId: (string) 
(len=26) \\\"ci-op-s63trmc3-55b1b-sdvq4\\\",\\n    VSphereUser: (string) (len=22) \\\"[email protected]\\\",\\n    ClusterFlavor: (string) (len=7) \\\"VANILLA\\\",\\n    ClusterDistribution: (string) \\\"\\\"\\n   }\\n  }\\n }\\n})\\n\", fault: 
\"(*types.LocalizedMethodFault)(0xc001c3a060)({\\n DynamicData: (types.DynamicData) {\\n },\\n Fault: (types.CnsFault) {\\n  BaseMethodFault: (types.BaseMethodFault) <nil>,\\n  Reason: (string) (len=560) \\\"(vmodl.fault.SystemError) {\\\\n   faultCause 
= (vmodl.MethodFault) null, \\\\n   faultMessage = <unset>, \\\\n   reason = \\\\\\\"Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c/bd294161-20a1-00f7-fd05-3cecef1b8ff6
/_0090/e4daa20ac7fa496b833954ba2d923d3c.vmdk\\\\\\\"\\\\n   msg = \\\\\\\"A general system error occurred: Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c
/bd294161-20a1-00f7-fd05-3cecef1b8ff6/_0090/e4daa20ac7fa496b833954ba2d923d3c.vmdk\\\\\\\"\\\\n}\\\"\\n },\\n LocalizedMessage: (string) (len=576) \\\"CnsFault error: (vmodl.fault.SystemError) {\\\\n   faultCause = (vmodl.MethodFault) null, 
\\\\n   faultMessage = <unset>, \\\\n   reason = \\\\\\\"Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c/bd294161-20a1-00f7-fd05-3cecef1b8ff6/_0090
/e4daa20ac7fa496b833954ba2d923d3c.vmdk\\\\\\\"\\\\n   msg = \\\\\\\"A general system error occurred: Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c/bd294161-20a1-00f7-
fd05-3cecef1b8ff6/_0090/e4daa20ac7fa496b833954ba2d923d3c.vmdk\\\\\\\"\\\\n}\\\"\\n})\\n\", opID: \"c8645a92\"","TraceId":"70c5efe3-23ee-40ea-9872-96d62cb707de","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-
lib/volume.(*defaultManager).UpdateVolumeMetadata.func1\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/common/cns-lib/volume/manager.go:1103\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-lib/volume.
(*defaultManager).UpdateVolumeMetadata\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/common/cns-lib/volume/manager.go:1111\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/syncer.csiPVUpdated\n\t/go/src/git


The key message is:

Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c/bd294161-20a1-00f7-fd05-3cecef1b8ff6/_0090/e4daa20ac7fa496b833954ba2d923d3c.vmdk

Is the vSphere syncer racy? Is this caused by concurrent operations against the same volume?

cc @RaunakShah @divyenpatel

gnufied avatar Mar 15 '22 19:03 gnufied

@gnufied what is the vSphere version you are using?

divyenpatel avatar Mar 18 '22 19:03 divyenpatel

It appears to be 7.0.2 - build 17920168

gnufied avatar Mar 18 '22 19:03 gnufied

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 16 '22 20:06 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Jul 16 '22 21:07 k8s-triage-robot

/close

k8s-triage-robot avatar Aug 15 '22 21:08 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 15 '22 21:08 k8s-ci-robot

/reopen

gnufied avatar Oct 11 '22 19:10 gnufied

@gnufied: Reopened this issue.

In response to this:

/reopen

k8s-ci-robot avatar Oct 11 '22 19:10 k8s-ci-robot

/remove-lifecycle rotten

gnufied avatar Oct 11 '22 19:10 gnufied

/assign

adikul30 avatar Nov 17 '22 22:11 adikul30

@gnufied Apologies for the delayed response. The log shows a CnsFault, along with the opID associated with the CNS task. To investigate, CNS usually requires a vCenter (VC) support bundle. Is one available? Also, does this happen repeatedly?
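
To correlate a failure like this with the CNS task in a support bundle, it can help to pull the opID, the TraceId, and the locked disk path out of the syncer's JSON log line. The following is a minimal sketch, not part of the driver: `logEntry`, `summarize`, and `sampleLine` are hypothetical names, the sample line is an abbreviated version of the log quoted in this issue, and the regular expressions assume the `opID: "..."` and `Failed to lock the file` fragments appear in `msg` as they do above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// logEntry mirrors only the JSON keys visible in the log line quoted in this issue.
type logEntry struct {
	Level   string `json:"level"`
	Caller  string `json:"caller"`
	Msg     string `json:"msg"`
	TraceID string `json:"TraceId"`
}

// opIDPattern matches the `opID: "c8645a92"` fragment embedded in the msg text.
var opIDPattern = regexp.MustCompile(`opID: "([0-9a-fA-F]+)"`)

// lockedDiskPattern captures the VMDK path that follows the lock failure text.
var lockedDiskPattern = regexp.MustCompile(
	`Failed to lock the file: api = DiskLib_Open, _diskPath->CValue\(\) = (\S+?\.vmdk)`)

// sampleLine is an abbreviated stand-in for the full log entry (assumption:
// the real msg is much longer, but contains the same fragments).
const sampleLine = `{"level":"error","caller":"volume/manager.go:1103","msg":"failed to update volume. fault: Failed to lock the file: api = DiskLib_Open, _diskPath->CValue() = /vmfs/volumes/vsan:523ea352e875627d-b090c96b526bb79c/bd294161-20a1-00f7-fd05-3cecef1b8ff6/_0090/e4daa20ac7fa496b833954ba2d923d3c.vmdk, opID: \"c8645a92\"","TraceId":"70c5efe3-23ee-40ea-9872-96d62cb707de"}`

// summarize decodes one JSON log line and extracts the identifiers that are
// useful when searching a support bundle. Missing fragments yield empty strings.
func summarize(line string) (opID, diskPath, traceID string, err error) {
	var e logEntry
	if err = json.Unmarshal([]byte(line), &e); err != nil {
		return "", "", "", err
	}
	if m := opIDPattern.FindStringSubmatch(e.Msg); m != nil {
		opID = m[1]
	}
	if m := lockedDiskPattern.FindStringSubmatch(e.Msg); m != nil {
		diskPath = m[1]
	}
	return opID, diskPath, e.TraceID, nil
}

func main() {
	opID, diskPath, traceID, err := summarize(sampleLine)
	if err != nil {
		fmt.Println("could not parse log line:", err)
		return
	}
	fmt.Println("opID:   ", opID)
	fmt.Println("TraceId:", traceID)
	fmt.Println("disk:   ", diskPath)
}
```

Counting how often the same disk path shows up across log lines would also give a rough answer to whether the condition recurs for one volume or for many.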

adikul30 avatar Jan 05 '23 20:01 adikul30

/lifecycle stale

k8s-triage-robot avatar Apr 05 '23 21:04 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar May 05 '23 22:05 k8s-triage-robot

/close not-planned

k8s-triage-robot avatar Jun 04 '23 22:06 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

k8s-ci-robot avatar Jun 04 '23 22:06 k8s-ci-robot