MountVolume.SetUp failed for volume Openshift 4.10

Open eselvam opened this issue 3 years ago • 19 comments

Describe the bug We use OpenShift 4.10. The PV and PVC are fine and bound. We can mount the volume manually on the worker nodes, but when OpenShift tries to mount it we get the error below.

MountVolume.SetUp failed for volume "pvc-73701811-3298-4a7d-914f-bac813442324" : rpc error: code = Internal desc = error mounting NFS volume netappip:/trident_qtree_pool_openshift_prd_QBHALDTHGT/openshift_prd_pvc_73701811_3298_4a7d_914f_bac813442324 on mountpoint /var/lib/kubelet/pods/c4237f18-a3c7-46c9-9073-c2d7d544b719/volumes/kubernetes.io~csi/pvc-73701811-3298-4a7d-914f-bac813442324/mount: exit status 255

After checking further, we found that the PVC directory does not exist under kubernetes.io~csi for the pod, which is created dynamically.

The manual mount on the worker nodes works perfectly, so there is no issue with the export policy etc. It seems something is wrong with the CSI driver.
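When checking a node by hand, it can help to pull the pod UID and volume name out of the mountpoint reported in the error. The following is a minimal sketch, not Trident code: `parseCSIMountPath` and `csiMountInfo` are hypothetical names, and the layout `<kubeletDir>/pods/<podUID>/volumes/kubernetes.io~csi/<volume>/mount` is simply read off the error message above, with `/var/lib/kubelet` assumed as the kubelet directory.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// csiMountInfo holds the variable parts of a CSI mountpoint path.
// (Hypothetical helper type, used only for this illustration.)
type csiMountInfo struct {
	PodUID string
	Volume string
}

// parseCSIMountPath splits a path of the form
// <kubeletDir>/pods/<podUID>/volumes/kubernetes.io~csi/<volume>/mount.
// The layout is taken from the error message in this issue.
func parseCSIMountPath(kubeletDir, path string) (csiMountInfo, bool) {
	prefix := strings.TrimSuffix(kubeletDir, "/") + "/pods/"
	if !strings.HasPrefix(path, prefix) {
		return csiMountInfo{}, false
	}
	parts := strings.Split(strings.TrimPrefix(path, prefix), "/")
	// Expected segments: <podUID> volumes kubernetes.io~csi <volume> mount
	if len(parts) != 5 || parts[1] != "volumes" ||
		parts[2] != "kubernetes.io~csi" || parts[4] != "mount" {
		return csiMountInfo{}, false
	}
	return csiMountInfo{PodUID: parts[0], Volume: parts[3]}, true
}

func main() {
	path := "/var/lib/kubelet/pods/c4237f18-a3c7-46c9-9073-c2d7d544b719" +
		"/volumes/kubernetes.io~csi/pvc-73701811-3298-4a7d-914f-bac813442324/mount"

	info, ok := parseCSIMountPath("/var/lib/kubelet", path)
	if !ok {
		fmt.Println("not a CSI mountpoint path")
		return
	}
	fmt.Printf("pod=%s volume=%s\n", info.PodUID, info.Volume)

	// Report whether the target directory is present on this node.
	if _, err := os.Stat(path); err != nil {
		fmt.Println("stat:", err)
	}
}
```

Run on the worker node itself, the final `os.Stat` call shows whether the `mount` directory exists at the moment of the check.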

Environment Prod

  • Trident version: 22.07.0
  • Trident installation flags used: Operator
  • Container runtime: Podman
  • Kubernetes version: 4.10
  • Kubernetes enabled feature gates: [e.g. CSINodeInfo]
  • OS: coreos redhat
  • NetApp backend types: FAS
  • Other:

To Reproduce Create a pod with a PVC in OpenShift. The volume is not mounted, even though a manual mount works on the worker node where the pod is scheduled.

Expected behavior The volume should be mounted on the worker node.

Additional context Add any other context about the problem here.

eselvam avatar Aug 08 '22 05:08 eselvam

I0808 05:20:19.213186 1 connection.go:186] GRPC response: {"entries":[{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"bb457270-50a5-4dfa-b651-015fa82208a9","internalName":"openshift_prd_pvc_60df5140_3119_414d_8d75_726ddd27d1fd","name":"pvc-60df5140-3119-414d-8d75-726ddd27d1fd","protocol":"file"},"volume_id":"pvc-60df5140-3119-414d-8d75-726ddd27d1fd"}},{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"bb457270-50a5-4dfa-b651-015fa82208a9","internalName":"openshift_prd_pvc_6e5e0db6_4893_444d_9bcb_116b8bbe3fe3","name":"pvc-6e5e0db6-4893-444d-9bcb-116b8bbe3fe3","protocol":"file"},"volume_id":"pvc-6e5e0db6-4893-444d-9bcb-116b8bbe3fe3"}},{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"c90a0e42-dcfa-43f6-a1f6-4772151e1318","internalName":"openshift_prd_pvc_73701811_3298_4a7d_914f_bac813442324","name":"pvc-73701811-3298-4a7d-914f-bac813442324","protocol":"file"},"volume_id":"pvc-73701811-3298-4a7d-914f-bac813442324"}},{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"bb457270-50a5-4dfa-b651-015fa82208a9","internalName":"openshift_prd_pvc_bfbbb615_0be2_4f33_88f5_a26f0b08736e","name":"pvc-bfbbb615-0be2-4f33-88f5-a26f0b08736e","protocol":"file"},"volume_id":"pvc-bfbbb615-0be2-4f33-88f5-a26f0b08736e"}},{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"c90a0e42-dcfa-43f6-a1f6-4772151e1318","internalName":"openshift_prd_pvc_d2fb6506_0145_4918_8725_768a07d028cd","name":"pvc-d2fb6506-0145-4918-8725-768a07d028cd","protocol":"file"},"volume_id":"pvc-d2fb6506-0145-4918-8725-768a07d028cd"}}]}
I0808 05:20:19.213353 1 connection.go:187] GRPC error:
I0808 05:21:19.218238 1 csi_handler.go:123] Reconciling VolumeAttachments with driver backend state
I0808 05:21:19.218279 1 connection.go:183] GRPC call: /csi.v1.Controller/ListVolumes
I0808 05:21:19.218286 1 connection.go:184] GRPC request: {}

eselvam avatar Aug 08 '22 06:08 eselvam

Hi @eselvam,

We test Trident against OCP 4.10 on a daily basis and have not seen this issue. The path name that you are reporting "kubernetes.io~csi" does not seem to be a common path name used in most configurations. Typically this path is instead "kubernetes.io/csi". The "~" character in the path may be causing the mount issue that you are experiencing.

gnarl avatar Aug 08 '22 15:08 gnarl

@eselvam I think this might be a temporary error. The exit status doesn't really help here and I think additional information is needed to find out what could be going wrong here. I would recommend confirming the kubeletDir path, since the path to each CSI volume is based on your kubelet directory.

balaramesh avatar Aug 08 '22 17:08 balaramesh

I have a case open with NetApp as well and have uploaded the tridentctl logs to it.

See the mount error below. I can mount the volume manually, but mounting through OpenShift fails.

Even a simple nginx pod with a volume fails with the same issue.

pod "jenkins-2-kd8jm" (UID: "bdc5ec62-01c8-4dee-873d-b77015dddb02") : rpc error: code = Internal desc = error mounting NFS volume :/trident_qtree_poolprd_pvc_8adc3d6d_9a8e_4e32_bf04_ad538dbb46b3 on mountpoint /var/lib/kubelet/pods/bdc5ec62-01c8-4dee-873d-b77015dddb02/volumes/kubernetes.io~csi/pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount: exit status 255

The kubernetes.io~csi path exists on the worker nodes, but pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount does not, which is why we are hitting this issue.

May I know where the kubernetes.io~csi path comes from?

As this is OpenShift on CoreOS, I cannot modify anything on the OS.

Kindly advise.

eselvam avatar Aug 08 '22 18:08 eselvam

Please check the messages below from trident-main on the worker node. osutils is unable to create the directory, hence the failure.

time="2022-08-08T17:46:35Z" level=debug msg=">>>> k8s_utils_linux.IsLikelyNotMountPoint" mountpoint="/var/lib/kubelet/pods/67d44622-3e3b-42a3-bb04-b3bb5e3c7721/volumes/kubernetes.io~csi/pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount" requestID=2c451fad-4db8-4b52-908f-15aea26f895d requestSource=CSI
time="2022-08-08T17:46:35Z" level=debug msg="<<<< k8s_utils_linux.IsLikelyNotMountPoint" mountpoint="/var/lib/kubelet/pods/67d44622-3e3b-42a3-bb04-b3bb5e3c7721/volumes/kubernetes.io~csi/pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount" requestID=2c451fad-4db8-4b52-908f-15aea26f895d requestSource=CSI

time="2022-08-08T17:46:35Z" level=debug msg=">>>> osutils.execCommand." args="[-p /var/lib/kubelet/pods/67d44622-3e3b-42a3-bb04-b3bb5e3c7721/volumes/kubernetes.io~csi/pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount]" command=mkdir requestID=2c451fad-4db8-4b52-908f-15aea26f895d requestSource=CSI
time="2022-08-08T17:46:35Z" level=debug msg="<<<< osutils.execCommand." command=mkdir error="" output= requestID=2c451fad-4db8-4b52-908f-15aea26f895d requestSource=CSI
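These debug entries are key=value pairs, so the `command`, `error`, and `output` fields of the `osutils.execCommand.` response can be read directly rather than by eye. Below is a rough sketch of such a reader. `parseFields` is a hypothetical helper, not part of Trident or tridentctl, and it assumes quoted values contain no escaped quotes, which holds for the entries shown here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fieldRe matches key=value pairs where the value is either double-quoted
// (escaped quotes are not handled) or a run of non-space characters,
// which may be empty.
var fieldRe = regexp.MustCompile(`(\w+)=("[^"]*"|\S*)`)

// parseFields returns the key/value pairs found in one log entry,
// with surrounding double quotes removed from the values.
func parseFields(entry string) map[string]string {
	fields := make(map[string]string)
	for _, m := range fieldRe.FindAllStringSubmatch(entry, -1) {
		fields[m[1]] = strings.Trim(m[2], `"`)
	}
	return fields
}

func main() {
	entry := `time="2022-08-08T17:46:35Z" level=debug msg="<<<< osutils.execCommand." ` +
		`command=mkdir error="" output= ` +
		`requestID=2c451fad-4db8-4b52-908f-15aea26f895d requestSource=CSI`

	f := parseFields(entry)
	// %q makes an empty error or output field visible in the printout.
	fmt.Printf("command=%s error=%q output=%q\n", f["command"], f["error"], f["output"])
}
```

In the entry above the `error` field is empty, so whether mkdir itself failed is worth confirming against the full log before drawing conclusions.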

eselvam avatar Aug 09 '22 05:08 eselvam

// IsLikelyNotMountPoint determines if a directory is not a mountpoint.
// It is fast but not necessarily ALWAYS correct. If the path is in fact
// a bind mount from one part of a mount to another it will not be detected.
// It also can not distinguish between mountpoints and symbolic links.
// mkdir /tmp/a /tmp/b; mount --bind /tmp/a /tmp/b; IsLikelyNotMountPoint("/tmp/b")
// will return true. When in fact /tmp/b is a mount point. If this situation
// is of interest to you, don't use this function...
func (mounter *Mounter) IsLikelyNotMountPoint(file string) (bool, error) {
	stat, err := os.Stat(file)
	if err != nil {
		return true, err
	}
	rootStat, err := os.Stat(filepath.Dir(strings.TrimSuffix(file, "/")))
	if err != nil {
		return true, err
	}
	// If the directory has a different device as parent, then it is a mountpoint.
	if stat.Sys().(*syscall.Stat_t).Dev != rootStat.Sys().(*syscall.Stat_t).Dev {
		return false, nil
	}

	return true, nil
}
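The quoted function has no imports or receiver type, so it cannot be run as pasted. The standalone sketch below applies the same device-ID comparison and can be tried on any Linux or other Unix host. `isLikelyNotMountPoint` is an illustrative rewrite, not the Trident or Kubernetes source, and the temporary directory in `main` is only there to give it something to check.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"syscall"
)

// isLikelyNotMountPoint compares the device ID of path with that of its
// parent directory. Equal device IDs mean "probably not a mountpoint".
// Like the original, it cannot detect bind mounts within one filesystem.
func isLikelyNotMountPoint(path string) (bool, error) {
	stat, err := os.Stat(path)
	if err != nil {
		// A missing directory is reported as "not a mountpoint" plus the error.
		return true, err
	}
	parent, err := os.Stat(filepath.Dir(strings.TrimSuffix(path, "/")))
	if err != nil {
		return true, err
	}
	st, ok1 := stat.Sys().(*syscall.Stat_t)
	pst, ok2 := parent.Sys().(*syscall.Stat_t)
	if !ok1 || !ok2 {
		return true, fmt.Errorf("no stat_t available for %s", path)
	}
	return st.Dev == pst.Dev, nil
}

func main() {
	dir, err := os.MkdirTemp("", "mountcheck")
	if err != nil {
		fmt.Println("mkdirtemp:", err)
		return
	}
	defer os.RemoveAll(dir)

	// An existing plain directory.
	notMnt, err := isLikelyNotMountPoint(dir)
	fmt.Println("existing dir:", notMnt, err)

	// A path that does not exist, as with the missing pvc-.../mount directory.
	notMnt, err = isLikelyNotMountPoint(filepath.Join(dir, "missing"))
	fmt.Println("missing dir:", notMnt, os.IsNotExist(err))
}
```

For a path that does not exist, the function returns `true` together with the `os.Stat` error, so callers have to look at the error as well as the boolean.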

eselvam avatar Aug 09 '22 06:08 eselvam

We reached out to Red Hat; they said we need to work with you to get this fixed, as it comes from NetApp.

Could you please share your SCC for Trident (oc describe scc/trident) so we can check which capabilities you have?

Also, please let us know how to enable the capabilities below for the trident-csi and trident-operator accounts in OpenShift.

hostbindmounts,privileged,kernelCapabilities,hostNetwork,hostIpc,hostpid

eselvam avatar Aug 09 '22 10:08 eselvam

I think the trident SCC does not have adequate privileges to run mkdir for the pod and mount the PVC. Its volumes are ["downwardAPI","emptyDir","hostPath","projected"]; we need "persistentVolumeClaim" as well.

Please let me know how to add "persistentVolumeClaim" to the trident SCC.

I tried to add it, but whenever the trident-csi plugin tries to mount the volume, "persistentVolumeClaim" disappears from the trident SCC.

eselvam avatar Aug 09 '22 12:08 eselvam

@eselvam When Trident is installed on OCP, an SCC is also created as part of the installation. Can you please run oc get scc and check whether a trident SCC is listed?

rohit-arora-dev avatar Aug 09 '22 15:08 rohit-arora-dev

There is an SCC named trident, but its volumes list does not include persistentVolumeClaim.

I think that is why it cannot create the mount based on the claim.

Other mounts such as configmap work fine because that type is available in the trident SCC.

Note: We use a user-provisioned OpenShift install on physical servers.

eselvam avatar Aug 09 '22 15:08 eselvam

I believe Trident sets volume capabilities to:

volumes:
- '*'

That should include everything. Out of curiosity, what do you see in the output of oc get scc trident -o yaml?

rohit-arora-dev avatar Aug 09 '22 15:08 rohit-arora-dev

oc get scc trident -o yaml

allowHostDirVolumePlugin: true
allowHostIPC: true
allowHostNetwork: true
allowHostPID: true
allowHostPorts: true
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities:
- '*'
allowedUnsafeSysctls:
- '*'
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups: []
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: trident is a clone of the privileged built-in, and is meant just for use with trident.
  creationTimestamp: "2022-08-08T17:45:23Z"
  generation: 8
  labels:
    app: controller.csi.trident.netapp.io
    k8s_version: v1.23.3
    trident_version: v22.07.0
  name: trident
  ownerReferences:
  - apiVersion: trident.netapp.io/v1
    controller: true
    kind: TridentOrchestrator
    name: trident
    uid: d8516cf9-7932-40cb-882b-2341b4e97ce6
  resourceVersion: "44565962"
  uid: 78900f92-c51d-4803-a76e-55530d1bc18f
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities: null
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:trident:trident-csi
volumes:
- downwardAPI
- emptyDir
- hostPath
- projected
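The difference under discussion is between a volumes list of `'*'` and the four explicit entries in the output above. As a simplified illustration of that comparison only, the sketch below treats `*` as a wildcard and otherwise requires an exact match. `allowsVolume` is a hypothetical helper and not OpenShift's admission code. It models the list check alone and says nothing about which pods a given SCC is applied to.

```go
package main

import "fmt"

// allowsVolume reports whether an SCC-style "volumes" list permits the
// given volume type. "*" is treated as a wildcard that allows every type.
// This is a simplified model for illustration, not the real admission logic.
func allowsVolume(allowed []string, volumeType string) bool {
	for _, v := range allowed {
		if v == "*" || v == volumeType {
			return true
		}
	}
	return false
}

func main() {
	// Lists as reported in this thread.
	explicit := []string{"downwardAPI", "emptyDir", "hostPath", "projected"}
	wildcard := []string{"*"}

	for _, t := range []string{"hostPath", "persistentVolumeClaim"} {
		fmt.Printf("%-22s explicit=%t wildcard=%t\n",
			t, allowsVolume(explicit, t), allowsVolume(wildcard, t))
	}
}
```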

eselvam avatar Aug 09 '22 15:08 eselvam

That is all I have. If I change it, it is overwritten by CSI.

eselvam avatar Aug 09 '22 15:08 eselvam

@eselvam Thank you for sharing this! I would like to remind you that this is not a channel for support. I would advise letting support continue to diagnose your issue and make the required recommendations.

balaramesh avatar Aug 09 '22 17:08 balaramesh

Sure, thanks. So it is something in the operator, which creates the SCC in version 22.07?

eselvam avatar Aug 09 '22 17:08 eselvam

Quick question: did securityContext change in the 22.07.0 Trident operator? The settings below do not exist in 22.01.1.

securityContext:
  fsGroup: 1000670000
  seLinuxOptions:
    level: s0:c26,c10

securityContext:
  capabilities:
    drop:
    - KILL
    - MKNOD
    - SETGID
    - SETUID
  runAsUser: 1000670000

eselvam avatar Aug 09 '22 17:08 eselvam

This issue should be addressed here. It seems the NetApp Trident operator has a bug. Version 22.07 does not have the SCC below, so it fails to mount PVCs in pods, whereas in 22.01.1 the volumes entry is *. Please fix the code. Thanks.

tridentctl version
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 22.01.1        | 22.01.1        |
+----------------+----------------+

scc from 22.01.1 version

trident true [""] RunAsAny RunAsAny RunAsAny RunAsAny false [""]

eselvam avatar Aug 10 '22 04:08 eselvam

NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP PRIORITY READONLYROOTFS VOLUMES
anyuid false MustRunAs RunAsAny RunAsAny RunAsAny 10 false ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
hostaccess false MustRunAs MustRunAsRange MustRunAs RunAsAny false ["configMap","downwardAPI","emptyDir","hostPath","persistentVolumeClaim","projected","secret"]
hostmount-anyuid false MustRunAs RunAsAny RunAsAny RunAsAny false ["configMap","downwardAPI","emptyDir","hostPath","nfs","persistentVolumeClaim","projected","secret"]
hostnetwork false MustRunAs MustRunAsRange MustRunAs MustRunAs false ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
machine-api-termination-handler false MustRunAs RunAsAny MustRunAs MustRunAs false ["downwardAPI","hostPath"]
node-exporter true RunAsAny RunAsAny RunAsAny RunAsAny false [""]
nonroot false MustRunAs MustRunAsNonRoot RunAsAny RunAsAny false ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
privileged true [""] RunAsAny RunAsAny RunAsAny RunAsAny false [""]
restricted false MustRunAs MustRunAsRange MustRunAs RunAsAny false ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
trident true [""] RunAsAny RunAsAny RunAsAny RunAsAny false ["*"]

eselvam avatar Aug 10 '22 04:08 eselvam

Note that under volumes it is *. This is a bug in 22.07. Please fix it.

eselvam avatar Aug 10 '22 04:08 eselvam

Hi @eselvam,

We verified again that Trident v22.07 is able to mount a NAS volume when using OCP 4.10. We do not think the mount issue that you were experiencing was related to an SCC permissions issue. If you are continuing to encounter this issue please open a NetApp support case.

gnarl avatar Feb 15 '23 19:02 gnarl