MountVolume.SetUp failed for volume on OpenShift 4.10
Describe the bug We use OpenShift 4.10. The PV and PVC are fine and bound. We are able to mount the volume on the worker nodes manually; however, when OpenShift tries to mount it, it gives the error below.
MountVolume.SetUp failed for volume "pvc-73701811-3298-4a7d-914f-bac813442324" : rpc error: code = Internal desc = error mounting NFS volume netappip:/trident_qtree_pool_openshift_prd_QBHALDTHGT/openshift_prd_pvc_73701811_3298_4a7d_914f_bac813442324 on mountpoint /var/lib/kubelet/pods/c4237f18-a3c7-46c9-9073-c2d7d544b719/volumes/kubernetes.io~csi/pvc-73701811-3298-4a7d-914f-bac813442324/mount: exit status 255
After checking further, we found that the PVC directory does not exist under kubernetes.io~csi for the dynamically created pod.
The manual mount on the worker nodes works perfectly, so there is no issue with the export policy, etc. It seems something is wrong in the CSI driver.
Environment Prod
- Trident version: 22.07.0
- Trident installation flags used: Operator
- Container runtime: Podman
- Kubernetes version: 4.10
- Kubernetes enabled feature gates: [e.g. CSINodeInfo]
- OS: Red Hat CoreOS
- NetApp backend types: FAS
- Other:
To Reproduce Create a pod in OpenShift with a PVC. The volume is not mounted, even though we can mount it manually on the worker node where the pod is scheduled to run.
Expected behavior The volume should be mounted on the worker node.
Additional context
I0808 05:20:19.213186 1 connection.go:186] GRPC response: {"entries":[{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"bb457270-50a5-4dfa-b651-015fa82208a9","internalName":"openshift_prd_pvc_60df5140_3119_414d_8d75_726ddd27d1fd","name":"pvc-60df5140-3119-414d-8d75-726ddd27d1fd","protocol":"file"},"volume_id":"pvc-60df5140-3119-414d-8d75-726ddd27d1fd"}},{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"bb457270-50a5-4dfa-b651-015fa82208a9","internalName":"openshift_prd_pvc_6e5e0db6_4893_444d_9bcb_116b8bbe3fe3","name":"pvc-6e5e0db6-4893-444d-9bcb-116b8bbe3fe3","protocol":"file"},"volume_id":"pvc-6e5e0db6-4893-444d-9bcb-116b8bbe3fe3"}},{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"c90a0e42-dcfa-43f6-a1f6-4772151e1318","internalName":"openshift_prd_pvc_73701811_3298_4a7d_914f_bac813442324","name":"pvc-73701811-3298-4a7d-914f-bac813442324","protocol":"file"},"volume_id":"pvc-73701811-3298-4a7d-914f-bac813442324"}},{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"bb457270-50a5-4dfa-b651-015fa82208a9","internalName":"openshift_prd_pvc_bfbbb615_0be2_4f33_88f5_a26f0b08736e","name":"pvc-bfbbb615-0be2-4f33-88f5-a26f0b08736e","protocol":"file"},"volume_id":"pvc-bfbbb615-0be2-4f33-88f5-a26f0b08736e"}},{"status":{},"volume":{"capacity_bytes":107374182400,"volume_context":{"backendUUID":"c90a0e42-dcfa-43f6-a1f6-4772151e1318","internalName":"openshift_prd_pvc_d2fb6506_0145_4918_8725_768a07d028cd","name":"pvc-d2fb6506-0145-4918-8725-768a07d028cd","protocol":"file"},"volume_id":"pvc-d2fb6506-0145-4918-8725-768a07d028cd"}}]}
I0808 05:20:19.213353 1 connection.go:187] GRPC error:
Hi @eselvam,
We test Trident against OCP 4.10 on a daily basis and have not seen this issue. The path name that you are reporting "kubernetes.io~csi" does not seem to be a common path name used in most configurations. Typically this path is instead "kubernetes.io/csi". The "~" character in the path may be causing the mount issue that you are experiencing.
@eselvam I think this might be a temporary error. The exit status doesn't really help here and I think additional information is needed to find out what could be going wrong here. I would recommend confirming the kubeletDir path, since the path to each CSI volume is based on your kubelet directory.
I have a case open with NetApp as well and have uploaded the tridentctl logs to it.
See the mount error below. I am able to mount it manually; however, through OpenShift it fails.
Even a simple nginx pod with a volume fails with the same issue.
pod "jenkins-2-kd8jm" (UID: "bdc5ec62-01c8-4dee-873d-b77015dddb02") : rpc error: code = Internal desc = error mounting NFS volume :/trident_qtree_poolprd_pvc_8adc3d6d_9a8e_4e32_bf04_ad538dbb46b3 on mountpoint /var/lib/kubelet/pods/bdc5ec62-01c8-4dee-873d-b77015dddb02/volumes/kubernetes.io~csi/pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount: exit status 255
The kubernetes.io~csi path exists on the worker nodes; however, pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount does not, hence the issue.
May I know where the kubernetes.io~csi path comes from?
As it is OpenShift with CoreOS, I cannot modify anything on the OS.
Kindly advise.
Please check the error below from trident-main on the worker node. osutils is unable to create the directory, hence the failure.
time="2022-08-08T17:46:35Z" level=debug msg=">>>> k8s_utils_linux.IsLikelyNotMountPoint" mountpoint="/var/lib/kubelet/pods/67d44622-3e3b-42a3-bb04-b3bb5e3c7721/volumes/kubernetes.io~csi/pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount" requestID=2c451fad-4db8-4b52-908f-15aea26f895d requestSource=CSI
time="2022-08-08T17:46:35Z" level=debug msg="<<<< k8s_utils_linux.IsLikelyNotMountPoint" mountpoint="/var/lib/kubelet/pods/67d44622-3e3b-42a3-bb04-b3bb5e3c7721/volumes/kubernetes.io~csi/pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount" requestID=2c451fad-4db8-4b52-908f-15aea26f895d requestSource=CSI
time="2022-08-08T17:46:35Z" level=debug msg=">>>> osutils.execCommand." args="[-p /var/lib/kubelet/pods/67d44622-3e3b-42a3-bb04-b3bb5e3c7721/volumes/kubernetes.io~csi/pvc-8adc3d6d-9a8e-4e32-bf04-ad538dbb46b3/mount]" command=mkdir requestID=2c451fad-4db8-4b52-908f-15aea26f895d requestSource=CSI
time="2022-08-08T17:46:35Z" level=debug msg="<<<< osutils.execCommand." command=mkdir error="
// IsLikelyNotMountPoint determines if a directory is not a mountpoint.
// It is fast but not necessarily ALWAYS correct. If the path is in fact
// a bind mount from one part of a mount to another it will not be detected.
// It also can not distinguish between mountpoints and symbolic links.
// mkdir /tmp/a /tmp/b; mount --bind /tmp/a /tmp/b; IsLikelyNotMountPoint("/tmp/b")
// will return true. When in fact /tmp/b is a mount point. If this situation
// is of interest to you, don't use this function...
func (mounter *Mounter) IsLikelyNotMountPoint(file string) (bool, error) {
	stat, err := os.Stat(file)
	if err != nil {
		return true, err
	}
	rootStat, err := os.Stat(filepath.Dir(strings.TrimSuffix(file, "/")))
	if err != nil {
		return true, err
	}
	// If the directory has a different device as parent, then it is a mountpoint.
	if stat.Sys().(*syscall.Stat_t).Dev != rootStat.Sys().(*syscall.Stat_t).Dev {
		return false, nil
	}
	return true, nil
}
I reached out to Red Hat; they said we need to work with you to get this fixed, as the component is from NetApp.
Could you please share your SCC for Trident (oc describe scc/trident) so we can check what capabilities you have?
Also, please let us know how to enable the capabilities below for the trident-csi and trident-operator accounts in OpenShift:
hostbindmounts, privileged, kernelCapabilities, hostNetwork, hostIpc, hostpid
I think the trident SCC does not have adequate privileges to run mkdir for the pod to mount the PVC. Its volumes list is ["downwardAPI","emptyDir","hostPath","projected"]; we need it to include "persistentVolumeClaim".
Please let me know how to add "persistentVolumeClaim" to the trident SCC.
I tried to add it, but whenever the trident-csi plugin tries to mount, "persistentVolumeClaim" disappears from the trident SCC again.
@eselvam When Trident is installed on OCP, an SCC is also created as part of the installation. Can you please check whether it exists by running oc get scc and looking for the trident SCC?
There is an SCC named trident, which does not have a volume capability named persistentVolumeClaim.
I think that is why it is not able to create the mount based on the claim.
Other mounts, such as configMap, work fine because that type is listed in the trident SCC.
Note: we use a user-provisioned OpenShift installation on physical servers.
I believe Trident sets volume capabilities to:
volumes:
- '*'
That should include everything. Out of curiosity, what do you see in the output of oc get scc trident -o yaml?
oc get scc trident -o yaml
allowHostDirVolumePlugin: true
allowHostIPC: true
allowHostNetwork: true
allowHostPID: true
allowHostPorts: true
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities:
- '*'
allowedUnsafeSysctls:
- '*'
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups: []
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: trident is a clone of the privileged built-in, and
      is meant just for use with trident.
  creationTimestamp: "2022-08-08T17:45:23Z"
  generation: 8
  labels:
    app: controller.csi.trident.netapp.io
    k8s_version: v1.23.3
    trident_version: v22.07.0
  name: trident
  ownerReferences:
  - apiVersion: trident.netapp.io/v1
    controller: true
    kind: TridentOrchestrator
    name: trident
    uid: d8516cf9-7932-40cb-882b-2341b4e97ce6
  resourceVersion: "44565962"
  uid: 78900f92-c51d-4803-a76e-55530d1bc18f
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities: null
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:trident:trident-csi
volumes:
- downwardAPI
- emptyDir
- hostPath
- projected
I have only the above. If I change it, it is overwritten by the CSI operator.
@eselvam thank you for sharing this! I would like to remind you that this is not a support channel. I would advise letting support continue to diagnose your issue and allowing them to make the required recommendations!
Sure, thanks. So, is it something with the operator that creates the SCC in version 22.07?
Quick question: was the securityContext changed in the 22.07.0 Trident operator? The following does not exist in 22.01.1:
securityContext:
  fsGroup: 1000670000
  seLinuxOptions:
    level: s0:c26,c10
securityContext:
  capabilities:
    drop:
    - KILL
    - MKNOD
    - SETGID
    - SETUID
  runAsUser: 1000670000
This issue should be addressed here. It seems the NetApp Trident operator has a bug: version 22.07 does not include the SCC volumes entry below, hence it fails to mount PVCs in pods, whereas in 22.01.1 it is *. Please fix the code. Thanks.
tridentctl version
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 22.01.1        | 22.01.1        |
+----------------+----------------+
scc from the 22.01.1 version:
NAME      PRIV    CAPS    SELINUX     RUNASUSER   FSGROUP     SUPGROUP
anyuid    false
trident   true    [""]    RunAsAny    RunAsAny    RunAsAny    RunAsAny
Note: under VOLUMES it is *. It is a bug in 22.07; please fix it.
Hi @eselvam,
We verified again that Trident v22.07 is able to mount a NAS volume on OCP 4.10. We do not think the mount issue you were experiencing was related to SCC permissions. If you continue to encounter this issue, please open a NetApp support case.