SIGTERM does not reach the rbd-nbd process when the csi-rbdplugin pod restarts
Describe the bug
When the rbd-nbd mounter is used, the rbd-nbd process runs inside the csi-rbdplugin pod. When the csi-rbdplugin pod restarts, SIGTERM does not reach the rbd-nbd process, so the rbd watcher is not released gracefully. As a result, healerStageTransaction cannot complete quickly, which in turn interrupts I/O for 30s.
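To illustrate the kind of behavior that is missing, here is a minimal, language-neutral sketch (written in Python) of a parent process forwarding SIGTERM to the child processes it started and giving them a grace period to exit. This is not how ceph-csi is implemented; `forward_sigterm`, `install_handler` and `grace_seconds` are hypothetical names, and the sketch assumes the parent still holds handles to its children.

```python
import signal
import subprocess
import sys
import time


def forward_sigterm(children, grace_seconds=10.0):
    """Send SIGTERM to every live child, then wait for them to exit.

    Returns a dict mapping pid -> exit code, or None for a child that
    was still running when the grace period ended.
    """
    for child in children:
        if child.poll() is None:  # child has not exited yet
            child.send_signal(signal.SIGTERM)

    deadline = time.monotonic() + grace_seconds
    results = {}
    for child in children:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[child.pid] = child.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            results[child.pid] = None
    return results


def install_handler(children, grace_seconds=10.0):
    """Forward SIGTERM to the children before the parent itself exits."""
    def handler(signum, frame):
        forward_sigterm(children, grace_seconds)
        sys.exit(0)

    signal.signal(signal.SIGTERM, handler)
```

With something like this in place, a child that handles SIGTERM gets a chance to clean up (in the rbd-nbd case, to release its watcher) before the parent goes away.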
Environment details
- Image/version of Ceph CSI driver : v3.9.0
- Helm chart version :
- Kernel version :
- Mounter used for mounting PVC (for cephFS its fuse or kernel. for rbd its krbd or rbd-nbd) : rbd-nbd
- Kubernetes cluster version :
- Ceph cluster version :
Steps to reproduce
Steps to reproduce the behavior:
- Create a test pod with a PVC whose StorageClass uses the rbd-nbd mounter
- Run fio against the filesystem mounted from the PVC
- Restart the csi-rbdplugin pod
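For anyone who wants to put a number on the stall while reproducing, the following is a rough sketch of a write probe that can run inside the test pod alongside (or instead of) fio. It is an assumption of mine, not part of the original report: `probe_writes`, `find_stalls`, the probe file path and the thresholds are all hypothetical.

```python
import os
import time


def find_stalls(timestamps, threshold_seconds=1.0):
    """Return (start, duration) for every gap between consecutive
    timestamps that is longer than threshold_seconds."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold_seconds:
            stalls.append((prev, gap))
    return stalls


def probe_writes(path, duration_seconds, interval_seconds=0.1):
    """Append one byte and fsync it in a loop for duration_seconds.

    Returns the monotonic time at which each write completed. If the
    block device stalls, fsync blocks and shows up as a gap.
    """
    completed = []
    end = time.monotonic() + duration_seconds
    with open(path, "ab") as f:
        while time.monotonic() < end:
            f.write(b"x")
            f.flush()
            os.fsync(f.fileno())
            completed.append(time.monotonic())
            time.sleep(interval_seconds)
    return completed
```

For example, run `find_stalls(probe_writes("/mnt/pvc/probe.dat", 120))` (the path is a placeholder for the PVC mount point) and restart the csi-rbdplugin pod while it is running; each returned tuple is one interruption and its length in seconds.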
Actual results
Expected behavior
I/O recovers quickly after the interruption.
Logs
If the issue is in PVC creation, deletion, cloning please attach complete logs of below containers.
- csi-provisioner and csi-rbdplugin/csi-cephfsplugin container logs from the provisioner pod.
If the issue is in PVC resize please attach complete logs of below containers.
- csi-resizer and csi-rbdplugin/csi-cephfsplugin container logs from the provisioner pod.
If the issue is in snapshot creation and deletion please attach complete logs of below containers.
- csi-snapshotter and csi-rbdplugin/csi-cephfsplugin container logs from the provisioner pod.
If the issue is in PVC mounting please attach complete logs of below containers.
- csi-rbdplugin/csi-cephfsplugin and driver-registrar container logs from plugin pod from the node where the mount is failing.
- if required attach dmesg logs.
Note:- If its a rbd issue please provide only rbd related logs, if its a cephFS issue please provide cephFS logs.
Additional context
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
Any update on this? In my view, this should be considered a blocker for making rbd-nbd GA for production. Write availability should not depend so closely on csi-rbdplugin running.