SIGTERM does not reach the rbd-nbd process when the csi-rbdplugin pod restarts

Status: Open. YiteGu opened this issue 11 months ago • 2 comments

Describe the bug

When the rbd-nbd mounter is used, the rbd-nbd processes run inside the csi-rbdplugin pod. When the csi-rbdplugin pod restarts, SIGTERM does not reach these rbd-nbd processes, so their RBD watchers are not released gracefully. As a result, healerStageTransaction cannot complete quickly, which in turn interrupts I/O for 30 seconds.
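The signal path described above is a general pattern for supervisor processes: when a pod is terminated, the container's main process typically receives SIGTERM, and a long-lived child it started (here, rbd-nbd) only sees the signal if the parent relays it. The Python sketch below illustrates that relay under stated assumptions: `start_with_sigterm_forwarding` is a hypothetical helper, not Ceph CSI code (which is written in Go), and any long-running command can stand in for rbd-nbd.

```python
import signal
import subprocess
import sys


def start_with_sigterm_forwarding(cmd):
    """Start cmd as a child process and relay SIGTERM from this process to it.

    Hypothetical helper for illustration only; it is not part of Ceph CSI.
    Must be called from the main thread, because signal.signal() only
    works there.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Relay the signal only while the child is still running, so the
        # child gets a chance to shut down cleanly on its own.
        if child.poll() is None:
            child.send_signal(signum)

    # Replace the default SIGTERM action (terminate this process) with the
    # relay, so the supervisor stays alive and can wait for the child.
    signal.signal(signal.SIGTERM, forward)
    return child
```

For example, after `proc = start_with_sigterm_forwarding([sys.executable, "-c", "import time; time.sleep(60)"])`, sending SIGTERM to the supervisor also terminates the child, and `proc.wait()` returns `-15`, because `subprocess` reports a child killed by signal N as `-N`. A plugin managing several rbd-nbd processes would presumably need to track all of them and wait for each to exit within the pod's termination grace period; that bookkeeping is omitted here.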

Environment details

Image/version of Ceph CSI driver: v3.9.0
Helm chart version:
Kernel version:
Mounter used for mounting PVC (for CephFS it is fuse or kernel; for RBD it is krbd or rbd-nbd): rbd-nbd
Kubernetes cluster version:
Ceph cluster version:

Steps to reproduce

Steps to reproduce the behavior:

1. Create a test pod with a PVC whose StorageClass uses the rbd-nbd mounter.
2. Run fio against the filesystem mounted from the PVC.
3. Restart the csi-rbdplugin pod.

Actual results

Expected behavior

I/O should recover quickly after the interruption.

Logs

If the issue is in PVC creation, deletion, or cloning, please attach complete logs of the containers below.

csi-provisioner and csi-rbdplugin/csi-cephfsplugin container logs from the provisioner pod.

If the issue is in PVC resize, please attach complete logs of the containers below.

csi-resizer and csi-rbdplugin/csi-cephfsplugin container logs from the provisioner pod.

If the issue is in snapshot creation or deletion, please attach complete logs of the containers below.

csi-snapshotter and csi-rbdplugin/csi-cephfsplugin container logs from the provisioner pod.

If the issue is in PVC mounting, please attach complete logs of the containers below.

csi-rbdplugin/csi-cephfsplugin and driver-registrar container logs from the plugin pod on the node where the mount is failing.
If required, attach dmesg logs.

Note: If it is an RBD issue, please provide only RBD-related logs; if it is a CephFS issue, please provide CephFS logs.

Additional context

Add any other context about the problem here.

For example:

Any existing bug report that describes similar behavior

Jan 10 '25 08:01 YiteGu

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

Feb 09 '25 21:02 github-actions[bot]

Any update on this? In my view, this should be considered a blocker for making rbd-nbd GA for production. Write availability should not depend so closely on csi-rbdplugin running.

Sep 01 '25 08:09 julienlau