SIGTERM does not reach the rbd-nbd process when the csi-rbdplugin pod restarts

Open YiteGu opened this issue 11 months ago • 2 comments

Describe the bug

When the rbd-nbd mounter is used, the rbd-nbd process runs inside the csi-rbdplugin pod. When the csi-rbdplugin pod restarts, SIGTERM does not reach the rbd-nbd process, so the rbd watcher is not released gracefully. As a result, healerStageTransaction cannot complete quickly, which in turn interrupts I/O for 30s.
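To illustrate the behavior being asked for, here is a minimal, generic sketch of a parent process passing SIGTERM on to a child it spawned so the child can shut down gracefully. It is not ceph-csi code, and it makes no claim about how csi-rbdplugin actually launches rbd-nbd; `make_forwarder` and `run_with_forwarding` are hypothetical names used only for this example, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys


def make_forwarder(child):
    """Return a signal handler that relays the received signal to `child`.

    Hypothetical helper for illustration; not part of ceph-csi.
    """
    def handler(signum, _frame):
        # Popen.send_signal is a no-op if the child has already been reaped.
        child.send_signal(signum)
    return handler


def run_with_forwarding(argv):
    """Spawn `argv`, relay SIGTERM to it, and return its exit status.

    A negative return value -N means the child was terminated by signal N.
    """
    child = subprocess.Popen(argv)
    # Note: a SIGTERM arriving before this line is not forwarded;
    # a production supervisor would need to close that window.
    signal.signal(signal.SIGTERM, make_forwarder(child))
    return child.wait()
```

Calling `run_with_forwarding(["sleep", "60"])` and then sending SIGTERM to the parent would terminate the `sleep` child as well, instead of leaving it behind without any notification.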

Environment details

  • Image/version of Ceph CSI driver : v3.9.0
  • Helm chart version :
  • Kernel version :
  • Mounter used for mounting PVC (for cephFS it's fuse or kernel; for rbd it's krbd or rbd-nbd) : rbd-nbd
  • Kubernetes cluster version :
  • Ceph cluster version :

Steps to reproduce

Steps to reproduce the behavior:

  1. Create a test pod with a PVC whose StorageClass uses the rbd-nbd mounter.
  2. Run fio against the filesystem mounted from the PVC.
  3. Restart the csi-rbdplugin pod.

Actual results

[Screenshot attached to the original issue]

Expected behavior

I/O should recover quickly after the interruption.

Logs

If the issue is in PVC creation, deletion, cloning please attach complete logs of below containers.

  • csi-provisioner and csi-rbdplugin/csi-cephfsplugin container logs from the provisioner pod.

If the issue is in PVC resize please attach complete logs of below containers.

  • csi-resizer and csi-rbdplugin/csi-cephfsplugin container logs from the provisioner pod.

If the issue is in snapshot creation and deletion please attach complete logs of below containers.

  • csi-snapshotter and csi-rbdplugin/csi-cephfsplugin container logs from the provisioner pod.

If the issue is in PVC mounting please attach complete logs of below containers.

  • csi-rbdplugin/csi-cephfsplugin and driver-registrar container logs from plugin pod from the node where the mount is failing.

  • if required attach dmesg logs.

Note: If it's an rbd issue, please provide only rbd-related logs; if it's a cephFS issue, please provide cephFS logs.

Additional context

Add any other context about the problem here.

For example:

Any existing bug report that describes a similar issue or behavior

YiteGu, Jan 10 '25 08:01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot], Feb 09 '25 21:02

Any update on this? In my view, this should be considered a blocker for making rbd-nbd GA for production. Write availability should not depend so closely on csi-rbdplugin running.

julienlau, Sep 01 '25 08:09