
rbd: no option to cancel the stuck rbd force promote operation

Open Madhu-1 opened this issue 3 years ago • 5 comments

Problem:- During a failover, volume replication tries to promote the RBD image to primary. If the promote fails, it calls promote again with the force option. In some cases the force promote hangs indefinitely and never returns. Because we use the go-ceph API, there is no way to cancel the ongoing operation, and the only way out is to restart the rbd provisioner pod. The indefinite hang might be due to a bug in RBD (the investigation is still ongoing).

Workaround:-

The force promote operation should be executed with a timeout so that the command never hangs and follow-up API calls can force promote the volume.

similar issues:- https://github.com/ceph/ceph-csi/issues/553

upstream ceph tracker: https://tracker.ceph.com/issues/52913

Red Hat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2030752

Madhu-1, Dec 22 '21 03:12

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot], Feb 26 '22 21:02

Moving it out of 3.6, as a fix for this is not available in Ceph yet.

Madhu-1, Mar 30 '22 14:03

Removed from the milestone tracker.

humblec, Apr 01 '22 05:04

@Madhu-1 shall we move this out of 3.7 too?

humblec, Jun 16 '22 05:06

Moving out of 3.7.0 release.

humblec, Aug 18 '22 05:08