Cannot cancel device removal
# btrfs dev rem /dev/sdg1 /mnt/x &
…
# btrfs dev rem cancel /mnt/x
Request to cancel running device deletion
ERROR: error removing device 'cancel': Operation canceled
#
The removal continues unabated.
You can just kill the process. I don't know why btrfs device remove cancel is there at all.
Maybe it's easier than grepping through "ps" output, especially when there's more than one btrfs in the system and the job uses a relative path …
NB: there are some file-system- and device-level operations where killing the process does not affect the operation, e.g. when you move a volume with lvm. Thus I would not expect killing the job to actually do anything.
Well, if the job does not background itself (as lvm or scrubbing do), my expectation is that it actually (gracefully) stops its operation when it gets killed (gracefully).
You can just kill the process. I don't know why btrfs device remove cancel is there at all.
You know a process cannot be killed by a signal while it's trapped in kernel space? That's because signal handling happens in user space.
It only works because inside those ioctls we explicitly check for pending signals, and even with those checks, it only works for fatal ones.
The removal continues unabated.
Any dmesg? And kernel version?
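A minimal sketch of how the requested details could be gathered in one go. The function name collect_btrfs_report and the 30-line cut-off are illustrative choices, not anything from btrfs-progs; reading dmesg may require root on some systems, so an empty result there is not conclusive.

```sh
#!/bin/sh
# Collect kernel version, btrfs-progs version and recent btrfs kernel messages.
# collect_btrfs_report is a hypothetical helper name used for illustration.
collect_btrfs_report() {
    echo "kernel: $(uname -r)"
    if command -v btrfs >/dev/null 2>&1; then
        echo "progs: $(btrfs --version)"
    else
        echo "progs: btrfs not found in PATH"
    fi
    echo "dmesg (last btrfs lines):"
    # dmesg may be restricted to root; errors are hidden, empty output is fine.
    dmesg 2>/dev/null | grep -i btrfs | tail -n 30 || true
}

collect_btrfs_report
```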
@adam900710 Fair point. Are you implying that sending SIGTERM/SIGINT to a device remove is generally unsafe?
SIGTERM/SIGINT just won't do anything. Scrub (dev-replace reuses the scrub path) and balance only check for fatal signals, so only SIGKILL counts.
So that's why we have ioctls to cancel/pause dev-replace/scrub/relocation.
SIGTERM/SIGINT just won't do anything. Scrub (dev-replace reuses the scrub path) and balance only check for fatal signals, so only SIGKILL counts.
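For reference, the cancel ioctls mentioned above are reachable from the command line. The sketch below maps each operation to its cancel subcommand; btrfs_cancel and the RUN variable are hypothetical names for illustration. By default it only prints the command, and the "remove" form is the one used at the top of this issue, so whether it takes effect depends on the kernel and btrfs-progs in use.

```sh
#!/bin/sh
# Print (and optionally run) the cancel command for a long-running btrfs job.
# Usage: btrfs_cancel OPERATION MOUNTPOINT   (set RUN=1 to actually execute)
# btrfs_cancel and RUN are illustrative names, not part of btrfs-progs.
btrfs_cancel() {
    op=$1
    mnt=$2
    case $op in
        scrub)   cmd="btrfs scrub cancel" ;;
        balance) cmd="btrfs balance cancel" ;;
        replace) cmd="btrfs replace cancel" ;;
        remove)  cmd="btrfs device remove cancel" ;;
        *) echo "unknown operation: $op" >&2; return 2 ;;
    esac
    # Always show what would be run, so the default is a dry run.
    echo "$cmd $mnt"
    if [ "${RUN:-0}" = 1 ]; then
        $cmd "$mnt"
    fi
}
```

For example, `btrfs_cancel balance /mnt/x` prints the command without touching the file system, and `RUN=1 btrfs_cancel balance /mnt/x` executes it.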
A scrub command starts the scrubbing and returns immediately, so there is basically nothing to send a signal to. But btrfs device remove stays active:
btrfs dev rem /dev/sdg1 /mnt/x
^C (SIGINT)
So (unless stuck in kernel space), this should gracefully cancel the device removal? Does it? At least in my cases it seemed to work this way.
(Sorry for hijacking/diverting the issue.)
Scrub (dev-replace reuses the scrub path) and balance only check for fatal signals, so only SIGKILL counts.
Balance can be cancelled by Ctrl-C, i.e. SIGINT, but it's indeed only checking for SIGKILL (fatal_signal_pending). I'm puzzled.
It looks like it's the wait_on_bit() call inside btrfs_relocate_block_group(), which uses TASK_INTERRUPTIBLE.
And the real wait function is bit_wait(), which checks for any pending signal, not only the fatal ones.