bcachefs
"Remove failed" error when removing failed device
uname -a
Linux malu 5.13.19-1-bcachefs-git-354636-gc85e27c45512 #1 SMP PREEMPT Thu, 07 Oct 2021 22:20:07 +0000 x86_64 GNU/Linux
bcachefs version
bcachefs tool version v0.1-366-g3785043
As explained in #320, I have a bcachefs filesystem with multiple devices, one of which has failed. I attempted to remove it with bcachefs device remove and previously encountered a kernel bug, which has been fixed as of c85e27c.
However, when I now attempt to remove the device, I encounter other errors (including, at one point, a kernel bug when I tried to unmount the filesystem after an earlier remove command had failed).
In particular, I did the following:
-
bcachefs unlock <device path>
-
mount -t bcachefs -o rw,very_degraded <devices> <mount point>
This seems to work as expected:
[ 164.658939] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): journal read done, 0 keys in 1 entries, seq 47303529
[ 184.191998] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): going read-write
[ 184.352568] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): mounted with opts: metadata_replicas=2,data_replicas=2,foreground_target=nvme,background_target=hdd,promote_target=nvme,noinodes_32bit,noshard_inode_numbers,noinodes_use_key_cache,very_degraded
-
bcachefs device remove -f 2 <mount point>
(Device 2 is the one that is missing, according to bcachefs fs usage.) This results in the following errors in the kernel log:
[ 227.772439] bcachefs (dev-2): btree write error: device removed
[ 227.783574] bcachefs (dev-2): btree write error: device removed
[ 228.603379] bcachefs (dev-2): btree write error: device removed
[ 228.764879] bcachefs (dev-2): btree write error: device removed
[ 228.788595] bcachefs (dev-2): btree write error: device removed
[ 228.794426] bcachefs (dev-2): btree write error: device removed
[ 228.838876] bcachefs (dev-2): btree write error: device removed
[ 228.853879] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): invalid bkey u64s 5 type deleted 4346:8:U32_MAX len 8 ver 73580037 on insert from __bch2_dev_usrdata_drop [bcachefs] -> __bch2_dev_usrdata_drop [bcachefs]: nonzero size field
[ 228.859658] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): emergency read only
[ 228.861161] bcachefs (dev-2): Remove failed: error -22 dropping data
[ 228.861166] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): fatal error writing btree node
-
bcachefs device remove -f /dev/sda1
(This is another device, which has not failed, but which I wanted to remove so that I could use it to transfer data from the bcachefs filesystem. I think the error is simply because there is still some btree data on it.)
[ 253.441159] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): Error updating btree node key: -30
[ 253.443456] bcachefs (sda1): Remove failed: error -30 dropping data
-
umount <mount point>
This caused a kernel bug:
[ 273.522498] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 273.525821] #PF: supervisor read access in kernel mode
[ 273.528624] #PF: error_code(0x0000) - not-present page
[ 273.531352] PGD 0 P4D 0
[ 273.533987] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 273.536609] CPU: 0 PID: 1171 Comm: umount Tainted: G OE 5.13.19-1-bcachefs-git-354636-gc85e27c45512 #1
[ 273.539291] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5406 11/13/2019
[ 273.541989] RIP: 0010:bch2_journal_space_available+0x184/0x480 [bcachefs]
[ 273.544715] Code: f9 74 2d 4c 8b 96 e8 0b 00 00 eb 15 8d 47 01 31 d2 41 f7 f0 89 d7 89 96 fc 0b 00 00 44 39 ca 74 0f 89 f8 49 8b 95 58 13 00 00 <49> 39 14 c2 72 dc 8b 96 f8 0b 00 00 39 fa 74 2c 4c 8b 8e e8 0b 00
[ 273.550273] RSP: 0018:ffffb7e08ffa7d48 EFLAGS: 00010202
[ 273.553032] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000007
[ 273.555788] RDX: 0000000002d1cb81 RSI: ffff9ee7dfcf1000 RDI: 0000000000000003
[ 273.558523] RBP: 0000000000000000 R08: 0000000000002000 R09: 0000000000000013
[ 273.561245] R10: 0000000000000000 R11: ffff9ee89ec0b100 R12: 0000000000000001
[ 273.563944] R13: ffff9ee7de414468 R14: ffff9ee7de403980 R15: ffff9ee928134800
[ 273.566644] FS: 00007f465414d740(0000) GS:ffff9ef6bea00000(0000) knlGS:0000000000000000
[ 273.569338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 273.571999] CR2: 0000000000000018 CR3: 000000047735c000 CR4: 0000000000350ef0
[ 273.574669] Call Trace:
[ 273.577284] bch2_journal_pin_drop+0x124/0x130 [bcachefs]
[ 273.579914] bch2_fs_btree_cache_exit+0x2df/0x340 [bcachefs]
[ 273.582515] bch2_fs_release+0x8e/0x2a0 [bcachefs]
[ 273.585083] kobject_put+0x86/0x1d0
[ 273.587582] deactivate_locked_super+0x36/0xa0
[ 273.590048] cleanup_mnt+0x131/0x190
[ 273.592487] task_work_run+0x5c/0x90
[ 273.594853] exit_to_user_mode_prepare+0x16b/0x170
[ 273.597163] syscall_exit_to_user_mode+0x23/0x50
[ 273.599453] do_syscall_64+0x6e/0x80
[ 273.601713] ? exc_page_fault+0x78/0x180
[ 273.603953] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 273.606174] RIP: 0033:0x7f46542d261b
[ 273.608403] Code: 18 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 21 18 0c 00 f7 d8
[ 273.613046] RSP: 002b:00007fff46086748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 273.615369] RAX: 0000000000000000 RBX: 00007f46543ff264 RCX: 00007f46542d261b
[ 273.617666] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055aa9bd71e10
[ 273.619915] RBP: 000055aa9bd6d580 R08: 0000000000000000 R09: 00007fff460854c0
[ 273.622147] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 273.624307] R13: 000055aa9bd71e10 R14: 000055aa9bd6d690 R15: 000055aa9bd71dd0
[ 273.626404] Modules linked in: poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha xt_MASQUERADE xt_conntrack xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc 88XXau(OE) snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device cfg80211 mc intel_rapl_msr joydev mousedev intel_rapl_common edac_mce_amd snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel amdgpu nouveau kvm snd_intel_dspcfg gpu_sched snd_intel_sdw_acpi drm_ttm_helper snd_hda_codec ttm crct10dif_pclmul snd_hda_core crc32_pclmul drm_kms_helper ghash_clmulni_intel nls_iso8859_1 snd_hwdep snd_pcm aesni_intel snd_timer vfat cec fat igb crypto_simd sp5100_tco syscopyarea cryptd usbhid rapl pcspkr k10temp ccp snd i2c_algo_bit sysfillrect i2c_piix4 rng_core i2c_nvidia_gpu sysimgblt fb_sys_fops
[ 273.626443] soundcore dca gpio_amdpt gpio_generic pinctrl_amd mac_hid acpi_cpufreq eeepc_wmi asus_wmi sparse_keymap rfkill video wmi_bmof mxm_wmi vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) sg drm asus_wmi_sensors(OE) wmi fuse agpgart bpf_preload ip_tables x_tables ext4 crc16 mbcache jbd2 uas usb_storage xhci_pci xhci_pci_renesas bcachefs libcrc32c crc32c_generic crc32c_intel xor crc64 raid6_pq vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[ 273.649848] CR2: 0000000000000018
[ 273.652394] ---[ end trace b0767c5729e86981 ]---
[ 273.654889] RIP: 0010:bch2_journal_space_available+0x184/0x480 [bcachefs]
[ 273.657367] Code: f9 74 2d 4c 8b 96 e8 0b 00 00 eb 15 8d 47 01 31 d2 41 f7 f0 89 d7 89 96 fc 0b 00 00 44 39 ca 74 0f 89 f8 49 8b 95 58 13 00 00 <49> 39 14 c2 72 dc 8b 96 f8 0b 00 00 39 fa 74 2c 4c 8b 8e e8 0b 00
[ 273.662492] RSP: 0018:ffffb7e08ffa7d48 EFLAGS: 00010202
[ 273.665059] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000007
[ 273.667656] RDX: 0000000002d1cb81 RSI: ffff9ee7dfcf1000 RDI: 0000000000000003
[ 273.670259] RBP: 0000000000000000 R08: 0000000000002000 R09: 0000000000000013
[ 273.672851] R10: 0000000000000000 R11: ffff9ee89ec0b100 R12: 0000000000000001
[ 273.675407] R13: ffff9ee7de414468 R14: ffff9ee7de403980 R15: ffff9ee928134800
[ 273.677956] FS: 00007f465414d740(0000) GS:ffff9ef6bea00000(0000) knlGS:0000000000000000
[ 273.680526] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 273.683075] CR2: 0000000000000018 CR3: 000000047735c000 CR4: 0000000000350ef0
[ 273.685644] note: umount[1171] exited with preempt_count 1
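For anyone reading the two "Remove failed" lines in the logs above, here is a minimal sketch for decoding the numeric error codes. It assumes the kernel messages follow the usual Linux convention of printing negated errno values; decode_kernel_error is a hypothetical helper written for this illustration and is not part of bcachefs or bcachefs-tools.

```python
import errno
import re

# The two "Remove failed" lines, copied from the kernel log in this report.
LOG_LINES = [
    "[ 228.861161] bcachefs (dev-2): Remove failed: error -22 dropping data",
    "[ 253.443456] bcachefs (sda1): Remove failed: error -30 dropping data",
]


def decode_kernel_error(line):
    """Return (device, errno name) for a 'Remove failed' line, else None.

    Hypothetical helper: assumes the printed code is a negated errno value.
    """
    match = re.search(r"bcachefs \(([^)]+)\): Remove failed: error -(\d+)", line)
    if match is None:
        return None
    device, code = match.group(1), int(match.group(2))
    return device, errno.errorcode.get(code, "unknown")


decoded = [decode_kernel_error(line) for line in LOG_LINES]
for device, name in decoded:
    print(f"{device}: {name}")
```

If that assumption holds, -22 is EINVAL and -30 is EROFS. The EROFS on sda1 would be consistent with the filesystem having already gone "emergency read only" during the dev-2 removal, so the second failure may be a consequence of the first rather than of leftover btree data, though I have not confirmed this.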
I pushed some fixes - can you retest and see if that did it?