bcachefs

kernel stuck during online fsck

Open • KrzysztofHajdamowicz opened this issue on Sep 27, 2025 • 2 comments

I got the following dmesg output:

[Sat Sep 27 23:09:49 2025] INFO: task events_unbound:1707136 blocked for more than 122 seconds.
[Sat Sep 27 23:09:49 2025]       Tainted: P S         OE       6.17.0-202509131122-pve #1
[Sat Sep 27 23:09:49 2025]       Blocked by coredump.
[Sat Sep 27 23:09:49 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Sep 27 23:09:49 2025] task:events_unbound  state:D stack:0     pid:1707136 tgid:1707131 ppid:1633543 task_flags:0x40044c flags:0x00004006
[Sat Sep 27 23:09:49 2025] Call Trace:
[Sat Sep 27 23:09:49 2025]  <TASK>
[Sat Sep 27 23:09:49 2025]  __schedule+0x45a/0x14a0
[Sat Sep 27 23:09:49 2025]  schedule+0x27/0xf0
[Sat Sep 27 23:09:49 2025]  schedule_timeout+0xcf/0x110
[Sat Sep 27 23:09:49 2025]  __wait_for_common+0x95/0x1b0
[Sat Sep 27 23:09:49 2025]  ? __pfx_schedule_timeout+0x10/0x10
[Sat Sep 27 23:09:49 2025]  wait_for_completion+0x24/0x40
[Sat Sep 27 23:09:49 2025]  kthread_stop+0x6d/0x190
[Sat Sep 27 23:09:49 2025]  bch2_thread_with_file_exit+0x1a/0x80 [bcachefs]
[Sat Sep 27 23:09:49 2025]  thread_with_stdio_release+0x4b/0xc0 [bcachefs]
[Sat Sep 27 23:09:49 2025]  __fput+0xea/0x2d0
[Sat Sep 27 23:09:49 2025]  ____fput+0x15/0x20
[Sat Sep 27 23:09:49 2025]  task_work_run+0x5d/0xa0
[Sat Sep 27 23:09:49 2025]  do_exit+0x2ad/0xa20
[Sat Sep 27 23:09:49 2025]  do_group_exit+0x34/0x90
[Sat Sep 27 23:09:49 2025]  get_signal+0x833/0x880
[Sat Sep 27 23:09:49 2025]  arch_do_signal_or_restart+0x41/0x260
[Sat Sep 27 23:09:49 2025]  exit_to_user_mode_loop+0x91/0x170
[Sat Sep 27 23:09:49 2025]  do_syscall_64+0x209/0xc70
[Sat Sep 27 23:09:49 2025]  ? x64_sys_call+0x178d/0x2330
[Sat Sep 27 23:09:49 2025]  ? do_syscall_64+0xb8/0xc70
[Sat Sep 27 23:09:49 2025]  ? exit_to_user_mode_loop+0xe6/0x170
[Sat Sep 27 23:09:49 2025]  ? do_syscall_64+0x211/0xc70
[Sat Sep 27 23:09:49 2025]  ? count_memcg_events+0xd7/0x1a0
[Sat Sep 27 23:09:49 2025]  ? handle_mm_fault+0x254/0x370
[Sat Sep 27 23:09:49 2025]  ? do_user_addr_fault+0x2f8/0x830
[Sat Sep 27 23:09:49 2025]  ? irqentry_exit_to_user_mode+0x2e/0x270
[Sat Sep 27 23:09:49 2025]  ? irqentry_exit+0x43/0x50
[Sat Sep 27 23:09:49 2025]  ? exc_page_fault+0x90/0x1b0
[Sat Sep 27 23:09:49 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Sat Sep 27 23:09:49 2025] RIP: 0033:0x7f883e039779
[Sat Sep 27 23:09:49 2025] RSP: 002b:00007f883dee18a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[Sat Sep 27 23:09:49 2025] RAX: fffffffffffffe00 RBX: 000063998152ef58 RCX: 00007f883e039779
[Sat Sep 27 23:09:49 2025] RDX: 0000000000000001 RSI: 0000000000000080 RDI: 000063998152efb0
[Sat Sep 27 23:09:49 2025] RBP: 000063998152ef90 R08: 0000000000000000 R09: 0000000000000000
[Sat Sep 27 23:09:49 2025] R10: 0000000000000000 R11: 0000000000000246 R12: 000063998152ef40
[Sat Sep 27 23:09:49 2025] R13: 000063998152ef58 R14: 00007ffde2f3a9c0 R15: 00007f883deda000
[Sat Sep 27 23:09:49 2025]  </TASK>

This happened as a result of running online fsck:

❯ bcachefs fsck /dev/sde:/dev/sda1:/dev/sdb1:/dev/sdc1:/dev/sdd1:/dev/nvme1n1p1:/dev/nvme0n1p1
Running fsck online
bcachefs (20b15bfd-e996-4f45-8ab6-07b15bd9bae7): check_alloc_info...^C
^C
^C^C^C^C^C^C^C

Environment: bcachefs 1.31.4 installed via DKMS, kernel 6.17-rc5, Debian Trixie.

KrzysztofHajdamowicz, Sep 27 '25 21:09

We need to be checking for kthread_should_stop().

koverstreet, Sep 27 '25 21:09
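Reading the trace from the bottom up: the Ctrl-C delivers a signal, the `bcachefs fsck` process exits (`do_group_exit` → `do_exit`), closing its file runs `thread_with_stdio_release()`, which calls `bch2_thread_with_file_exit()` → `kthread_stop()`, and `kthread_stop()` then sits in `wait_for_completion()` until the fsck kthread returns. If the check loop (here `check_alloc_info`) never polls `kthread_should_stop()`, that wait probably lasts until the whole pass finishes, which would match the hung-task report. The sketch below is a userspace analog of the cooperative-stop pattern, assuming POSIX threads and C11 atomics: `worker_should_stop()` stands in for `kthread_should_stop()` and `worker_stop()` for `kthread_stop()`. All names are hypothetical; this is not bcachefs code.

```c
/*
 * Userspace analog of the kernel kthread stop protocol (hypothetical names).
 * worker_should_stop() ~ kthread_should_stop(), worker_stop() ~ kthread_stop().
 */
#include <assert.h>     /* assert(), for exercising the sketch */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct worker {
    pthread_t thread;
    atomic_bool stop_requested;   /* analog of the "should stop" bit */
    atomic_long items_checked;    /* progress counter, for illustration */
    long total_items;             /* size of the simulated long-running pass */
    bool finished_early;          /* set if the loop bailed out on a stop request */
};

/* Cheap, lock-free poll, like kthread_should_stop(). */
static bool worker_should_stop(struct worker *w)
{
    return atomic_load(&w->stop_requested);
}

/* Simulated long-running check pass, e.g. a walk over many btree keys. */
static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    for (long i = 0; i < w->total_items; i++) {
        /* Poll once per item; without this check, worker_stop() would
         * block until the entire pass completes. */
        if (worker_should_stop(w)) {
            w->finished_early = true;
            break;
        }
        atomic_fetch_add(&w->items_checked, 1);
    }
    return NULL;
}

static int worker_start(struct worker *w, long total_items)
{
    atomic_init(&w->stop_requested, false);
    atomic_init(&w->items_checked, 0);
    w->total_items = total_items;
    w->finished_early = false;
    return pthread_create(&w->thread, NULL, worker_fn, w);
}

/* Like kthread_stop(): request the stop, then wait for the thread to exit. */
static int worker_stop(struct worker *w)
{
    atomic_store(&w->stop_requested, true);
    return pthread_join(w->thread, NULL);
}
```

In the kernel the loop would call `kthread_should_stop()` directly; the explicit flag only exists to keep the sketch runnable in userspace. Because `pthread_join()` (like `kthread_stop()`) waits unconditionally, the responsiveness of the stop depends entirely on how often the worker polls.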

Additionally, online fsck should probably be interrupted automatically when the filesystem goes read-only or into shutdown.

himikof, Oct 17 '25 23:10
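himikof's suggestion amounts to widening the loop's exit condition: abort if a stop was requested, or if the filesystem has left the read-write state. A minimal single-threaded sketch, assuming a hypothetical `enum fs_state` and `struct fs` (bcachefs tracks read-only and shutdown state with its own flags, not these); the state change that would normally come from another thread is simulated inside the loop so the result is deterministic.

```c
/*
 * Sketch: one abort predicate covering both an explicit stop request and the
 * filesystem leaving read-write mode. All types and names are hypothetical.
 */
#include <assert.h>     /* assert(), for exercising the sketch */
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lifecycle states; not bcachefs's real representation. */
enum fs_state { FS_RW, FS_RO, FS_SHUTDOWN };

struct fs {
    _Atomic enum fs_state state;  /* would be written by remount/shutdown paths */
};

struct fsck_ctx {
    atomic_bool stop_requested;   /* analog of kthread_should_stop() */
    struct fs *fs;
};

static void fsck_ctx_init(struct fsck_ctx *c, struct fs *fs)
{
    atomic_init(&fs->state, FS_RW);
    atomic_init(&c->stop_requested, false);
    c->fs = fs;
}

/* True if the pass should end now: stop requested, or fs no longer RW. */
static bool fsck_should_abort(struct fsck_ctx *c)
{
    if (atomic_load(&c->stop_requested))
        return true;
    return atomic_load(&c->fs->state) != FS_RW;
}

/*
 * Simulated check pass over `total` items. At item `go_ro_at` the fs is
 * flipped to read-only, standing in for another thread remounting it
 * (pass -1 to never do so). Returns the number of items processed.
 */
static long run_checks(struct fsck_ctx *c, long total, long go_ro_at)
{
    long done = 0;

    for (long i = 0; i < total; i++) {
        if (i == go_ro_at)
            atomic_store(&c->fs->state, FS_RO);
        if (fsck_should_abort(c))
            break;
        done++;
    }
    return done;
}
```

Keeping every exit condition in one predicate means a new reason to stop only has to be added in one place, rather than at every poll site in the check loops.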