
Segmentation fault with `btrfs check`

MaxG87 opened this issue 3 years ago · 1 comment

I have a filesystem that I believe is corrupted. The filesystem is hosted inside a LUKS container. I got tons of errors, but I think I was able to get rid of most of them using `btrfs balance start -dusage=95`. I then started a `btrfs balance` without any filter but aborted it by pressing CTRL-C once and then waiting for it to stop.

I compiled btrfs-progs from source (exactly at tag v6.0.1).

When I reran `btrfs check` to see whether the filesystem was now free of errors, I noticed that it segfaults. The output up to the segfault is below.

$ sudo ./btrfs check /dev/mapper/schwarz-crypt
Opening filesystem to check...
Checking filesystem on /dev/mapper/schwarz-crypt
UUID: 8bc94163-ab09-4740-96f9-a01b9c451ef2
[1/7] checking root items
[2/7] checking extents
parent transid verify failed on 2648112087040 wanted 2688 found 2668
parent transid verify failed on 2648112087040 wanted 2688 found 2668
parent transid verify failed on 2648112087040 wanted 2688 found 2668
Ignoring transid failure
parent transid verify failed on 2648112349184 wanted 2688 found 2669
parent transid verify failed on 2648112349184 wanted 2688 found 2669
parent transid verify failed on 2648112349184 wanted 2688 found 2669
Ignoring transid failure
parent transid verify failed on 2648112365568 wanted 2688 found 2669
parent transid verify failed on 2648112365568 wanted 2688 found 2669
parent transid verify failed on 2648112365568 wanted 2688 found 2669
Ignoring transid failure
parent transid verify failed on 2648112381952 wanted 2688 found 2669
parent transid verify failed on 2648112381952 wanted 2688 found 2669
parent transid verify failed on 2648112381952 wanted 2688 found 2669
Ignoring transid failure
parent transid verify failed on 2648112398336 wanted 2688 found 2669
parent transid verify failed on 2648112398336 wanted 2688 found 2669
parent transid verify failed on 2648112398336 wanted 2688 found 2669
Ignoring transid failure
parent transid verify failed on 2648112414720 wanted 2688 found 2669
parent transid verify failed on 2648112414720 wanted 2688 found 2669
parent transid verify failed on 2648112414720 wanted 2688 found 2669
Ignoring transid failure
parent transid verify failed on 2648112463872 wanted 2688 found 2669
parent transid verify failed on 2648112463872 wanted 2688 found 2669
parent transid verify failed on 2648112463872 wanted 2688 found 2669
Ignoring transid failure
parent transid verify failed on 2648115920896 wanted 2688 found 2669
parent transid verify failed on 2648115920896 wanted 2688 found 2669
parent transid verify failed on 2648115920896 wanted 2688 found 2669
Ignoring transid failure
parent transid verify failed on 2648112136192 wanted 2688 found 2669
parent transid verify failed on 2648112136192 wanted 2688 found 2669
parent transid verify failed on 2648112136192 wanted 2688 found 2669
Ignoring transid failure
[1]    39728 segmentation fault  sudo ./btrfs check /dev/mapper/schwarz-crypt

EDIT: I ran the check with gdb and got something that might be useful:

Program received signal SIGSEGV, Segmentation fault.
calc_extent_flag (extent_cache=extent_cache@entry=0x7fffffffdb60, buf=buf@entry=0x55555c0404e0, ri=ri@entry=0x0, flags=flags@entry=0x7fffffffd8a8) at check/main.c:6060
6060            if (ri->objectid < BTRFS_FIRST_FREE_OBJECTID)
(gdb) backtrace
#0  calc_extent_flag (extent_cache=extent_cache@entry=0x7fffffffdb60, buf=buf@entry=0x55555c0404e0, ri=ri@entry=0x0,
    flags=flags@entry=0x7fffffffd8a8) at check/main.c:6060
#1  0x00005555555cda2b in run_next_block (root=root@entry=0x555555bf1cc0, bits=bits@entry=0x555556b2c5d0,
    bits_nr=bits_nr@entry=1024, last=last@entry=0x7fffffffda38, pending=pending@entry=0x7fffffffdb70,
    seen=seen@entry=0x7fffffffdb68, reada=0x7fffffffdb78, nodes=0x7fffffffdb80, extent_cache=0x7fffffffdb60,
    chunk_cache=0x7fffffffdb58, dev_cache=0x7fffffffdb50, block_group_cache=0x7fffffffdc70,
    dev_extent_cache=0x7fffffffdc10, ri=0x0) at check/main.c:6238
#2  0x00005555555cf67c in deal_root_from_list (list=list@entry=0x7fffffffdba0, root=root@entry=0x555555bf1cc0,
    bits=bits@entry=0x555556b2c5d0, bits_nr=bits_nr@entry=1024, pending=pending@entry=0x7fffffffdb70,
    seen=seen@entry=0x7fffffffdb68, reada=0x7fffffffdb78, nodes=0x7fffffffdb80, extent_cache=0x7fffffffdb60,
    chunk_cache=0x7fffffffdb58, dev_cache=0x7fffffffdb50, block_group_cache=0x7fffffffdc70, dev_extent_cache=0x7fffffffdc10)
    at check/main.c:8601
#3  0x00005555555d0064 in check_chunks_and_extents () at check/main.c:8918
#4  do_check_chunks_and_extents () at check/main.c:9014
#5  0x00005555555d5604 in cmd_check (cmd=0x555555643620 <cmd_struct_check>, argc=<optimized out>, argv=<optimized out>)
    at check/main.c:10334
#6  0x000055555556c25d in cmd_execute (argv=0x7fffffffe4d0, argc=2, cmd=0x555555643620 <cmd_struct_check>)
    at cmds/commands.h:125
#7  main (argc=2, argv=0x7fffffffe4d0) at btrfs.c:408

MaxG87 avatar Nov 10 '22 19:11 MaxG87

I see a similar crash. Recompiling with ASAN yields nothing useful beyond the trace from gdb.

`ri` is NULL at the call site in `calc_extent_flag`, which is reached all the way from `deal_root_from_list` through `run_next_block`, which indeed initializes `ri` to NULL (https://github.com/kdave/btrfs-progs/blob/devel/check/main.c#L8781). So at that point the crash seems inevitable?

It seems this will happen whenever the `run_next_block` call above, at https://github.com/kdave/btrfs-progs/blob/devel/check/main.c#L8766, returns something positive? As in, say, https://github.com/kdave/btrfs-progs/blame/devel/check/main.c#L6332

jisakiel avatar Feb 06 '25 14:02 jisakiel