btrfs-progs
Something's fishy about btrfs-restore "Too many loops"
I encountered the prompt "We seem to be looping a lot." Not only is this prompt unclear and undocumented, but, having looked at the code, I believe there is something illogical about the whole thing. Why is 1024 a magic number of iterations that indicates that we're not making progress? Isn't there a better way to determine whether we're making progress? This prompt appears all the time when copying large files, even if we're making progress. But the cherry on top is the response "a" (also undocumented! I'd guessed it stood for "abort"), which is a semi-permanent "y" (only for the given file). Won't it land us in an infinite loop in case we don't actually make progress?
Not only is this whole loops business flawed, but it's also very inconvenient to babysit a big restore that takes hours, constantly answering this prompt.
Exactly! I experienced an accidental power loss a few days ago; the whole 10TB btrfs partition was damaged and won't mount. I've tried all the options, including various commands in btrfs rescue, and spent a whole day recovering chunks. Unfortunately it still won't mount!
Finally 'btrfs restore' became my last option, but I encountered the "looping a lot" prompt at least hundreds of times while no more than 1TB of the original data was restored! I'm very confused by the y/N/a prompt. Like you, I was babysitting for hours to answer the prompt, but it seems I got the answer wrong (I used the default "N"). According to your issue, it seems that "a" is the correct answer! Which one should I use if I want a full recovery? Is there an option to suppress this prompt? What exactly do these options do? Thank you a lot!
By the way, are there more options or chances I can use to fix the btrfs partition? Most of the methods I tried are from https://lists.opensuse.org/opensuse/2017-02/msg00930.html and https://ownyourbits.com/2019/03/03/how-to-recover-a-btrfs-partition/
btrfs-progs v5.4.1 and Linux 5.4.0-14-generic are being used.
What dmesg said when I was trying to mount with -o usebackuproot
[50690.385703] BTRFS warning (device sdd1): suspicious: generation < chunk_root_generation: 251259 < 251370
[50690.385709] BTRFS info (device sdd1): trying to use backup root at mount time
[50690.385711] BTRFS info (device sdd1): disk space caching is enabled
[50690.385712] BTRFS info (device sdd1): has skinny extents
[50690.401122] BTRFS critical (device sdd1): corrupt leaf: block=146046976 slot=188 extent bytenr=1100365824 len=16384 invalid generation, have 251261 expect (0, 251260]
[50690.401126] BTRFS error (device sdd1): block=146046976 read time tree block corruption detected
[50690.401333] BTRFS critical (device sdd1): corrupt leaf: block=146046976 slot=188 extent bytenr=1100365824 len=16384 invalid generation, have 251261 expect (0, 251260]
[50690.401335] BTRFS error (device sdd1): block=146046976 read time tree block corruption detected
[50690.401345] BTRFS error (device sdd1): failed to verify dev extents against chunks: -5
[50690.484106] BTRFS error (device sdd1): open_ctree failed
What btrfs check --readonly said
Opening filesystem to check...
Checking filesystem on /dev/sdd1
UUID: 8a1de060-b5a0-44c0-9027-8dd9c5413d2e
[1/7] checking root items
parent transid verify failed on 1927391723520 wanted 251249 found 251282
parent transid verify failed on 1927391723520 wanted 251249 found 251282
parent transid verify failed on 1927391723520 wanted 251249 found 251282
Ignoring transid failure
leaf parent key incorrect 1927391723520
ERROR: failed to repair root items: Operation not permitted
Again I thank you everyone in advance!
@Provissy Sorry, I know little about BTRFS to recommend the best recovery approach, but from reading the source code the btrfs-restore prompt currently means:
- y: Loop another 1000 times to try to gather the pieces of a file.
- n: Give up on this file. Will leave the file partially restored. This option is, supposedly, sometimes necessary if we get stuck on a file.
- a: Keep looping to gather the pieces of a file and don't ask again for this file. This option, supposedly, could lead to getting stuck.
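To make the three answers concrete, here is a minimal shell simulation of the behaviour described above. It is only a sketch, not the btrfs-progs code: `restore_file`, `LOOP_LIMIT`, and the idea that the counter advances once per file piece are assumptions made for illustration, and the limit is set to the 1024 mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical simulation of the y/N/a prompt; NOT the real restore code.
LOOP_LIMIT=1024   # assumed to match the magic value discussed in the thread

# restore_file PIECES: pretend to copy a file made of PIECES pieces,
# reading one answer from stdin each time the loop counter hits the limit.
restore_file() {
    pieces=$1
    copied=0
    loops=0
    never_ask=0
    while [ "$copied" -lt "$pieces" ]; do
        if [ "$never_ask" -eq 0 ] && [ "$loops" -ge "$LOOP_LIMIT" ]; then
            read -r answer || answer=n       # EOF counts as the default "N"
            case $answer in
                a) never_ask=1 ;;            # keep going, never ask again
                y) loops=0 ;;                # allow another LOOP_LIMIT loops
                *) echo "partial $copied/$pieces"; return 1 ;;  # give up
            esac
        fi
        copied=$((copied + 1))
        loops=$((loops + 1))
    done
    echo "complete $copied/$pieces"
}
```

With this model, `echo n | restore_file 3000` prints `partial 1024/3000`, `printf 'y\ny\n' | restore_file 3000` prints `complete 3000/3000`, and `echo a | restore_file 5000` prints `complete 5000/5000` after a single prompt.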
Until this issue is fixed, the easiest thing to do is to recompile the source code after editing the magic value 1024 inside cmds/restore.c to something larger, like 50000. This way you'll get almost no prompts.
In Manjaro/Arch Linux this is of course very easy.
yay -G btrfs-progs
cd btrfs-progs
makepkg -so --skippgpcheck
# edit `src/btrfs-progs-v5.4/cmds/restore.c`
makepkg -e
yay -U btrfs-progs-5.4-1-x86_64.pkg.tar.xz
@almson Oh thank you so much! It seems that no matter how big the file is, the loop eventually comes to an end. I used yes a | btrfs restore ... and it took 1.5 days to complete, although only 60% of the original files were restored. Newly written files were not restored at all.
I'm trying some third-party data recovery tools, but none of them seem to support zstd decompression.
I hope this "genuine" btrfs restore tool gets better!
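For anyone wondering why piping `yes` works: `yes a` prints a line containing `a` over and over until the reader goes away, so every prompt that reads from standard input finds an answer waiting. The sketch below shows the mechanism with a stand-in; `fake_prompts` is a hypothetical function that merely imitates a program asking a fixed number of questions, not btrfs restore itself.

```shell
#!/bin/sh
# fake_prompts N: hypothetical stand-in that "asks" N questions on stdin
# and stops early on any answer other than "a".
fake_prompts() {
    asked=0
    while [ "$asked" -lt "$1" ]; do
        read -r answer || break              # stdin closed: stop asking
        if [ "$answer" != a ]; then
            echo "stopped after $asked prompts"
            return 1
        fi
        asked=$((asked + 1))
    done
    echo "answered $asked prompts"
}

# yes keeps writing "a" lines; it is terminated once the reader exits.
yes a | fake_prompts 5    # prints: answered 5 prompts
```

The same pattern with `yes n` would answer "n" to every prompt instead, which in the stand-in stops at the first question.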
"Too many Loops" is annoying , having to come back every few mins to do "a" seems to stop every 1gb ...
I'll reply here as it's the most recent comment on the restore looping problem. First I'm sorry this got unattended for so long, after reading all the issues the problem is worse than I thought. I've been reading the code and trying to figure out what was the looping trying to avoid, but so far haven't found a concrete issue. The code has been there since the beginning so there's no explanation why it's there. Right now I'm inclined to rip it out completely, adding back only a command line backed option for some severe cases.
The looping happens in two cases:
- too many loops over file extents, so that's the question that comes back repeatedly, a fragmented file can have a lot of extents spanning many leaves, so that's where the loop counter hits the limit and "tries to help"
- loops over leaves with directory items, so trees with some intermediate items other than those few subject to restoration (files, directories, symlinks) could hit the limit
Overall the feedback regarding the looping was negative, so even if there's a valid case, it's perhaps the exception we should care about, rather than the other way around.
As somebody suggested, increasing the loop limit count helps, but that's obviously only pushing the problem farther away and it could give an impression that no attention is needed. So in the next release I want to reverse it and let it continue without asking the user.
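A rough way to see why a larger limit only postpones the question: under the simplifying assumption that the counter advances once per extent and is reset on every "y", a file with E extents triggers about (E - 1) / limit prompts. `prompts_for` is a hypothetical helper, and the extent counts below are made-up numbers for illustration, not measurements.

```shell
#!/bin/sh
# prompts_for EXTENTS LIMIT: prompts expected under the assumed model
# (one loop per extent, counter reset after each "y").
prompts_for() {
    if [ "$1" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( ($1 - 1) / $2 ))
}

prompts_for 100000 1024       # 97
prompts_for 100000 50000      # 1
prompts_for 10000000 50000    # 199
```

So a limit of 50000 nearly silences the prompt for the first file but brings it back in force for a file a hundred times more fragmented.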
The looping in restore has been removed.
Thank you. I would not have removed it completely, as there might be a bad sector causing it. Maybe make it a variable the user can set via the command line to increase the count, or add a skip function that logs the files to try again after all other files have been done.
I got most of my data back with only a small number of corrupt files, possibly the ones that restore died on from the skipping. It got to the stage where I just ran a key press command overnight,
but it would be interesting to run another restore and compare the files between the versions.
I have not found a valid condition for the looping in the code. The b-tree leaves are scanned linearly, so any problem reading the leaves would stop and not loop. A logical loop, like one item farther in the sequence pointing back so that the iteration would go over the same leaves is not possible with the file extents, remotely possible with the directory items (as some of them could contain references to other items).
I think that crafting a not entirely valid btrfs tree, with valid items but an invalid logical structure, could cause looping. But the most common case is a filesystem that was modified by the kernel, so we can assume the structures are, in the average case, damaged (e.g. missing) rather than seemingly correct but invalid.
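One way to picture the "logical loop" case: as long as every item visited sits at a strictly larger position than the previous one, the scan cannot revisit anything, and only a backward jump can make it loop. The sketch below checks just that on a list of offsets. It is a hypothetical illustration, not something btrfs restore does; `check_progress` and the sample offsets are invented.

```shell
#!/bin/sh
# check_progress: read one numeric offset per line from stdin and fail
# at the first offset that does not move strictly forward.
check_progress() {
    last=-1
    while read -r offset; do
        if [ "$offset" -le "$last" ]; then
            echo "no forward progress at offset $offset"
            return 1
        fi
        last=$offset
    done
    echo "ok"
}

printf '0\n4096\n8192\n' | check_progress    # prints: ok
```

Feeding it `0`, `4096`, `0` instead reports `no forward progress at offset 0`, which is the kind of back-reference a crafted tree would need.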
I have a complete image of the hard disk and still have the 3TB drive in its bay, so if there is anything I can do, let me know.
In my case something went drastically wrong with my btrfs and it would no longer mount. Recovery programs like "UFS Explorer Professional" could see the correct directory tree, and if I had the full version I am sure it would have been successful, but Ubuntu would try opening it with an old partition layout. I have another 2 drives doing the same thing that are still at least mounting, so I will have to get their data moved over to fresh drives soon.
I believe the culprit is from power failures
Is there an automatic y for the btrfs restore command? I also can't find any documentation anywhere that explains what option "a" means when you get y/N/a while attempting a restore.
I am lazier: I just did it via ssh and set up AutoHotkey in Windows to push the "n" key and Enter once in a while.
#Persistent
SetTimer, PressTheKey, 60000
Return

PressTheKey:
Send, n`n
Return
The looping and input code has been removed so I don't think this has any effect.