bupstash icon indicating copy to clipboard operation
bupstash copied to clipboard

Possible corruption detected

Open andrewchambers opened this issue 3 years ago • 23 comments

From the public chat:

after bupstash put, I am running a bupstash get to check the data. Here, I am getting the following error: bupstash get: entry path/to/some/file content hash differs from index hash, possible corruption detected . 

This issue is interesting because the expected symptom of disk or ram corruption would be a different error such a corrupt chunk error. This error is triggered during the second set of checksum checks - which suggests either a bug in bupstash or some other problem - it definitely requires further investigation.

andrewchambers avatar Nov 04 '22 01:11 andrewchambers

I just saw something similar. Running bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 --pick home/falsifian/w/misc-pri/n id=7c7e8ea84a30f889ef6fa27bbbb8d4c7, I see the message:

bupstash get: entry 0 content hash differs from index hash, possible corruption detected.

The output is way smaller than I expect: 1.3K vs 3.6M when I archive the original directory with the tar command.

Before running into this error, I experienced another weird thing. Trying to recover an individual file, I got output that was exactly the right length, but had the wrong content. The content appeared to be pieces of other files concatenated together, starting partway through one file, then including two other complete files, ending exactly on a file boundary. The files were all from the same directory as the file I wanted. Command I used was bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 --pick home/falsifian/w/misc-pri/n/ssh_keys id=7c7e8ea84a30f889ef6fa27bbbb8d4c7

More notes:

  • That directory (.../misc-pri/n) contains 2722 small text files.
  • This is bupstash-0.12.0 on OpenBSD.
  • I'm doing something weird with permissions: running bupstash put as a separate user in order to avoid accidentally archiving certain files like private keys, but then running bupstash get as my normal user. Also, I created the archive using a subkey that only has put permissions.
  • This is my first time trying bupstash, so it could easily be something about my setup.

falsifian avatar Mar 22 '23 23:03 falsifian

Thank you so much for the report - Your setup sounds fine.

A few questions:

  • Are you able to recover the file using a full 'get' of the snapshot?
  • Are you able to reproduce the error reliably?
  • Did you install bupstash from an openbsd package repository? I am curious if they applied any patches.

andrewchambers avatar Mar 22 '23 23:03 andrewchambers

Thank you so much for the report - Your setup sounds fine.

A few questions:

* Are you able to recover the file using a full 'get' of the snapshot?

Probably not. I just tried: bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 id=7c7e8ea84a30f889ef6fa27bbbb8d4c7 > bs.tar, and when I tried extracting the tarball, got:

moth d $ tar -xf ../bs.tar
tar: Invalid header, starting valid header search.
tar: Invalid header, starting valid header search.
tar: Unable to remove directory home/falsifian/.cabal/logs/ghc-8.10.5/: Directory not empty
tar: Invalid header, starting valid header search.
tar: Unable to remove directory home/falsifian/.cabal/logs/ghc-8.10.5/: Directory not empty
tar: Invalid header, starting valid header search.

and more like the above. (I terminated tar instead of waiting to see the end result; I can do that if it would help.)

* Are you able to reproduce the error reliably?

I'll try.

* Did you install bupstash from an openbsd package repository? I am curious if they applied any patches.

Yes. If I'm correctly interpreting the PATCHFILES lines in the Makefile, they're cherry-picking commits a73fd83, 8bf5bd1 and 290768c onto the v0.12.0 tag. (Seeing the comment about packporting a --ignore-permission-errors fix, I remembered a weird thing, in case it's relevant: while bupstash put was running, the progress indicator froze on a directory I didn't have permission to access.)

falsifian avatar Mar 23 '23 01:03 falsifian

Probably not. I just tried: bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 id=7c7e8ea84a30f889ef6fa27bbbb8d4c7 > bs.tar, and when I tried extracting the tarball, got:

I stand corrected. I just reran the tar command to see what happens if I let it finish extracting. That whole directory was restored correctly.

Another kind of error that appeared while tar was running:

tar: A hard link to the directory home/falsifian/src/darcs/2.16.2/_darcs/pristine.hashed/ is not allowed

falsifian avatar Mar 23 '23 01:03 falsifian

I ran a diff between my home directory within what tar extracted and my original home dir*. The only output I see is:

  • A lot of "Only in [original location]", indicating files were not extracted.
  • One instance of: File ./.cabal/bin/n is a directory while file /home/falsifian/.cabal/bin/n is a regular file.
  • Things that aren't surprising:
    • A bunch of "No such file or directory" complaints which I think are explained by being symlinks that diff is trying to follow to files that don't exist.
    • Binary files ./var/pri/bupstash/sendlog and /home/falsifian/var/pri/bupstash/sendlog differ

*I actually backed up / but only looked at /home/falsifian within the backup for this diff.

falsifian avatar Mar 23 '23 01:03 falsifian

The permission error patch might be a potential cause, I will make a new release a priority so people don't need to rely on the custom patches.

There is also a chance something else is going on that is specific to openbsd. I will try to introduce more random directory generation and fuzzing to try and reproduce it on my end.

andrewchambers avatar Mar 23 '23 01:03 andrewchambers

Are you able to recover the file using a full 'get' of the snapshot?

I stand even more corrected. I tried extracting with GNU tar instead of OpenBSD's tar, and it seems to work fine*. So, the only bugs I can confirm for now are: (a) --pick giving the incorrect, stitched-together output, and (b) incompatibility of bupstash get output with OpenBSD's tar.

Some more info, in case it helps:

  • I can reproduce (a) in the sense that re-running the pick command gives the same problem, and I just tried picking another file and it was also a piece of some other file but with the right length. But haven't yet reproduced in the sense of making another corrupted backup.
  • When I created the original backup, I had some false starts: I tried running bupstash put, but it failed with permission errors, after making some progress. Then I created two successful backups, and it was the second one that I was looking at. I also ran bupstash gc at some point.

*(Details: GNU tar extracts without any errors (except it failed to create some files in directories with mode 555), and I don't see missing files. I did not look too deeply this time; I just looked at a part of the file tree that had missing files when I used BSD tar.)

falsifian avatar Mar 23 '23 01:03 falsifian

I am the original reported of the problem (on the public chat).

I just want to add that I am still running bupstash on OpenBSD, on several hosts, several times per day (it is a total of about 27 run per day). It is done with bupstash put followed by checking the backup (bupstash extraction + tarball check) with bupstash get | gtar ft.

Currently, the problem didn't occur again, so it seems to popup at very low rate.

semarie avatar Mar 23 '23 07:03 semarie

Interesting that this was openbsd as well, I can maybe do my fuzzing in a vm there.

andrewchambers avatar Mar 23 '23 07:03 andrewchambers

I've verified that this is reproducible with my own private data, but haven't yet managed to see the error with synthetic data.

I tried making a synthetic directory structure with thousands of files and some of them not readable, but wasn't able to trigger the bug; if anyone has ideas about what properties of the synthetic files might be important (e.g. large/small size?) I can keep trying.

Summary of what follows:

  • It does seem to be related to permission errors; maybe the cherry-picking done by the OpenBSD port is responsible.
  • It triggered a new bad behaviour (crashing).

To be precise, I've included a script below, intended only to be run on my computer. The exclude_args are just there to speed things up.

The three bupstash get commands at the end of the script behave badly in three different ways:

  • The first extracts the wrong content with the right length, as reported above.
  • The second says bupstash get: entry 0 content hash differs from index hash, possible corruption detected. as reported above.
  • The third crashes with the following output. If it would help I can try to figure out how to build debug symbols or whatever's needed to fill in those <unknown>s.
thread 'main' panicked at 'slice index starts at 282688 but ends at 453', src/client.rs:683:34
stack backtrace:
   0:      0x1a1fdcef10d - <unknown>
   1:      0x1a1fdc32aba - <unknown>
   2:      0x1a1fdcda554 - <unknown>
   3:      0x1a1fdceeeea - <unknown>
   4:      0x1a1fdcea4d5 - <unknown>
   5:      0x1a1fdceafb1 - <unknown>
   6:      0x1a1fdcef4d7 - <unknown>
   7:      0x1a1fdcef430 - <unknown>
   8:      0x1a1fdcea904 - <unknown>
   9:      0x1a1fdc34f33 - <unknown>
  10:      0x1a1fdc31186 - <unknown>
  11:      0x1a1fdc31119 - <unknown>
  12:      0x1a1fdb6309d - <unknown>
  13:      0x1a1fdb5d258 - <unknown>
  14:      0x1a1fdbf6d00 - <unknown>
  15:      0x1a1fdb0c046 - <unknown>
  16:      0x1a1fdc0321b - <unknown>
  17:      0x1a1fdacf342 - <unknown>
Abort trap (core dumped) 

Some notes:

  • Re-running the script as-is produces the exact same bupstash get content for the ssh_keys file. But if I change things a bit (I think I was just editing exclude_args) the output changes --- still exactly the same length as ~/w/misc-pri/n/ssh_keys but different (still wrong) content.
  • I tried some other files under ~/w/misc-pri/n and got similar behaviour as ssh_keys (wrong content). I tried other dotfiles in my home dir and got similar behaviour (crashing) as with ~/.shrc. So I wonder if the different behaviours end up associated with particular directories, or in the order the files were visited in.
  • If I try dumping /home/falsifian instead of / the error goes away; this is what makes me think it's related to permission errors.
  • I tried dumping a second time, with or without a send log to connect the two runs; the IDs from both runs give the wrong content when picking ssh_keys.
#!/bin/sh
set -e

TMP=$(mktemp -d "$HOME/bs_tmp/bs-XXXXXXXX")
printf 'Using temp dir: %s\n' "$TMP"
cd "$TMP"

bupstash init -r repo
bupstash new-key -o key

exclude_args="--exclude /dev --exclude /mnt --exclude /nodump --exclude /tmp --exclude /usr --exclude /vm --exclude $HOME/.cabal --exclude $HOME/.cache --exclude $HOME/.local --exclude $HOME/.mozilla --exclude $HOME/bs_tmp --exclude $HOME/co --exclude $HOME/src --exclude $HOME/var --exclude _darcs"

id=$(bupstash put -k key -r repo --no-send-log --ignore-permission-errors $exclude_args /)
printf 'Id: %s\n' "$id"

bupstash get -k key -r repo --pick home/falsifian/w/misc-pri/n/ssh_keys "id=$id" > result
cmp result ~/w/misc-pri/n/ssh_keys || echo wrong content

bupstash get -k key -r repo --pick home/falsifian/w/misc-pri/n "id=$id" > n.tar || echo extract dir failed

RUST_BACKTRACE=full bupstash get -k key -r repo --pick home/falsifian/.shrc "id=$id"

falsifian avatar Apr 24 '23 18:04 falsifian

@bket FYI (assuming you're the same bket as the OpenBSD port maintainer; sorry if I guesed wrong)

falsifian avatar Apr 24 '23 18:04 falsifian

Really appreciate the investigation.

andrewchambers avatar Apr 24 '23 18:04 andrewchambers

@falsifian what is the OpenBSD version you use ? OpenBSD 7.3 ? -current ? what is the output of sysctl -n kern.version.

are you able to rebuild the sysutils/bupstash port with some changes ? (I would send you a patch to keep the debuginfo and eventually some port changes to test)

are you able to build bupstash HEAD from git repository and test with it ?

please note that the first error reported was without permission errors cherry-picked commit.

Cc @klemensn

semarie avatar Apr 25 '23 07:04 semarie

@falsifian what is the OpenBSD version you use ? OpenBSD 7.3 ? -current ? what is the output of sysctl -n kern.version.

moth ~ $ sysctl -n kern.version
OpenBSD 7.3-current (GENERIC.MP) #1162: Sun Apr 23 17:40:19 MDT 2023
    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

(Just ran sysupgrade last night; the above script still produces the same errors.)

are you able to rebuild the sysutils/bupstash port with some changes ? (I would send you a patch to keep the debuginfo and eventually some port changes to test)

are you able to build bupstash HEAD from git repository and test with it ?

Sure, thanks, I can build ports and hopefully building directly isn't too hard. Will try to find some time to test at HEAD. I might first try to narrow down the part of my filesystem needed to reproduce this though; would be nice to have an example I can share.

please note that the first error reported was without permission errors cherry-picked commit.

Cc @klemensn

falsifian avatar Apr 25 '23 14:04 falsifian

Here's a complete reproducible example, showing a slightly different buggy behaviour. Hopefully it has the same cause as everything else here...

The below script ought to print z content because that's what was written to the file fs/var/z, but it prints acme-client.conf content instead.

(I call this behaviour "different" because previously the file always had exactly the right length if bupstash didn't report an error, but fs/var/z and fs/etc/examples/acme-client.conf have different lengths.)

#!/bin/sh
set -e

mkdir bug;cd bug

mkdir -p fs/etc/examples fs/var
echo bgpd.conf content > fs/etc/examples/bgpd.conf
echo acme-client.conf content > fs/etc/examples/acme-client.conf
echo z content > fs/var/z
chmod 000 fs/etc/examples/bgpd.conf

bupstash init -r repo
bupstash new-key -o key

id=$(bupstash put -k key -r repo --no-send-log --ignore-permission-errors fs)
printf 'Id: %s\n' "$id"

bupstash get -k key -r repo --pick var/z "id=$id"

falsifian avatar Apr 25 '23 14:04 falsifian

thanks. I am able to reproduce.

with your reproducer, I have the following output (I added set -x):

$ sh ./test.sh                                                                              
+ mkdir bug
+ cd bug
+ mkdir -p fs/etc/examples fs/var
+ echo bgpd.conf content
+ > fs/etc/examples/bgpd.conf 
+ echo acme-client.conf content
+ > fs/etc/examples/acme-client.conf 
+ echo z content
+ > fs/var/z 
+ chmod 000 fs/etc/examples/bgpd.conf
+ bupstash init -r repo
+ bupstash new-key -o key
+ bupstash put -k key -r repo --no-send-log --ignore-permission-errors fs
+ id=91333d575284d44c4c616e85d1bd98b4
+ printf Id: %s\n 91333d575284d44c4c616e85d1bd98b4
Id: 91333d575284d44c4c616e85d1bd98b4
+ bupstash get -k key -r repo --pick var/z id=91333d575284d44c4c616e85d1bd98b4
acme-client.conf content

the expected output of bupstash get --pick var/z is z content.

semarie avatar Apr 25 '23 15:04 semarie

I bisected bupstash get --pick problem to 5be2c885b5c4d6904291a7d2d6e406421b90d38f (as first bad commit).

it seems related to --pick usage: when using bupstash get (without --pick) and extracting the tarball, you get the right file.


I am think there is several problems here. bupstash get --pick is only one of them.

semarie avatar Apr 25 '23 17:04 semarie

@semarie nice work with the bisection, very useful information.

andrewchambers avatar Apr 26 '23 01:04 andrewchambers

I have tracked down the issue, when a file could not be opened we removed it from the list of file entries which broke the internal pick offset tracking in a subtle way.

The simplest fix for now is if stat works on a file, but it can't be opened will just be backed up as an empty file.

andrewchambers avatar Apr 26 '23 11:04 andrewchambers

This should be fixed before going to debian

bastien-roucaries avatar Jun 08 '23 14:06 bastien-roucaries

This should be fixed before going to debian

@bastien-roucaries FYI #384 seems to fix it. I don't know if @andrewchambers wanted to do any further review or testing before merging it.

falsifian avatar Jun 08 '23 16:06 falsifian

Yeah, the main blocker is just I had been very busy for the past 3-4 months at a new job.

andrewchambers avatar Jun 09 '23 01:06 andrewchambers

The fix for this has been merged but it not in the latest release.

andrewchambers avatar Jun 09 '23 01:06 andrewchambers