
borg beta: files cache is corrupted after connection broke

Open · dietmargoldbeck opened this issue 10 months ago · 2 comments

With borg 2.0.0b14, I have the problem that the files cache gets corrupted when the SSH connection breaks (e.g. due to a DSL reconnect) during the backup. This causes the next run to take a very long time.

Last night's failure: borg.remote.ConnectionBrokenWithHint: Connection to remote host is broken. Broken Pipe

And on the next start:

borgstore.backends.errors.ObjectNotFound: cache/chunks

cached chunk indexes: ['c7391654515103da', 'd6891965cecf766a']
trying to load cache/chunks.c7391654515103da from the repo...
cache/chunks.c7391654515103da is valid.
cached chunk index c7391654515103da gets merged...
trying to load cache/chunks.d6891965cecf766a from the repo...
cache/chunks.d6891965cecf766a is valid.
cached chunk index d6891965cecf766a gets merged...
caching 2417203 new chunks.
cached chunk indexes: ['c7391654515103da', 'd6891965cecf766a']
caching chunks index as cache/chunks.846d83a6bd1df79e in repository...
cached chunk indexes deleted: {'c7391654515103da', 'd6891965cecf766a'}
compress_entry failed for FileCacheEntry(age=1, inode=1086976, size=11181, ctime=Timestamp(seconds=1738562306, nanoseconds=715832592), mtime=Timestamp(seconds=1738562306, nanoseconds=715832592), chunks=[[b'\x0c\xc9\x04\xc1\x9d,\xa0W\xbcW\xa2\xc3E\xb1U\xce1\xe5\x84\xc2O\xc0\xb3gm\xa1\xa5C\t@Z6', 11181]]), skipping.
compress_entry failed for FileCacheEntry(age=1, inode=1086566, size=626, ctime=Timestamp(seconds=1738615904, nanoseconds=344829180), mtime=Timestamp(seconds=1738615904, nanoseconds=336829227), chunks=[[b'\x13\x9ab\xdf\xd1\xf7\xc2L\x91\x8cQ\xa9}\xdc\xac\x87\xa4e\x88\xd6}C\xe5M\x05\xb5\x17\x18\xce\nya', 626]]), skipping.
compress_entry failed for FileCacheEntry(age=1, inode=1086782, size=11157, ctime=Timestamp(seconds=1738615904, nanoseconds=716826991), mtime=Timestamp(seconds=1738615904, nanoseconds=716826991), chunks=[[b'\xe8\xd6\xcb\xa8\x85f~\x86\xb4q\xc0\x19\x9c\xbe\x08.U\xf7\xe1\x9c\xac\x0c\n\x10\xddO\xddO39\xb7Q', 11157]]), skipping.
compress_entry failed for FileCacheEntry(age=1, inode=1086876, size=253051, ctime=Timestamp(seconds=1738601171, nanoseconds=679926175), mtime=Timestamp(seconds=1738601171, nanoseconds=679926175), chunks=[[b'zw\xc1\xd3v\xd0\xc0C(\xb1\xa4\xc3D\xe75\xf1\x94j-\xee\xedmc\x911\x08\xa8\xab\xbaE\xd1\x01', 253051]]), skipping.
compress_entry failed for FileCacheEntry(age=1, inode=1087002, size=10080, ctime=Timestamp(seconds=1738616781, nanoseconds=611668728), mtime=Timestamp(seconds=1738616781, nanoseconds=611668728), chunks=[[b'\xcc\xbd\xe2\xc6\xa6\x8c\xe0\tF\xf3\xcf\xb8\x88\xe9\xe0\x07\xe7e\xb9\x00\x0c\n\x97\xa8\xf78\x98\x94*}\xf1\xab', 10080]]), skipping.
The files cache is corrupted. [File failed integrity check: /root/.cache/borg/961ae9947427e29ecc0646b4713863120ba10745a63f390e686331afeb20818a/files.root]

I have also had the SSH connection break without the cache becoming corrupted, and I did not check whether this is easy to reproduce with a small setup; borg is currently backing up a few TB.

dietmargoldbeck, Feb 04 '25 06:02

compress_entry failures: my guess is that this means a chunk ID listed in the entry does not exist in the chunks index. Skipping such an entry is a good way to deal with that.
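
A minimal sketch of that idea, using a hypothetical function name and entry layout (this is not borg's actual implementation): when rewriting the files cache, drop any entry that references a chunk ID which is no longer present in the chunks index.

# Sketch only: prune files cache entries whose chunks are missing from the chunks index.
def compact_files_cache(entries, chunks_index):
    """entries: dict mapping path -> list of (chunk_id, size); chunks_index: set of known chunk IDs."""
    kept = {}
    for path, chunk_refs in entries.items():
        if all(chunk_id in chunks_index for chunk_id, _size in chunk_refs):
            kept[path] = chunk_refs
        else:
            # corresponds to the "compress_entry failed ..., skipping." messages in the log above
            print(f"skipping files cache entry for {path}: references an unknown chunk id")
    return kept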

"The files cache is corrupted" - hmm, that should not be. Could be either a borg bug or an issue on that system (the files.root file really being corrupted).

ThomasWaldmann, Feb 04 '25 10:02

@dietmargoldbeck could you apply the patch below to the source code and try to reproduce the issue?

--- a/src/borg/crypto/file_integrity.py
+++ b/src/borg/crypto/file_integrity.py
@@ -192,8 +192,11 @@ def hash_part(self, partname, is_final=False):
         digest = self.hasher.hexdigest()
         if self.writing:
             self.digests[partname] = digest
-        elif self.digests and not compare_digest(self.digests.get(partname, ""), digest):
-            raise FileIntegrityError(self.path)
+        else:
+            expected_digest = self.digests.get(partname, "")
+            if not compare_digest(expected_digest, digest):
+                logger.debug(f"digest mismatch for part {partname}: expected {expected_digest}, got {digest}.")
+                raise FileIntegrityError(self.path)
 
     def __exit__(self, exc_type, exc_val, exc_tb):
         exception = exc_type is not None

Also, running the borg create that gets interrupted (i.e. the backup during which the connection breaks) at debug log level might be useful.

ThomasWaldmann, Apr 04 '25 19:04

Guess I can't do anything here.

If it still happens with the latest beta, please open a new ticket.

ThomasWaldmann, Sep 10 '25 17:09