borg borg2 check redesign?

Collect ideas here to optimize the speed of borg check.

Nov 02 '24 23:11 ThomasWaldmann

Can "borg check --verify-data" be sped up by reading data from disk only once? This would help especially with local repositories or a NAS, where borg is running on the client and all data needs to be transferred to it.

Nov 09 '24 16:11 rschuetz

@rschuetz Yes, theoretically we could unify some parts of check that need to read all data from all repo objects.

As borg 1.x "check" is designed right now, it's difficult though:

- The "repository part" makes sure all objects seem OK (by CRC32 check) and builds a repository index in which all these objects are present. For remote repos, this runs on the server side to avoid data transfer.
- The "archives part" of the check then relies on that index and does additional checks that require decrypting archives or data chunks (and thus need the borg key). The borg key is usually not available on the server, so this part usually runs on a client.
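To illustrate why the two parts end up reading the same data twice, here is a minimal toy sketch. It is not borg's actual code: `ToyRepo`, `repository_check` and `archives_check` are hypothetical names, the repository is an in-memory dict, there is no encryption, and plain SHA-256 stands in for borg's id hash.

```python
import hashlib
import zlib


class ToyRepo:
    """Hypothetical in-memory stand-in for a repository; counts payload reads."""

    def __init__(self):
        self.objects = {}  # object id -> (stored crc32, payload bytes)
        self.reads = 0

    def put(self, payload):
        # Assumption: plain SHA-256 as a stand-in for borg's id hash.
        obj_id = hashlib.sha256(payload).hexdigest()
        self.objects[obj_id] = (zlib.crc32(payload), payload)
        return obj_id

    def get(self, obj_id):
        self.reads += 1
        return self.objects[obj_id]


def repository_check(repo):
    """Part 1: needs no key; CRC32-checks every object and builds the index."""
    index = set()
    for obj_id in list(repo.objects):
        crc, payload = repo.get(obj_id)
        if zlib.crc32(payload) == crc:
            index.add(obj_id)
    return index


def archives_check(repo, index, archives, verify_data=False):
    """Part 2: would need the key in real borg; here it only re-hashes payloads."""
    errors = []
    for name, chunk_ids in archives.items():
        for obj_id in chunk_ids:
            if obj_id not in index:
                errors.append((name, obj_id, "missing"))
            elif verify_data:
                _, payload = repo.get(obj_id)  # second read of the same object
                if hashlib.sha256(payload).hexdigest() != obj_id:
                    errors.append((name, obj_id, "corrupt"))
    return errors


repo = ToyRepo()
ids = [repo.put(b"chunk-%d" % i) for i in range(3)]
archives = {"archive-1": ids}
index = repository_check(repo)
errors = archives_check(repo, index, archives, verify_data=True)
print(errors, repo.reads)
```

In this toy setup, every object is read once by the repository part and once more by the archives part when `verify_data` is enabled, which is the duplication a unified check would try to avoid.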

#8517 also has an idea for borg2 related to --verify-data, but it does not apply to borg 1.x.

BTW, this ticket is primarily meant for borg2, I don't think there will be major changes in borg check of 1.x.

Nov 09 '24 16:11 ThomasWaldmann

There is also #8514 and the idea to use flags for each repo object to avoid reading it multiple times.
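A rough sketch of how the flags idea could look, purely as an assumption: `CheckFlag`, `verify_unflagged` and the in-memory `flags` dict are hypothetical and do not reflect what #8514 or borg2 actually implement. Plain SHA-256 again stands in for the real id hash.

```python
import enum
import hashlib


class CheckFlag(enum.IntFlag):
    NONE = 0
    VERIFIED = 1  # hypothetical flag: payload was already read and verified


def verify_unflagged(objects, flags):
    """Read and verify only objects not yet flagged VERIFIED; return the read count."""
    reads = 0
    for obj_id, payload in objects.items():
        if flags.get(obj_id, CheckFlag.NONE) & CheckFlag.VERIFIED:
            continue  # already verified earlier, skip the expensive read
        reads += 1
        if hashlib.sha256(payload).hexdigest() == obj_id:
            flags[obj_id] = flags.get(obj_id, CheckFlag.NONE) | CheckFlag.VERIFIED
    return reads


objects = {hashlib.sha256(p).hexdigest(): p for p in (b"a", b"b", b"c")}
flags = {}
first = verify_unflagged(objects, flags)   # reads every object
second = verify_unflagged(objects, flags)  # reads nothing, all are flagged
print(first, second)
```

The point of the sketch is only that a persistent per-object flag lets a later check pass skip objects that an earlier pass has already verified.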

Nov 16 '24 17:11 ThomasWaldmann