Consider deprecating `init -e none` now and removing it with Borg 2

Open PhrozenByte opened this issue 5 months ago • 9 comments

/kind discussion

This issue exists to get some feedback on whether deprecating init --encryption none now and removing it with Borg 2 is feasible.

For prior discussion, see https://github.com/borgbackup/borg/issues/9072#issuecomment-3446729458

Since version 1.1 Borg supports the authenticated mode as an alternative to none mode: Contents are stored unencrypted, but authenticated using the HMAC-SHA256 hash. This protects users from accidental or malicious attempts to tamper with the repo, including denial-of-service attacks against clients. That's why authenticated mode is recommended over none mode since Borg 1.1 and none mode is discouraged for new repos, but still fully supported.

Since Borg 2 requires users to transfer their Borg 1 archives over to a new Borg 2 repo anyway, I think this is the best time to remove none mode if we actually consider it obsolete. AFAIK transfer requires rechunking then. The idea is to officially deprecate none mode now and remove it with Borg 2 (with the exception of transfer --from-borg1); users really should use something else (either authenticated, or repokey / keyfile with an empty passphrase).

I'm not 100% convinced about this idea yet either, but some feedback and a discussion about it might clarify things.

Open question for our encryption experts: What are the implications of using an empty passphrase with authenticated mode?

PhrozenByte avatar Oct 26 '25 22:10 PhrozenByte

IIRC this was suggested (by me?) and discussed before, check the other issues.

ThomasWaldmann avatar Oct 27 '25 16:10 ThomasWaldmann

> What are the implications of using an empty passphrase with authenticated mode?

That's relatively easy to answer:

The borg key passphrase is used to decrypt the borg key (similar to how an SSH key passphrase is used), revealing the key material inside it: the AES encryption key, the MAC authentication key, and the chunker secret.

If an attacker could somehow "steal" the borg key, it would not be usable for them unless they also know the key passphrase.

Storage location of the borg key:

  • keyfile mode: it usually is stored in the home directory of the user
  • repokey mode: it is stored inside the repository directory

An empty (or otherwise too easy) key passphrase does not really "protect" the key material.

But, the key material inside the borg key will always be random and of high quality.
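To make the point about empty passphrases concrete, here is a minimal sketch of passphrase-based key wrapping. It is not Borg's key file format: PBKDF2-HMAC-SHA256 is used as a stand-in key derivation function, and `derive_kek`, the salt size, and the iteration count are assumptions chosen for illustration. It only shows that the random key material is independent of the passphrase, while the key-encryption key (KEK) that protects it can be re-derived by anyone who holds the key blob if the passphrase is empty.

```python
import hashlib
import hmac
import os


def derive_kek(passphrase: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a key-encryption key (KEK) from a passphrase.

    Hypothetical helper: PBKDF2 is a stand-in KDF, not a claim about
    the exact KDF or parameters Borg uses.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations)


# The key material is random and does not depend on the passphrase at all.
key_material = os.urandom(32)

# The salt is stored next to the wrapped key, so it is not a secret.
salt = os.urandom(32)

# Empty passphrase: whoever obtains the key blob (including the salt)
# derives exactly the same KEK as the legitimate owner.
owner_kek = derive_kek("", salt)
attacker_kek = derive_kek("", salt)
assert hmac.compare_digest(owner_kek, attacker_kek)

# A good passphrase yields a KEK the attacker cannot derive without knowing it.
strong_kek = derive_kek("correct horse battery staple", salt)
assert not hmac.compare_digest(owner_kek, strong_kek)
```

In other words, the quality of the key material is never the weak point; with an empty passphrase, possession of the key blob alone is enough.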

ThomasWaldmann avatar Oct 27 '25 16:10 ThomasWaldmann

> IIRC this was suggested (by me?) and discussed before, check the other issues.

I searched before, didn't find anything, searched again now and found it (#7600). I had searched for "encryption none" rather than "unencrypted", which is why I didn't find it… 😒 Thanks for the heads up :+1:

Anyway, I feel that this should be discussed again. There may have been additional feedback in other media, but at least the feedback in #7600 focuses entirely on the overhead of encryption and/or the fear of losing the key and thus access to the backup. These concerns, however, do not apply to authenticated mode, because data isn't encrypted (to be fair, BORG_WORKAROUNDS=authenticated_no_key was added after #7600, so losing the key was a valid concern back then, but no longer is).

Some questions to clarify a few things (not just for this matter, but also for inclusion in #9103):

  • How does authenticated compare to none when it comes to accidental tampering (either by mistake, or possible hardware issues) with a repo? Is there a difference when it comes to detecting invalid data?
  • How does authenticated compare to repokey when it comes to hashing performance? Is the performance impact as great as with repokey, or is it lower (i.e. does repokey need to do more hashing than authenticated, or is it the same)?
  • How does authenticated-blake2 compare to none when it comes to performance (especially hashing performance, but not limited to that, i.e. overall performance)? The current docs explicitly state "which makes authenticated-blake2 faster than none". Is this true for all hardware, e.g. a Raspberry Pi 4 with no relevant hardware acceleration whatsoever as mentioned in #7600 (comment)?
  • Assuming one loses the passphrase of an authenticated repo, does transfer work with BORG_WORKAROUNDS=authenticated_no_key?

I'm specifically thinking about whether Borg could recommend using authenticated with an empty passphrase in "I don't need/want Borg to do any security" scenarios.

> What are the implications of using an empty passphrase with authenticated mode?

> That's relatively easy to answer […]

Thanks :+1:

PhrozenByte avatar Oct 27 '25 19:10 PhrozenByte

> How does authenticated compare to none when it comes to accidental tampering (either by mistake, or possible hardware issues) with a repo? Is there a difference when it comes to detecting invalid data?

If you use "authenticated" with a well-protected borg key (good passphrase), it offers similar (high) strength authentication as the encrypted modes. No tampering would stay undiscovered.

That is not only good against accidental corruption, but also against malicious tampering.

If you use an empty or bad passphrase and the attacker gets the borg key AND your passphrase, the authentication is worthless, because the attacker can decrypt the borg key and use the secret key material inside to authenticate data in the same way as the legitimate user.
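The difference between an unkeyed hash and a keyed MAC can be sketched in a few lines. This is a generic illustration of SHA-256 versus HMAC-SHA256, not a description of how Borg stores or verifies chunks; the function names and sample data are made up for the example.

```python
import hashlib
import hmac
import os


def plain_id(chunk: bytes) -> bytes:
    """Unkeyed SHA-256 digest: anyone can compute it."""
    return hashlib.sha256(chunk).digest()


def keyed_id(key: bytes, chunk: bytes) -> bytes:
    """HMAC-SHA256 digest: computing it requires the secret key."""
    return hmac.new(key, chunk, hashlib.sha256).digest()


def verify_plain(chunk: bytes, stored_id: bytes) -> bool:
    return hmac.compare_digest(plain_id(chunk), stored_id)


def verify_keyed(key: bytes, chunk: bytes, stored_id: bytes) -> bool:
    return hmac.compare_digest(keyed_id(key, chunk), stored_id)


secret_key = os.urandom(32)
original = b"important backup data"
corrupted = b"important backup dat\x00"
tampered = b"malicious replacement"

# Accidental corruption: the stored digest no longer matches in either scheme.
assert not verify_plain(corrupted, plain_id(original))
assert not verify_keyed(secret_key, corrupted, keyed_id(secret_key, original))

# Malicious tampering, unkeyed: the attacker simply recomputes the digest.
assert verify_plain(tampered, plain_id(tampered))

# Malicious tampering, keyed: without the key, the forged digest is rejected.
attacker_guess = os.urandom(32)
assert not verify_keyed(secret_key, tampered, keyed_id(attacker_guess, tampered))

# If the attacker has the key (e.g. empty passphrase plus stolen borg key),
# the forgery verifies just like legitimate data.
assert verify_keyed(secret_key, tampered, keyed_id(secret_key, tampered))
```

Under these assumptions, both schemes catch accidental corruption, but only the keyed one resists deliberate replacement, and only as long as the key stays secret.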

> How does authenticated compare to repokey when it comes to hashing performance? Is the performance impact as great as with repokey, or is it lower (i.e. does repokey need to do more hashing than authenticated, or is it the same)?

Both "authenticated" and "repokey" use hmac-sha256, thus the hashing performance is the same. Of course, "repokey" will additionally do encryption and "authenticated" won't, thus "authenticated" will be a bit faster overall.

On modern CPUs, AES and sha256 might both be hardware accelerated and only have a minor performance impact.

> How does authenticated-blake2 compare to none when it comes to performance (especially hashing performance, but not limited to that, i.e. overall performance)? The current docs explicitly state "which makes authenticated-blake2 faster than none". Is this true for all hardware, e.g. a Raspberry Pi 4 with no relevant hardware acceleration whatsoever as mentioned in https://github.com/borgbackup/borg/issues/7600#issuecomment-1560894690?

On CPUs with HW acceleration for sha256, hmac-sha256 usually will be faster than blake2. Without, it will be vice versa.
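Which of the two is faster on a given machine is easy to check with a rough micro-benchmark. The sketch below uses only Python's standard library, so it measures the hash implementations that Python's `hashlib` happens to be linked against, which may differ from what Borg uses; the buffer size, repeat count, and the `throughput_mb_s` helper are arbitrary choices for illustration. Unkeyed SHA-256 is included for comparison.

```python
import hashlib
import hmac
import os
import timeit


def throughput_mb_s(func, data: bytes, repeats: int = 5) -> float:
    """Return the best observed throughput of func(data) in MB/s."""
    seconds = min(timeit.repeat(lambda: func(data), number=1, repeat=repeats))
    return len(data) / 1e6 / seconds


key = os.urandom(32)
data = os.urandom(8 * 1024 * 1024)  # 8 MiB of random input

candidates = {
    "sha256": lambda d: hashlib.sha256(d).digest(),
    "hmac-sha256": lambda d: hmac.new(key, d, hashlib.sha256).digest(),
    "blake2b-256 (keyed)": lambda d: hashlib.blake2b(d, key=key, digest_size=32).digest(),
}

results = {name: throughput_mb_s(func, data) for name, func in candidates.items()}

for name, speed in results.items():
    print(f"{name:20s} {speed:10.1f} MB/s")
```

The numbers only indicate relative hashing speed on one machine; they say nothing about overall create or extract performance, where chunking, compression, and I/O also matter.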

"none" also needs to compute a sha256 hash (for dedup), it just does not use a secret key as "authenticated" does with hmac-sha256. See above about whether sha256 or blake2 is expected to be faster.

> Assuming one loses the passphrase of an authenticated repo, does transfer work with BORG_WORKAROUNDS=authenticated_no_key?

I think so, yes, but I did not experiment much with that.

ThomasWaldmann avatar Oct 27 '25 20:10 ThomasWaldmann

In addition to the above borg 1.x info: borg2 will use faster AEAD crypto algorithms (e.g. AES-OCB or ChaCha20-Poly1305) that compose encryption and authentication internally, which can be faster than the 2 crypto API calls borg 1.x used (encrypt, authenticate).

ThomasWaldmann avatar Oct 27 '25 20:10 ThomasWaldmann

Thanks again :+1:

I did some testing with benchmark crud on an 8th gen Intel notebook (Intel Core i5-8265U, with AES but without SHA256 hardware acceleration, launched in 10/2018) with empty repos on a ramdisk - and looking at the differences, I feel that authenticated is a great alternative. Here's the full data: borg_bench.ods

  • Initial create with authenticated-blake2 is ~7.5% faster (average of C-*-*) than with none, but subsequent creates are ~11% slower (average of U-*-*). extract is an astonishing ~28% faster (average of E-*-*).
  • create with repokey-blake2 is ~11% slower than with none - for both the initial and update runs (averages of C-*-* resp. U-*-*). extract is a whopping ~31% faster (average of E-*-*).
  • delete is ~51% slower than with none (no matter the other mode; average of D-*-*), but only because I'm testing on a ramdisk, else this would be limited by I/O and thus not showing any difference

Results differ by hardware for sure, but assuming same hardware acceleration status they should paint a similar picture? I'm asking for both this matter (deprecating and later removing none mode), and maybe including the results (no exact numbers, but magnitudes) in the init -e docs (i.e. follow-up to #9103).

My doubts about the decision to keep none primarily come from the feedback in #7600 clearly focusing (basically entirely) on the overhead of encryption and/or the fear of losing the key and thus access to the backup. People might have been aware of authenticated (resp. authenticated-blake2) and its implications, but the feedback unfortunately doesn't show that and no other reasoning was given (at least on GitHub). A contributing factor might have been that it's not very easy to benchmark different modes. To do so, one must create repos with different modes manually and then compare them manually - the latter being especially hard due to the human-readable text output (nothing a sed call can't solve, but still).
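As a rough idea of what such a manual comparison involves, here is a sketch that parses two benchmark outputs and averages the relative speed change per operation. It assumes each result line starts with a test name such as C-Z-BIG followed by a throughput in MB/s; the sample lines are made-up numbers, not measurements from this thread, and averaging per-test percentages is just one possible way to aggregate, not necessarily the one used for the figures above.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed line shape: "<OP>-<DATA>-<SIZE>   <number> MB/s ..."
LINE_RE = re.compile(
    r"^(?P<op>[A-Z])-(?P<data>[A-Z])-(?P<size>[A-Z]+)\s+(?P<speed>[\d.]+)\s*MB/s"
)


def parse_crud(text: str) -> dict[str, float]:
    """Map test names like 'C-Z-BIG' to their throughput in MB/s."""
    speeds = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name = f"{match['op']}-{match['data']}-{match['size']}"
            speeds[name] = float(match["speed"])
    return speeds


def average_change(baseline: dict[str, float], other: dict[str, float]) -> dict[str, float]:
    """Average relative speed change in percent, grouped by operation letter."""
    per_op = defaultdict(list)
    for test, base_speed in baseline.items():
        if test in other:
            per_op[test[0]].append((other[test] / base_speed - 1) * 100)
    return {op: mean(values) for op, values in per_op.items()}


# Made-up sample output for two hypothetical repos (illustration only).
none_output = """\
C-Z-BIG   100.00 MB/s
C-R-BIG    50.00 MB/s
D-Z-BIG   200.00 MB/s
"""
authenticated_output = """\
C-Z-BIG   110.00 MB/s
C-R-BIG    55.00 MB/s
D-Z-BIG   100.00 MB/s
"""

changes = average_change(parse_crud(none_output), parse_crud(authenticated_output))
for op, percent in sorted(changes.items()):
    print(f"{op}-*-*: {percent:+.1f}% vs. baseline")
```

Positive values mean the second repo was faster than the baseline for that operation, negative values mean it was slower.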

In the end this is a question of future support, not today's. Borg 2 won't be released tomorrow. Support of Borg 1.4 (and thus none) won't end the day after the release of Borg 2 (I guess?). With every passing year, more and more hardware has both AES and SHA256 hardware acceleration. In #7600 a RasPi 4 was mentioned; later that same year the RasPi 5 was released (meaning it's already two years old), which comes with both AES and SHA256 hardware acceleration. Honestly, I was kinda surprised when I did the research for #9103 to include the generations when Intel/AMD/ARM added support… So, this becomes less of a problem every year, and users with hardware so old that neither authenticated nor authenticated-blake2 is an option could simply keep using Borg 1.4.

What are your thoughts on this? I feel that better explaining authenticated (I'm still thinking about more improvements for the init -e docs in this regard) and maybe simplifying benchmark so that users can test for themselves could swing opinions (I could create a separate issue with a few suggestions, but since it provides no direct benefit for users it's rather low priority I guess). The question is whether you even consider all this hassle worth the benefits code-wise (you said in #7600 that removing none would allow for simplifying Borg's code; how much is something I simply can't assess). WDYT? Is this worth pursuing (and maybe even taking another survey when you consider Borg 2 to be close enough to a final release), or shall we rather close this and stick with the previous decision to keep none (probably) permanently?

PhrozenByte avatar Nov 03 '25 21:11 PhrozenByte

Yes, I think it is reasonable to reconsider this.

Did you see there is now also borg benchmark cpu? It's a more synthetic benchmark only for the misc. algorithms, but it shows how fast most of them are (only exception: high compression modes).

How much code could be simplified/removed is something I have to investigate myself. It's likely not much, but rather "special" or "weird" cases, docs, ...

ThomasWaldmann avatar Nov 04 '25 22:11 ThomasWaldmann

> Did you see there is now also borg benchmark cpu? It's a more synthetic benchmark only for the misc. algorithms, but it shows how fast most of them are (only exception: high compression modes).

I knew that benchmark cpu was added with Borg 2, but didn't use it because I was testing with Borg 1.4. I did so because the idea originated from #9103 addressing Borg 1.4, but for our question here we must indeed test with Borg 2.0 instead. So, I just did that, also including the new algorithms, and the results paint an even more consistent picture. Here are my full results: borg_bench.ods

  • Initial create with authenticated-blake2 is "just" ~4% faster (average of C-*-*) than with none, but subsequent creates are now ~6% faster (average of U-*-*), too. extract is still ~22% faster (average of E-*-*).
  • create with repokey-blake2-aes-ocb shows the same speed for the initial run (average of C-*-*), but is ~6% faster for later runs (average of U-*-*). extract is just ~2.5% faster though (average of E-*-*).
  • Initial create with repokey-blake2-chacha20-poly1305 is ~1.5% slower (average of C-*-*) than with none, but subsequent creates are ~7% faster (average of U-*-*), too. extract is ~4% slower (average of E-*-*).
  • repokey-chacha20-poly1305 is very similar to repokey-aes-ocb performance-wise, and so is repokey-blake2-chacha20-poly1305 to repokey-blake2-aes-ocb; so, performance shouldn't be the deciding factor here, but security.
  • delete is ~62% slower than with none (no matter the other mode; average of D-*-*), but still, should be limited by I/O in practice and thus should show next to no difference

So, at least from my tests, authenticated-blake2 is basically always faster than none, with the only exception being create with all-zero files. I'd say all-zero files are only really a thing for big files (like disk images), aren't they? Because big all-zero files are the exception to the exception, with basically the same performance as with none.

repokey-blake2-aes-ocb is faster than none on average, too, with the same exception as above, but additionally a weakness with small random files - assuming AES hardware acceleration is available. The fear of losing the key (even with repokey and an empty passphrase, due to possible corruption) is still a valid concern, though.

An open question is whether my test hardware is the rule or the exception here. I'll open another issue in the next couple of days with a few suggestions on how to simplify the usage of benchmark crud to compare different encryption algorithms.

The results of borg benchmark cpu are very interesting from a technical standpoint (e.g., blake2 is ~32% faster than sha256 on my test hardware), but don't really help me with choosing the right encryption method to be honest. Here are my results: borg_bench_cpu.txt

PhrozenByte avatar Nov 05 '25 19:11 PhrozenByte

I just opened a bunch of suggestions related to benchmark, see #9165, #9166, and #9167. I intentionally created separate issues instead of a single big one, because I felt that they don't necessarily require each other and can be decided upon and implemented separately.

PhrozenByte avatar Nov 10 '25 14:11 PhrozenByte