Split `repo-create --encryption` into `--mode`, `--encryption`, and `--hash` (Borg 2 only)

Open PhrozenByte opened this issue 1 month ago • 7 comments

/kind enhancement

Some prior, loosely related discussions can be found in #9103 and #9104.

Following the refactored init --encryption docs for Borg 1.4 (see #9103), we still have to update the docs for Borg 2. However, before doing that, I feel that there's some room for improvement with regard to Borg 2's repo-create CLI.

I'd like to suggest splitting the current --encryption option into separate --mode, --encryption, and --id-hash options (plus an --unsafe-unencrypted option if #9104 is disregarded) as follows:

  • With the new -e / --encryption option users solely choose the encryption and authentication algorithm. It accepts none, aes256-ocb, and chacha20-poly1305. This option is required.
  • With the new -m / --mode option users choose where the key(s) shall be stored. It accepts repokey, and keyfile. It defaults to repokey if omitted. If --encryption none is given, --mode keyfile is rejected as invalid.
  • With the new -i / --id-hash option users choose the ID hash. It accepts sha256, and blake2. It defaults to sha256 if omitted.
  • If #9104 is disregarded, the --unsafe-unencrypted option (no shorthand) is added to replace the old --encryption none option. --unsafe-unencrypted is then a highlander option, i.e. it is mutually exclusive with --mode, --encryption, and --id-hash. The name is chosen intentionally to reinforce that Borg advises against using it.
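
To illustrate the three options described above, here is a minimal argparse sketch. It is not Borg's actual argument parser: the function names `build_repo_create_parser` and `parse_repo_create_args` are made up for this example, and the conditional --unsafe-unencrypted option is left out.

```python
import argparse


def build_repo_create_parser():
    """Hypothetical parser for the proposed options (not Borg's real parser)."""
    parser = argparse.ArgumentParser(prog="borg repo-create")
    # Required: encryption and authentication algorithm.
    parser.add_argument("-e", "--encryption", required=True,
                        choices=["none", "aes256-ocb", "chacha20-poly1305"])
    # Optional: where the key(s) are stored, defaults to repokey.
    parser.add_argument("-m", "--mode", default="repokey",
                        choices=["repokey", "keyfile"])
    # Optional: ID hash, defaults to sha256.
    parser.add_argument("-i", "--id-hash", default="sha256",
                        choices=["sha256", "blake2"])
    return parser


def parse_repo_create_args(argv):
    parser = build_repo_create_parser()
    args = parser.parse_args(argv)
    # The proposal rejects --mode keyfile together with --encryption none.
    if args.encryption == "none" and args.mode == "keyfile":
        parser.error("--mode keyfile is not valid with --encryption none")
    return args
```

With this sketch, `parse_repo_create_args(["-e", "aes256-ocb"])` falls back to the repokey and sha256 defaults, while combining `-e none` with `-m keyfile` exits with a usage error.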

Here's a translation table of the old and suggested new options in master:

| Old --encryption | New --mode | New --encryption | New --id-hash | Notes |
|---|---|---|---|---|
| repokey-blake2-chacha20-poly1305 | repokey | chacha20-poly1305 | blake2 | |
| keyfile-blake2-chacha20-poly1305 | keyfile | chacha20-poly1305 | blake2 | |
| repokey-chacha20-poly1305 | repokey | chacha20-poly1305 | sha256 | |
| keyfile-chacha20-poly1305 | keyfile | chacha20-poly1305 | sha256 | |
| repokey-blake2-aes-ocb | repokey | aes256-ocb | blake2 | |
| keyfile-blake2-aes-ocb | keyfile | aes256-ocb | blake2 | |
| repokey-aes-ocb | repokey | aes256-ocb | sha256 | |
| keyfile-aes-ocb | keyfile | aes256-ocb | sha256 | |
| authenticated-blake2 | repokey | none | blake2 | |
| | ~~keyfile~~ | ~~none~~ | ~~blake2~~ | Not supported |
| authenticated | repokey | none | sha256 | |
| | ~~keyfile~~ | ~~none~~ | ~~sha256~~ | Not supported |
| none | | | | Use --unsafe-unencrypted instead |
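
The translation table can also be expressed as a small function. This is only a sketch: `translate_legacy_encryption` and `CIPHER_NAMES` are hypothetical names that exist neither in Borg nor in this proposal, and the parsing simply mirrors the naming pattern of the old values listed above.

```python
# Maps the cipher part of an old value to the proposed new --encryption value.
CIPHER_NAMES = {
    "aes-ocb": "aes256-ocb",
    "chacha20-poly1305": "chacha20-poly1305",
}


def translate_legacy_encryption(old):
    """Return (mode, encryption, id_hash) for an old --encryption value."""
    if old == "none":
        # The table maps this to the separate --unsafe-unencrypted option.
        raise ValueError("no equivalent; use --unsafe-unencrypted instead")
    if old in ("authenticated", "authenticated-blake2"):
        id_hash = "blake2" if old.endswith("-blake2") else "sha256"
        return ("repokey", "none", id_hash)
    # Remaining values look like <mode>[-blake2]-<cipher>.
    mode, _, rest = old.partition("-")
    if mode not in ("repokey", "keyfile"):
        raise ValueError(f"unknown legacy value: {old!r}")
    id_hash = "sha256"
    if rest.startswith("blake2-"):
        id_hash = "blake2"
        rest = rest[len("blake2-"):]
    if rest not in CIPHER_NAMES:
        raise ValueError(f"unknown legacy value: {old!r}")
    return (mode, CIPHER_NAMES[rest], id_hash)
```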

Reasoning:

Borg's --encryption option has grown over the years: Borg initially started with the none, repokey, and keyfile modes. Users chose the mode depending on whether they wanted encryption, and where to store the key(s). Borg 1.1 later added authenticated as an unencrypted but authenticated alternative somewhere between none and repokey, and also added *-blake2 counterparts for all modes to use BLAKE2b-256 instead of HMAC-SHA256 for authentication. IMHO and in retrospect, the *-blake2 counterparts probably should have been a separate option instead.

Borg 2 now switches from AES-CTR to AES-OCB and, more critically, adds support for CHACHA20-POLY1305, increasing the number of alternatives from 7 to 11. The docs now have to use a K placeholder to shorten the repokey and keyfile alternatives, because there are simply too many alternatives otherwise.

So, I believe it's better to split the old --encryption option into distinct decisions: The encryption algorithm to use (new --encryption), where to store the key(s) (new --mode), and what ID hash to use (new --id-hash).

Open questions / further considerations:

  • I was initially thinking about splitting the new --encryption option even further into separate --encryption and --authentication options. --encryption would then solely choose the encryption algorithm, and --authentication the authentication algorithm. --mode repokey --id-hash blake2 --encryption aes256 --authentication blake2b would then match --encryption repokey-blake2 with Borg 1.4 (I think?). However, this would allow for non-standardised and never formally analysed combinations like AES256 with POLY1305, or CHACHA20 with BLAKE2b. That's probably a bad idea security-wise… And probably provides no benefit anyway.

    The reason why I'm bringing this up is because I wanted to ask whether there's anything else that might be added in the future that would be more than just adding another value to the suggested --encryption, --mode, and --id-hash options, or that would require changing what these options mean. This includes distant, i.e. not yet decided, but thinkable ideas. I'm thinking about post-quantum crypto or stuff like that (again, I'm no encryption expert, so my question might indeed be a bit silly, I just don't know). WDYT?

  • Why does {repokey,keyfile}-chacha20-poly1305 use HMAC-SHA256 as ID-Hash and not a simple SHA256?

  • The --mode option doesn't have to be limited to repokey and keyfile in the future. I'm thinking about e.g. storing the keys on security devices (loosely related: #8995), or using 3rd-party APIs to store/read keys (e.g. password managers).

  • Instead of making --id-hash optional, we could also force users to make an informed decision (i.e. run some benchmarks first) by requiring not only --encryption, but also --id-hash.

Changes to suggestion:

  • 2025-11-21: --hash renamed to --id-hash to minimise confusion with the auth part of --encryption

PhrozenByte, Nov 10 '25 17:11

I want to second this proposal to split the -e option: I think it significantly improves clarity.

As a newbie user, I stumbled over the table in the docs of borg repo-create, and it derailed even me as a techie security nerd: I do know about weird terms like BLAKE2b, HMAC-SHA256 and the like. And still I tried to run -e K-blake2-chacha20-poly1305. I'm even more convinced that the current CLI will be overly confusing to any "regular" end user.

Looking at "Choosing an encryption mode", the whole paragraph mostly talks about repokey vs. keyfile mode. And that's perfectly fine, as this is the single most significant user-facing choice, and it is already very well described.

For 95+% of users, sensible defaults for all other options would IMHO simply be the best choice.

bentolor, Dec 02 '25 12:12

There is currently only 1 byte that encodes all this information in the repository. And that byte selects the "cipher suite" with the corresponding number.

ThomasWaldmann, Dec 02 '25 14:12

> There is currently only 1 byte that encodes all this information in the repository. And that byte selects the "cipher suite" with the corresponding number.

I'm aware of that. Just to make this clear, I'm not suggesting changing anything in the backend, this is solely about the UI.

I had a lookup table for crypto.key.key_creator() in mind, looking for ( args.mode, args.encryption, args.id_hash ) in a dict, or raising an error otherwise.

{
    ("repokey", "chacha20-poly1305", "blake2"): Blake2CHPORepoKey,
    ("keyfile", "chacha20-poly1305", "blake2"): Blake2CHPOKeyfileKey,
    ("repokey", "chacha20-poly1305", "sha256"): CHPORepoKey,
    ("keyfile", "chacha20-poly1305", "sha256"): CHPOKeyfileKey,

    ("repokey", "aes256-ocb", "blake2"): Blake2AESOCBRepoKey,
    ("keyfile", "aes256-ocb", "blake2"): Blake2AESOCBKeyfileKey,
    ("repokey", "aes256-ocb", "sha256"): AESOCBRepoKey,
    ("keyfile", "aes256-ocb", "sha256"): AESOCBKeyfileKey,

    ("repokey", "none", "blake2"): Blake2AuthenticatedKey,
    ("repokey", "none", "sha256"): AuthenticatedKey,
}

The dict is just for illustration, in practice I don't think that the combination of string representations should be hard coded like that, but rather moved to the AuthenticatedKey, Blake2AuthenticatedKey, AESOCBKeyfileKey, AESOCBRepoKey, … classes and generated from the existing crypto.key.AVAILABLE_KEY_TYPES tuple at runtime instead (like crypto.key.KeyBase.ARG_NAME, just with three separate options now). The only hard coded exception would be --unsafe-unencrypted, if not removed anyway.

However, that's just a suggestion, the implementation is of course totally up to the devs.
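
The runtime-generated variant described above could be sketched roughly as follows. The classes here are empty stand-ins that only borrow names from the illustration dict; they are not Borg's real key classes. The `MODE`, `ENCRYPTION`, and `ID_HASH` attributes and the `build_key_lookup` and `select_key_class` helpers are likewise hypothetical and only show the idea.

```python
class KeyBase:
    # Hypothetical attributes; each key class would declare its own values.
    MODE = None
    ENCRYPTION = None
    ID_HASH = None


class AuthenticatedKey(KeyBase):
    MODE, ENCRYPTION, ID_HASH = "repokey", "none", "sha256"


class Blake2AuthenticatedKey(KeyBase):
    MODE, ENCRYPTION, ID_HASH = "repokey", "none", "blake2"


class AESOCBRepoKey(KeyBase):
    MODE, ENCRYPTION, ID_HASH = "repokey", "aes256-ocb", "sha256"


class CHPOKeyfileKey(KeyBase):
    MODE, ENCRYPTION, ID_HASH = "keyfile", "chacha20-poly1305", "sha256"


# Stand-in for the tuple of available key types (shortened for the example).
AVAILABLE_KEY_TYPES = (AuthenticatedKey, Blake2AuthenticatedKey,
                       AESOCBRepoKey, CHPOKeyfileKey)


def build_key_lookup(key_types):
    """Build {(mode, encryption, id_hash): key class} from class attributes."""
    lookup = {}
    for cls in key_types:
        combo = (cls.MODE, cls.ENCRYPTION, cls.ID_HASH)
        if combo in lookup:
            raise ValueError(f"duplicate combination: {combo}")
        lookup[combo] = cls
    return lookup


def select_key_class(mode, encryption, id_hash, key_types=AVAILABLE_KEY_TYPES):
    """Return the key class for a combination, or raise for unsupported ones."""
    try:
        return build_key_lookup(key_types)[(mode, encryption, id_hash)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: --mode {mode} "
            f"--encryption {encryption} --id-hash {id_hash}"
        ) from None
```

Unsupported combinations such as keyfile with none then fail on their own, because no class declares them.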

PhrozenByte, Dec 02 '25 16:12

By the way, a note for anyone who might want to pick this up: I'm planning to do a rewrite of the repo-create docs to "backport" #9103 to master anyway, so any PR implementing this, if considered feasible, can leave out the docs changes, I'll do them in parallel then.

PhrozenByte, Dec 02 '25 16:12

> Why does {repokey,keyfile}-chacha20-poly1305 use HMAC-SHA256 as ID-Hash and not a simple SHA256?

To avoid fingerprinting of chunk content via the chunk IDs. Such IDs are used at the API level and might end up in an unencrypted repo index. The HMAC-SHA256 computation involves secret key material, so an attacker cannot perform that computation and thus cannot attempt a fingerprinting attack.
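
The fingerprinting concern can be shown with Python's standard library. This is a simplified sketch, not Borg's ID code: it treats a whole known byte string as one chunk, and the helper names are made up for this example.

```python
import hashlib
import hmac
import os


def plain_chunk_id(chunk: bytes) -> bytes:
    """Unkeyed ID: anyone who knows the content can recompute it."""
    return hashlib.sha256(chunk).digest()


def keyed_chunk_id(key: bytes, chunk: bytes) -> bytes:
    """Keyed ID: recomputing it requires the secret key."""
    return hmac.new(key, chunk, hashlib.sha256).digest()


def attacker_can_fingerprint(observed_id: bytes, known_content: bytes) -> bool:
    """The attacker has no key, so all they can compute is the unkeyed hash."""
    guess = hashlib.sha256(known_content).digest()
    return hmac.compare_digest(observed_id, guess)


secret_key = os.urandom(32)
known_content = b"contents of some publicly known file"

# An unkeyed ID lets the attacker confirm the content is in the repository.
leaks = attacker_can_fingerprint(plain_chunk_id(known_content), known_content)
# A keyed ID does not match anything the attacker can compute without the key.
hidden = not attacker_can_fingerprint(
    keyed_chunk_id(secret_key, known_content), known_content)
```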

ThomasWaldmann, Dec 03 '25 10:12

The fact that borg uses sha256 IDs when no borg key exists and hmac-sha256 IDs if a key is used, makes this a bit more complicated.

The --id-hash docs could say that the value only gives the hashing algorithm "used within" the id computation (which in the end then can be either sha256 or hmac-sha256, or blake2b-unkeyed or blake2b-keyed).
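
As a rough sketch of that wording, the --id-hash value would pick only the hash family, and whether a key exists would decide between the unkeyed and keyed variant. The `compute_id` helper is hypothetical, and hashlib's keyed BLAKE2b mode is used here as a stand-in; Borg's actual keyed BLAKE2b construction may differ.

```python
import hashlib
import hmac


def compute_id(data, id_hash, key=None):
    """Return a 32-byte ID; `key` is None when no Borg key exists."""
    if id_hash == "sha256":
        if key is None:
            return hashlib.sha256(data).digest()  # sha256
        return hmac.new(key, data, hashlib.sha256).digest()  # hmac-sha256
    if id_hash == "blake2":
        if key is None:
            return hashlib.blake2b(data, digest_size=32).digest()  # unkeyed
        # Stand-in for blake2b-keyed; hashlib accepts keys of up to 64 bytes.
        return hashlib.blake2b(data, key=key, digest_size=32).digest()
    raise ValueError(f"unknown id hash: {id_hash!r}")
```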

The --mode could be rather --key none/repokey/keyfile.

ThomasWaldmann, Dec 03 '25 11:12

> To avoid fingerprinting of chunk content via the chunks IDs.

I see. Thanks for the explanation! :+1:

> The fact that borg uses sha256 IDs when no borg key exists and hmac-sha256 if a key is used, makes this a bit more complicated.

I'd strongly vote for leaving the "no borg key" variant (you mean the former --encryption none, right?) out of the new --mode/--encryption/--id-hash options, but rather move it to a completely separate and single --unsafe-unencrypted option that acts as a highlander for the three new options.

I'm aware that this requires Borg to handle it as a special case (for UI handling, still no backend changes), but since it is indeed a special case and since we want users to use authenticated instead, I feel that handling it as a special case is justified, unless support for unencrypted/unauthenticated repos is removed altogether anyway (see our discussion in #9104). Suggesting to rename the old --encryption authenticated to the new --encryption none --id-hash sha256 was a conscious decision of mine.

Besides, I feel that the difference between hmac-sha256 and sha256, and between blake2b-unkeyed and blake2b-keyed, is important from a technical perspective, but not so much from a user's perspective (i.e., this could be mentioned in the docs, but I assume that ordinary users don't care much, they rather care about the performance differences of the sha256 vs. blake2 variants). Just calling it "sha256" and "blake2" is fine IMHO, but it's your decision of course.

> The --mode could be rather --key none/repokey/keyfile.
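
A minimal sketch of how such a highlander option could be special-cased in the UI layer, assuming argparse. `parse_with_highlander` is a made-up name and this is not Borg's parser. The three options default to None so that an explicitly given value can be detected.

```python
import argparse


def parse_with_highlander(argv):
    """Hypothetical parsing where --unsafe-unencrypted excludes the others."""
    parser = argparse.ArgumentParser(prog="borg repo-create")
    parser.add_argument("--unsafe-unencrypted", action="store_true")
    # No defaults here, so explicitly given options can be told apart.
    parser.add_argument("-e", "--encryption",
                        choices=["none", "aes256-ocb", "chacha20-poly1305"])
    parser.add_argument("-m", "--mode", choices=["repokey", "keyfile"])
    parser.add_argument("-i", "--id-hash", choices=["sha256", "blake2"])
    args = parser.parse_args(argv)

    given = [name for name in ("encryption", "mode", "id_hash")
             if getattr(args, name) is not None]
    if args.unsafe_unencrypted:
        if given:
            parser.error("--unsafe-unencrypted cannot be combined with "
                         "--encryption, --mode, or --id-hash")
        return args
    if args.encryption is None:
        parser.error("--encryption is required unless "
                     "--unsafe-unencrypted is given")
    # Apply the proposed defaults only on the regular path.
    args.mode = args.mode or "repokey"
    args.id_hash = args.id_hash or "sha256"
    return args
```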

Following the above, we AFAIK don't need a --mode none value, but the option's name could indeed be changed to better tell what it is configuring ("mode" is kinda generic). How about --key-storage?

PhrozenByte, Dec 03 '25 11:12