Split `repo-create --encryption` into `--mode`, `--encryption`, and `--id-hash` (Borg 2 only)
/kind enhancement
Some prior, loosely related discussions can be found in #9103 and #9104.
Following the refactored `init --encryption` docs for Borg 1.4 (see #9103), we still have to update the docs for Borg 2. However, before doing that, I feel there's some room for improvement in Borg 2's `repo-create` CLI.
I'd like to suggest splitting the current `--encryption` option into separate `--mode`, `--encryption`, and `--id-hash` options (plus an `--unsafe-unencrypted` option if #9104 is disregarded) as follows:
- With the new `-e`/`--encryption` option users solely choose the encryption and authentication algorithm. It accepts `none`, `aes256-ocb`, and `chacha20-poly1305`. This option is required.
- With the new `-m`/`--mode` option users choose where the key(s) shall be stored. It accepts `repokey` and `keyfile`. It defaults to `repokey` if omitted. If `--encryption none` is given, `--mode keyfile` is rejected as invalid.
- With the new `-i`/`--id-hash` option users choose the ID hash. It accepts `sha256` and `blake2`. It defaults to `sha256` if omitted.
- If #9104 is disregarded, the `--unsafe-unencrypted` option (no shorthand) is added to replace the old `--encryption none` option. `--unsafe-unencrypted` then is a highlander option for `--mode`, `--encryption`, and `--id-hash`. The name is chosen intentionally to further strengthen that Borg advises against using it.
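To make the rules above concrete, here is a minimal sketch using Python's standard `argparse`. It is not Borg's actual argument handling: `build_parser` and `parse_repo_create_args` are hypothetical names, and the defaults are applied after parsing so that the highlander check can tell whether an option was given explicitly.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser illustrating the proposed options (not Borg's code)."""
    parser = argparse.ArgumentParser(prog="borg repo-create")
    parser.add_argument("-e", "--encryption",
                        choices=["none", "aes256-ocb", "chacha20-poly1305"])
    parser.add_argument("-m", "--mode", choices=["repokey", "keyfile"])
    parser.add_argument("-i", "--id-hash", choices=["sha256", "blake2"])
    parser.add_argument("--unsafe-unencrypted", action="store_true")
    return parser


def parse_repo_create_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.unsafe_unencrypted:
        # Highlander rule: "there can be only one", so none of the
        # three other options may be given alongside it.
        if any(v is not None for v in (args.encryption, args.mode, args.id_hash)):
            parser.error("--unsafe-unencrypted cannot be combined with "
                         "--mode, --encryption, or --id-hash")
        return args
    if args.encryption is None:
        parser.error("--encryption is required")
    # Apply the proposed defaults only after the checks above.
    if args.mode is None:
        args.mode = "repokey"
    if args.id_hash is None:
        args.id_hash = "sha256"
    if args.encryption == "none" and args.mode == "keyfile":
        parser.error("--mode keyfile is invalid with --encryption none")
    return args
```

With this sketch, `parse_repo_create_args(["-e", "aes256-ocb"])` falls back to `repokey` and `sha256`, while `-e none -m keyfile` exits with a usage error.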
Here's a translation table of the old and suggested new options in master:
| Old `--encryption` | New `--mode` | New `--encryption` | New `--id-hash` | Notes |
|---|---|---|---|---|
| `repokey-blake2-chacha20-poly1305` | `repokey` | `chacha20-poly1305` | `blake2` | |
| `keyfile-blake2-chacha20-poly1305` | `keyfile` | `chacha20-poly1305` | `blake2` | |
| `repokey-chacha20-poly1305` | `repokey` | `chacha20-poly1305` | `sha256` | |
| `keyfile-chacha20-poly1305` | `keyfile` | `chacha20-poly1305` | `sha256` | |
| `repokey-blake2-aes-ocb` | `repokey` | `aes256-ocb` | `blake2` | |
| `keyfile-blake2-aes-ocb` | `keyfile` | `aes256-ocb` | `blake2` | |
| `repokey-aes-ocb` | `repokey` | `aes256-ocb` | `sha256` | |
| `keyfile-aes-ocb` | `keyfile` | `aes256-ocb` | `sha256` | |
| `authenticated-blake2` | `repokey` | `none` | `blake2` | |
| | ~~`keyfile`~~ | ~~`none`~~ | ~~`blake2`~~ | Not supported |
| `authenticated` | `repokey` | `none` | `sha256` | |
| | ~~`keyfile`~~ | ~~`none`~~ | ~~`sha256`~~ | Not supported |
| `none` | | | | Use `--unsafe-unencrypted` instead |
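The table is regular enough that the translation can be derived from the old names. The following is only an illustrative sketch: `translate_old_encryption` is a hypothetical helper, not part of Borg, and it covers only the Borg 2 values listed above.

```python
def translate_old_encryption(old: str) -> tuple:
    """Map an old --encryption value to the proposed (mode, encryption, id_hash)."""
    if old == "none":
        raise ValueError("use --unsafe-unencrypted instead")
    # The "-blake2" infix only selects the ID hash; strip it afterwards.
    id_hash = "blake2" if "-blake2" in old else "sha256"
    rest = old.replace("-blake2", "")
    if rest == "authenticated":
        # Authenticated-only repos keep their key in the repo.
        return ("repokey", "none", id_hash)
    mode, _, cipher = rest.partition("-")
    ciphers = {"aes-ocb": "aes256-ocb", "chacha20-poly1305": "chacha20-poly1305"}
    if mode not in ("repokey", "keyfile") or cipher not in ciphers:
        raise ValueError(f"unknown encryption mode: {old}")
    return (mode, ciphers[cipher], id_hash)
```

For example, `translate_old_encryption("keyfile-blake2-aes-ocb")` yields `("keyfile", "aes256-ocb", "blake2")`, matching the corresponding table row.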
Reasoning:
Borg's `--encryption` option has grown over the years: Borg started with the `none`, `repokey`, and `keyfile` modes. Users chose the mode depending on whether they wanted encryption and where to store the key(s). Borg 1.1 later added `authenticated` as an unencrypted but authenticated alternative somewhere between `none` and `repokey`, and also added `*-blake2` counterparts for all modes to use BLAKE2b-256 instead of HMAC-SHA256 for authentication. IMHO and in retrospect, the `*-blake2` counterparts probably should have been a separate option instead.
Borg 2 now switches to AES-OCB in favour of AES-CTR and, more critically, adds support for CHACHA20-POLY1305, increasing the number of alternatives from 7 to 11. The docs now have to use a K placeholder to shorten the repokey and keyfile alternatives, because there are simply too many alternatives otherwise.
So, I believe it's better to split the old --encryption option into distinct decisions: The encryption algorithm to use (new --encryption), where to store the key(s) (new --mode), and what ID hash to use (new --id-hash).
Open questions / further considerations:
- I was initially thinking about splitting the new `--encryption` option even further into separate `--encryption` and `--authentication` options. `--encryption` would then solely choose the encryption algorithm, and `--authentication` the authentication algorithm. `--mode repokey --id-hash blake2 --encryption aes256 --authentication blake2b` would then match `--encryption repokey-blake2` with Borg 1.4 (I think?). However, this would allow for non-standardised and never formally analysed combinations like AES256 with POLY1305, or CHACHA20 with BLAKE2b. That's probably a bad idea security-wise… And probably provides no benefit anyway.

  The reason why I'm bringing this up is because I wanted to ask whether there's anything else that might be added in the future that would be more than just adding another value to the suggested `--encryption`, `--mode`, and `--id-hash` options, or that would require changing what these options mean. This includes distant, i.e. not yet decided, but thinkable ideas. I'm thinking about post-quantum crypto or stuff like that (again, I'm no encryption expert, so my question might indeed be a bit silly, I just don't know). WDYT?
- Why does `{repokey,keyfile}-chacha20-poly1305` use HMAC-SHA256 as ID-Hash and not a simple SHA256?
- The `--mode` option doesn't have to be limited to `repokey` and `keyfile` in the future. I'm thinking about e.g. storing the keys on security devices (loosely related: #8995), or using 3rd-party APIs to store/read keys (e.g. password managers).
- Instead of making `--id-hash` optional, we could also force users to make an informed decision (i.e. run some benchmarks first) by requiring not only `--encryption`, but also `--id-hash`.
Changes to suggestion:
- 2025-11-21: `--hash` renamed to `--id-hash` to minimise confusion with the auth part of `--encryption`
I want to second this proposal to split the -e option: I think it significantly improves clarity.
As a newbie user, I stumbled over the table in the docs of `borg repo-create`, and it derailed even me as a techie security nerd: I do know about weird terms like BLAKE2b, HMAC-SHA256 and the like, and still I tried to run `-e K-blake2-chacha20-poly1305`. I'm even more convinced that the current CLI will overly confuse any "regular" end user.
Looking at "Choosing an encryption mode" the whole paragraph mostly talks about repokey vs. keyfile mode: And that's perfectly fine, as this is the single, most significant user-facing choice already very well described.
For 95+% of the users, sensible defaults for all other options would IMHO be just the best option.
There is currently only 1 byte that encodes all this information in the repository. And that byte selects the "cipher suite" with the corresponding number.
> There is currently only 1 byte that encodes all this information in the repository. And that byte selects the "cipher suite" with the corresponding number.
I'm aware of that. Just to make this clear, I'm not suggesting to change anything in the backend, this is solely about the UI.
I had a lookup table for crypto.key.key_creator() in mind, looking for ( args.mode, args.encryption, args.id_hash ) in a dict, or raising an error otherwise.
```python
{
    ("repokey", "chacha20-poly1305", "blake2"): Blake2CHPORepoKey,
    ("keyfile", "chacha20-poly1305", "blake2"): Blake2CHPOKeyfileKey,
    ("repokey", "chacha20-poly1305", "sha256"): CHPORepoKey,
    ("keyfile", "chacha20-poly1305", "sha256"): CHPOKeyfileKey,
    ("repokey", "aes256-ocb", "blake2"): Blake2AESOCBRepoKey,
    ("keyfile", "aes256-ocb", "blake2"): Blake2AESOCBKeyfileKey,
    ("repokey", "aes256-ocb", "sha256"): AESOCBRepoKey,
    ("keyfile", "aes256-ocb", "sha256"): AESOCBKeyfileKey,
    ("repokey", "none", "blake2"): Blake2AuthenticatedKey,
    ("repokey", "none", "sha256"): AuthenticatedKey,
}
```
The dict is just for illustration, in practice I don't think that the combination of string representations should be hard coded like that, but rather moved to the AuthenticatedKey, Blake2AuthenticatedKey, AESOCBKeyfileKey, AESOCBRepoKey, … classes and generated from the existing crypto.key.AVAILABLE_KEY_TYPES tuple at runtime instead (like crypto.key.KeyBase.ARG_NAME, just with three separate options now). The only hard coded exception would be --unsafe-unencrypted, if not removed anyway.
However, that's just a suggestion, the implementation is of course totally up to the devs.
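A rough sketch of the "generated at runtime" variant could look like the following. The classes here are empty stand-ins that merely reuse Borg's class names, and the `MODE`, `ENCRYPTION`, and `ID_HASH` attributes as well as `key_class_for` are assumptions made for illustration; Borg's real classes do not carry these attributes.

```python
class KeyBase:
    # Hypothetical per-class option values, analogous to a single ARG_NAME
    # split into three parts.
    MODE = ENCRYPTION = ID_HASH = None


class AuthenticatedKey(KeyBase):
    MODE, ENCRYPTION, ID_HASH = "repokey", "none", "sha256"


class AESOCBRepoKey(KeyBase):
    MODE, ENCRYPTION, ID_HASH = "repokey", "aes256-ocb", "sha256"


class Blake2CHPOKeyfileKey(KeyBase):
    MODE, ENCRYPTION, ID_HASH = "keyfile", "chacha20-poly1305", "blake2"


# Stand-in for the existing tuple of key classes (shortened here).
AVAILABLE_KEY_TYPES = (AuthenticatedKey, AESOCBRepoKey, Blake2CHPOKeyfileKey)

# The lookup table is derived from the classes instead of being hard coded.
KEY_TYPES_BY_OPTIONS = {
    (cls.MODE, cls.ENCRYPTION, cls.ID_HASH): cls for cls in AVAILABLE_KEY_TYPES
}


def key_class_for(mode: str, encryption: str, id_hash: str):
    """Return the key class for an option triple, or raise for invalid combos."""
    try:
        return KEY_TYPES_BY_OPTIONS[(mode, encryption, id_hash)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: --mode {mode} "
            f"--encryption {encryption} --id-hash {id_hash}"
        ) from None
```

Unsupported combinations such as `("keyfile", "none", "sha256")` are rejected simply because no class declares them, so no separate validation rule is needed.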
By the way, a note for anyone who might want to pick this up: I'm planning to do a rewrite of the repo-create docs to "backport" #9103 to master anyway, so any PR implementing this, if considered feasible, can leave out the docs changes, I'll do them in parallel then.
> Why does `{repokey,keyfile}-chacha20-poly1305` use HMAC-SHA256 as ID-Hash and not a simple SHA256?

To avoid fingerprinting of chunk content via the chunk IDs. Such IDs are used at the API level and might end up in an unencrypted repo index. The HMAC-SHA256 computation involves secret key material, so an attacker cannot perform that computation and thus cannot attempt a fingerprinting attack.
The fact that borg uses sha256 IDs when no borg key exists and hmac-sha256 IDs if a key is used, makes this a bit more complicated.
The --id-hash docs could say that the value only gives the hashing algorithm "used within" the id computation (which in the end then can be either sha256 or hmac-sha256, or blake2b-unkeyed or blake2b-keyed).
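The keyed versus unkeyed distinction can be illustrated with Python's standard `hashlib` and `hmac` modules. This is only a sketch: `chunk_id` is a hypothetical helper, and the keyed BLAKE2b branch uses `hashlib`'s built-in keyed mode as a stand-in, which is not necessarily how Borg constructs its keyed BLAKE2b IDs.

```python
import hashlib
import hmac
from typing import Optional


def chunk_id(data: bytes, algo: str, key: Optional[bytes] = None) -> bytes:
    """Compute a 32-byte ID; keyed if secret key material is supplied."""
    if algo == "sha256":
        if key is None:
            # Unkeyed: anyone who knows the data can recompute the ID.
            return hashlib.sha256(data).digest()
        # Keyed: recomputing the ID requires the secret key.
        return hmac.new(key, data, hashlib.sha256).digest()
    if algo == "blake2":
        if key is None:
            return hashlib.blake2b(data, digest_size=32).digest()
        # Stand-in keyed mode (assumption, see lead-in).
        return hashlib.blake2b(data, key=key, digest_size=32).digest()
    raise ValueError(f"unknown id hash: {algo}")
```

An attacker holding a known file can compute `chunk_id(data, "sha256")` and look for that ID in an unencrypted index, but cannot do the same for `chunk_id(data, "sha256", key)` without the key.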
The `--mode` option could rather be `--key none/repokey/keyfile`.
> To avoid fingerprinting of chunk content via the chunk IDs.
I see. Thanks for the explanation! :+1:
> The fact that borg uses sha256 IDs when no borg key exists and hmac-sha256 if a key is used, makes this a bit more complicated.
I'd strongly vote for leaving the "no borg key" variant (you mean the former --encryption none, right?) out of the new --mode/--encryption/--id-hash options, but rather move it to a completely separate and single --unsafe-unencrypted option that acts as a highlander for the three new options.
I'm aware that this requires Borg to handle it as a special case (for UI handling, still no backend changes), but since it is a special case indeed and since we want users to use `authenticated` instead, I feel that handling it as a special case is justified, if support for unencrypted/unauthenticated repos isn't removed altogether anyway (see our discussion in #9104). Suggesting to rename the old `--encryption authenticated` to the new `--encryption none --id-hash sha256` was a conscious decision of mine.
Besides, I feel that the differences between hmac-sha256 and sha256, and between blake2b-keyed and blake2b-unkeyed, are important from a technical perspective, but not so much from a user's perspective (i.e., this could be mentioned in the docs, but I assume that ordinary users don't care much; they rather care about the performance differences of the sha256 vs. blake2 variants). Just calling it "sha256" and "blake2" is fine IMHO, but it's your decision of course.
> The `--mode` could be rather `--key none/repokey/keyfile`.
Following the above, we AFAIK don't need a --mode none value, but the option's name could indeed be changed to better tell what it is configuring ("mode" is kinda generic). How about --key-storage?