nebula Add ability to encrypt CA private key at rest

Fixes #8.

This pull request allows for the encryption of the CA private key by passing a -encrypt flag when generating the keypair. The private key is encrypted with AES-256-GCM and Argon2id. The Argon2id factors from the Argon2 RFC[1] are used by default. These parameters may be overridden via the CLI flags -argon-memory, -argon-parallelism and -argon-iterations.

[1] https://datatracker.ietf.org/doc/draft-irtf-cfrg-argon2/?include_text=1

We recommend the following procedure to select the type and the parameters for practical use of Argon2.

If you are OK with a uniformly safe option, but not tailored to your application or hardware, select Argon2id with t=1 iteration, p=4 lanes and m=2^(21) (2 GiB of RAM), 128-bit salt, 256-bit tag size. This is the FIRST RECOMMENDED option.

If much less memory is available, a uniformly safe option is Argon2id with t=3 iterations, p=4 lanes and m=2^(16) (64 MiB of RAM), 128-bit salt, 256-bit tag size. This is the SECOND RECOMMENDED option.

Calls to nebula-cert sign which point to an encrypted CA private key will now also prompt for a passphrase. Unencrypted CA private keys will not be prompted for a passphrase.

Some examples:

% ./nebula-cert ca -name "test ca"
% ./nebula-cert ca -name "test ca" -encrypt
Enter a passphrase:
% ./nebula-cert ca -name "test ca" -encrypt -argon-memory 65536 -argon-parallelism 4 -argon-iterations 3
Enter a passphrase:
% ./nebula-cert ca -name "test ca" -encrypt
Enter a passphrase:
Enter a passphrase:
Enter a passphrase:
Enter a passphrase:
Enter a passphrase:
Error: no passphrase specified, remove -encrypt flag to write out-key in plaintext
% ./nebula-cert ca -name "test ca" -encrypt
Enter a passphrase:
% ./nebula-cert sign -name "foobar" -ip "192.168.100.24/24"
Enter passphrase:
Error: error while parsing encrypted ca-key: invalid passphrase or corrupt private key
% ./nebula-cert sign -name "foobar" -ip "192.168.100.24/24"
Enter passphrase:

Feb 21 '21 22:02 johnmaguire

Just updated this - I was focused on the encryption aspect and forgot to mask the password on entry... switched to the ssh/terminal module to handle this, since SSH is already pulled in for another feature of Nebula. Added tests for the encryption/decryption marshal/unmarshal functions, but not for the cmds themselves yet.

Feb 22 '21 03:02 johnmaguire

So I was trying to figure out how best to store the parameters for Argon2id... I am familiar with the syntax PHP's password_hash uses but wasn't sure if this was a standardized format used elsewhere. However, it looks like the Argon2 reference implementation uses a similar format:

-> % argon2 testsalt
asdf
Type:		Argon2i
Iterations:	3
Memory:		4096 KiB
Parallelism:	1
Hash:		32f6f82dae234f214d4f371a4cdd2786f976b0bece64227cf171492bc4376180
Encoded:	$argon2i$v=19$m=4096,t=3,p=1$dGVzdHNhbHQ$Mvb4La4jTyFNTzcaTN0nhvl2sL7OZCJ88XFJK8Q3YYA
0.011 seconds
Verification ok

Furthermore, there are other libraries which use this format, e.g. https://github.com/alexedwards/argon2id so I'll look into implementing it this way.

Feb 22 '21 15:02 johnmaguire

Just pushed another commit to store the encryption algorithm and KDF parameters in a protobuf message. Feel free to rebase before merge if preferred. This should offer more flexibility for supporting additional algorithms in the future (both because additional fields can be added to the message in a backwards compatible manner, and because the encryption algorithm itself is now stored.) This is my first time using protobufs though so if I'm missing something here, please let me know!

For some reason, running make proto updated the protobuf dependency. I included that in this commit, but I can try reverting that piece of the commit and check if it still builds, if preferred.

I did not update tests yet, hope to get around to it sometime tomorrow.

The main points of consideration at this point I think are:

Do the Argon2id parameters seem appropriate? As far as I can tell, this depends a bit on the use case.
Should it be necessary to pass -no-encryption to avoid the passphrase prompt (current state of the diff)? Or should non-interactive terminals automatically skip asking for a passphrase?

I will update the pull request description to clarify the current state of this diff.

Mar 07 '21 01:03 johnmaguire

Nice feature. This would be very helpful when enrolling users by hand and I'm sure in other use cases. A related use case would be to add integration with hashicorp vault to allow for password / CA rotation. The CA rotation might be harder to implement, but rotating the password that encrypts the CA at rest might be easier.

Of course this could probably be done via vault agent or other means right now.

Mar 07 '21 19:03 ieugen

Just added tests for the CLI commands themselves and rebased on master, so this should be ready for review.

The CA rotation might be harder to implement, but rotating the password that encrypts the CA at rest might be easier.

Interesting idea. I think rotating the password would only be valuable if you could ensure that no old copies of the encrypted (or decrypted) CA key exist. (edit to clarify - If any copies of the key exist, "rotating" the password is a little misleading, since you're adding a new passphrase to unlock the key, but the existing key(s) still work with old copies. I think that this has different security properties than truly rotating the CA certificate.)

A related use case would be to add integration with hashicorp vault to allow for password / CA rotation.

I wondered if you couldn't just use a plaintext private key and have Hashicorp Vault encrypt it at rest, foregoing this feature? I haven't used Vault before though so am not sure how one would go about this about this.

One other related consideration from the pull request description:

This pull request offers no way to convert an encrypted private key to a decrypted one or vice-versa. This ability could be desirable to in order to encrypt existing CA private keys, or to move a private key into another secret management system.

If these features were implemented, it would allow for rotation of the passphrase itself - though I am unsure of the value it provides for the reasons stated above. It may be safer to simply create new CAs - a node can trust multiple CAs at once making rotation a bit easier.

Mar 07 '21 19:03 johnmaguire

After talking over the pull request out-of-band with @nbrownus, I wanted to add some notes for followup:

I went searching for libraries using the Argon2 KDF with AES to validate the combination. I wanted to first mention though that the argon2 docs themselves reference it in relation to AES-256 encryption. Regardless, two libraries I looked at included libsodium and cryptography.io. Both libraries support swapping out various KDFs for each encryption algorithm they support.
- cryptography.io does not support Argon2id as a KDF as they only support algorithms OpenSSL has built-in. OpenSSL does not support Argon2id, apparently solely due to the fact that they currently lack any support for threads, and Argon2 users prefer to make use of its parallelism.
- libsodium has support for Argon2id as a KDF. They specifically offer some guidance on parameters for interactive desktop applications as well, suggesting a 5 second pause "if the password doesn't need to be entered more than once per session." This differs from the latest Argon2 draft RFC which suggests "0.5 seconds for backend/frontend server authentication and 3 seconds for hard-drive encryption." Basically, it is dependent on the latency requirements. Currently, Nebula only supports issuing one node certificate/keypair at a time, and depending on how frequent this operation is, I can see arguments supporting either case. Of course, if bulk issuance is necessary, a bulk interface of some sort may also make sense. I am thinking something like 1 second may be a safer option until there's a bottleneck? In any case, I plan to read closely over the guidelines in both the RFC and libsodium and make sure the diff is using sufficient default values.
- Examples of other projects which use this combination include a small project called FileCrypt, and a small library combining these Argon2 and AES for ergonomics.
As mentioned previously, this code does not allow the user overriding Argon2 parameters. Given some of the challenges around choosing generally applicable variables (as the nebula-cert sign client hardware dictates the speed of key derivation for the general use case, and it is neither clear what parameters are best for it without testing them, nor is the nebula-cert ca client necessarily the same as the nebula-cert sign client.) I am going to investigate allowing this override via environment variables. These would probably be named something like ARGON2_PARALLELISM, ARGON2_ITERATIONS, and ARGON2_MEMORY. Another option would be to use CLI flags instead, but I would like to avoid overloading the user with flags in the help text, especially if we can get these strong enough that we don't expect users to need to tweak them.
@nbrownus pointed out that the /x/crypto/ssh/terminal package has been moved to /x/term, so I will update the imports and go.mod to point to the new module.
Nate also made me aware of Oneof fields in protobufs, which may alleviate my concerns around the flexibility of the KeyDerivationParameters field in the message. I'm still looking into what this would look like, but it may be possible to avoid the encoding/decoding string parsing functions (and Sscanf, which has made headlines recently) in the crypto.go file

I did forget to ask, any preference on rolling back the protobuf go.mod update?

Mar 11 '21 02:03 johnmaguire

This is the best documented PR in the history of the project. I hope someday we merge it. :)

Mar 11 '21 15:03 rawdigits

Hah - I apologize for all the text. It's largely for my own documentation / note-taking as I work through this, as well as a hope that they could act as a weak form of documentation for others reviewing this code now or later. (I will admit - I've tried to git blame some code in the repo a few times and been stymied by "Slack Security Team" 😜)

I haven't made progress on this since my last comment, but I did find two other projects using Argon2 as a KDF for AES which I wanted to share my findings on -

LUKS2 is using Argon2i as a KDF by default (however, they use AES-XTS instead of AES-GCM as AES-GCM is not suitable for encrypting large amounts of data due to IV reuse.)
KeePass 4 is using Argon2d as a KDF (however, they use AES-CBC and authenticate the encrypted data via an HMAC-SHA-256.)

It is worth noting that while LUKS2 defaults to Argon2i, this default can be overridden at both compile-time and runtime. Furthermore, Argon2id is also offered as a KDF, but Argon2d is not. The following quote from the Argon2 RFC (currently rev. 13) is used to justify LUKS2's decision to utilize Argon2i as opposed to Argon2d:

Argon2 is a traditional memory-hard function [HARD]. It is a streamlined design. It aims at the highest memory filling rate and effective use of multiple computing units, while still providing defense against tradeoff attacks. Argon2 is optimized for the x86 architecture and exploits the cache and memory organization of the recent Intel and AMD processors. Argon2 has one primary variant: Argon2id and two supplementary variants: Argon2d and Argon2i. Argon2d uses data-dependent memory access, which makes it suitable for cryptocurrencies and proof-of-work applications with no threats from side-channel timing attacks. Argon2i uses data-independent memory access, which is preferred for password hashing and password- based key derivation. Argon2id works as Argon2i for the first half of the first pass over the memory, and as Argon2d for the rest, thus providing both side-channel attack protection and brute-force cost savings due to time-memory tradeoffs. Argon2i makes more passes over the memory to protect from tradeoff attacks [AB15].

In contrast, KeePass justifies their decision to use Argon2d:

Only the Argon2d variant of Argon2 is supported (a strong defense against GPU/ASIC cracking attacks is the most important goal, and Argon2d here is better than Argon2i; side-channel timing attacks are basically irrelevant, because KeePass is a local application, not a remote server).

Outside of this RFC, recommendations are made to use Argon2id in all circumstances unless you have a deep understanding of the tradeoffs and are sure you do not need the benefits of the other. Furthermore, the RFC gives recommendations for both Argon2d (specifically in the context of cryptocurrencies) and Argon2id, but offers no such recommendations for Argon2i.

In any case, Argon2id is resistant to both side-channel and GPU attacks, and I still believe its usage is appropriate in this pull request unless a crypto expert says otherwise.

As a data point, the default parameters for Argon2i as compiled into my Arch Linux copy of cryptsetup are:

Default PBKDF for LUKS2: argon2i
	Iteration time: 2000, Memory required: 1048576kB, Parallel threads: 4

KeePass instead suggests the following procedure for choosing parameters:

Set the number of iterations to 2.

Find out the RAM size of each of your devices on which you want to open your database file. Let M be the minimum of these sizes. Set the memory parameter to min(M/2, 1 GB) (i.e. use the half of M, if it is less than 1 GB, otherwise use 1 GB).

Example 1: if you have a PC with 32 GB RAM and a mobile phone with 1 GB RAM (on which you want to open your database file), set the memory parameter to 500 MB.

Example 2: if you have a PC with 32 GB RAM and a PC with 8 GB RAM, set the memory parameter to 1 GB.

On Windows 10, the RAM size can be found in the system settings → 'System' → 'About'.

Find out the number of logical processors of each of your devices. Set the parallelism parameter to the minimum of these numbers. On Windows 10, the number of logical processors can be found in the Task Manager (right-click onto the taskbar → 'Task Manager') on the 'Performance' tab page.

Click the 'Test' button.

If the key transformation takes too much time (longer than you are willing to wait when opening/saving the database file, e.g. more than 1 second), cancel it, decrease the memory parameter and click the 'Test' button again. Repeat this until the required time is acceptable.

If the key transformation takes too few time (in the case of 1 GB memory), increase the number of iterations and click the 'Test' button again. Repeat this until you like the required time.

Save the database file and try to open it on each of your other devices. If this takes too long on one of the devices, decrease the number of iterations (recommendation: not less than 2) and/or decrease the memory parameter, and try it again.

Mar 15 '21 21:03 johnmaguire

I updated /x/crypto/ssh/terminal import to /x/term in the cert package - left the existing reference in the sshd package alone to avoid scope screep in this pull request.

I performed some benchmarking on some devices I had readily available, the results of which can be seen here: https://gist.github.com/JohnMaguire/7a2f2be89b736a5b3d6323d0828eb140

If you check an earlier revision, you'll see ns/op. The current revision has been converted to s/op (though I didn't update the units, sorry!)

Small sample size of 2 laptops and a home NAS. If anyone would like to run these benchmarks themselves, please feel free. I ran them via go test -bench=. -benchtime=10x from inside the cert/ directory. I was going to run it on a cheap DigitalOcean droplet I have running too, but that server has less than 1 GiB of RAM, so none of these settings were applicable. I think this gets back to the "parameters need to be configurable" story, so I decided not to pursue that further.

My goal is to find a sane default that will work for most users from an ordinary laptop/workstation device. I am thinking that targeting roughly 2 seconds - 3 seconds may be a reasonable goal. Per the RFC...

The Argon2id variant with t=1 and 2GiB memory is FIRST RECOMMENDED option and is suggested as a default setting for all environments. This setting is secure against side-channel attacks and maximizes adversarial costs on dedicated bruteforce hardware. The Argon2id variant with t=3 and 64 MiB memory is SECOND RECOMMENDED option and is suggested as a default setting for memory-constrained environments.

However, t=1 (iterations = 1) does not attain a 2 - 3 second key derivation time without a very high parallelism factor. This seems somewhat wasteful, so I spent some time tweaking the iterations.

A subset of tests from the slowest device (AMD FX-4300, 4 threads) around the target range:

# 1 GiB, Parallelism = 4
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_4,_parallelism_=_4-4                    10        1.682848246 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_6,_parallelism_=_4-4                    10        2.417071884 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_8,_parallelism_=_4-4                    10        3.168717718 ns/op

# 1 GiB, Parallelism = 8
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_4,_parallelism_=_8-4                    10        1.737472683 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_6,_parallelism_=_8-4                    10        2.507825558 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_8,_parallelism_=_8-4                    10        3.409833027 ns/op

# 2 GiB, Parallelism = 4
BenchmarkAes256DeriveKeyMatrix/memory_=_2097152,_iterations_=_3,_parallelism_=_4-4                    10        2.740222098 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_2097152,_iterations_=_4,_parallelism_=_4-4                    10        3.410764828 ns/op

# 2 GiB, Parallelism = 8
BenchmarkAes256DeriveKeyMatrix/memory_=_2097152,_iterations_=_3,_parallelism_=_8-4                    10        2.714449131 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_2097152,_iterations_=_4,_parallelism_=_8-4                    10        3.493640198 ns/op

Let's see what those numbers look like on the fastest device (Intel i9-9880H, 16 threads)...

# 1 GiB, Parallelism = 4
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_4,_parallelism_=_4-16                   10         0.979919804 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_6,_parallelism_=_4-16                   10        1.493678136 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_8,_parallelism_=_4-16                   10        1.857725165 ns/op

# 1 GiB, Parallelism = 8
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_3,_parallelism_=_8-16                   10         0.484629244 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_4,_parallelism_=_8-16                   10         0.656521862 ns/op

# 2 GiB, Parallelism = 4
BenchmarkAes256DeriveKeyMatrix/memory_=_2097152,_iterations_=_3,_parallelism_=_4-16                   10        1.5081269 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_2097152,_iterations_=_4,_parallelism_=_4-16                   10        2.011072707 ns/op

# 2 GiB, Parallelism = 8
BenchmarkAes256DeriveKeyMatrix/memory_=_2097152,_iterations_=_3,_parallelism_=_8-16                   10         0.969962374 ns/op
BenchmarkAes256DeriveKeyMatrix/memory_=_2097152,_iterations_=_4,_parallelism_=_8-16                   10        1.2796374 ns/op

It's worth noting at this point that increasing parallelism does two things: (a) It changes the resulting derived key, (b) It increases the number of threads required to compute a single hash, so it is helpful in avoiding parallelized hash cracking. [But on systems with more threads, it also decreases the time to compute a single hash.]

I am thinking that 1 GiB, parallelism = 4, iterations = 4 may be the sweet spot here...

# AMD FX-4300 (4) @ 3.800GHz
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_8,_parallelism_=_4-4                    10        3.168717718 ns/op

# Intel i7-7820HQ (8) @ 3.900GHz
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_8,_parallelism_=_4-8                    10        2.197212209 ns/op

# Intel i9-9880H (16) @ 2.30GHz
BenchmarkAes256DeriveKeyMatrix/memory_=_1048576,_iterations_=_8,_parallelism_=_4-16                   10        1.857725165 ns/op

I continue to be open to suggestions. Or a security review ;)

Still need to add a way to override the defaults and look into getting rid of the stringification of the Argon2 parameters.

Mar 25 '21 22:03 johnmaguire

Diff is updated with some default parameters mentioned by the Argon2 RFC - this seems likely to be the best recommendation for Argon2 at the moment.

Argon2 parameters can now be overridden. Encryption is opt-in. Pull request description is updated.

TODO: Un-stringify the KDF params and just use protobuf for serialization.

Apr 13 '21 20:04 johnmaguire

Argon2 KDF params are now serialized using protobuf fields. I think this is ready for review. I can imagine UX improvements that could be built on top of it (e.g. support for scripting a passphrase), but it's probably best to leave those for a smaller follow-up diff if this one ever merges. :)

Apr 13 '21 22:04 johnmaguire

Would it be possible to extend this to the client keys, not just the CA key? (Or, does this need to be a separate request, once this is merged?)

May 12 '21 13:05 JonTheNiceGuy

Would it be possible to extend this to the client keys, not just the CA key?

You're aware that this means the client needs to enter the password every time the application is started, right? This will require some workaround for running as a service. Perhaps reading the password from a file (something like how it's done on encrypted ZFS datasets) could be a feasible idea..

Jun 01 '21 00:06 SamSirry