lnd
lnd copied to clipboard
tls: Add ability to encrypt TLS key on disk
This PR replaces #4476 . Since I've been making some of the changes for @gkrizek and it's been a bit tricky to manage making PRs into his PRs... we thought it would be nice to have a fresh start. :)
Depends on #6573
- Rebased this on top of #6500. The first six commits are redundant... If #6500 goes through first, then this will be just a few commits
- The other major change in this PR is refactoring the getTLSConfig method
Graham's original description:
Currently LND writes the TLS private key to disk in plaintext. If you are running in an untrusted environment this could be a problem as someone could use the key to snoop on your API traffic and steal credentials. This pull request creates a way to encrypt the TLS key on disk so even if your server is compromised or others have access, they can't spy on your traffic.
In these changes, we add a new flag to lnd called --tlsencryptkey. When this is enabled, lnd encrypts the TLS key using the wallet's seed and writes the encrypted blob to disk. Because the key is encrypted to the wallet's seed, that means we can only use the TLS pair when the wallet is unlocked. This would leave the WalletUnlocker service without TLS. To fix this problem, we create a temporary TLS pair specifically for the WalletUnlocker RPC. This temporary TLS key is only stored in memory and the temporary TLS cert is written to disk at the same path as tlscertpath, but with .tmp appended to the end of the filename. After unlocking, lnd deletes the cert from disk and clears the private key from memory. Then uses the main TLS certificate for all other RPCs.
Of course, this changes the flow of unlocking/init a little bit. If you use --tlsencryptkey on lnd, then you'll need a way to access the temporary TLS cert for verification. The simplest way to do this is to use lncli on the same host lnd is running on and point --tlscertpath at the temporary cert. If that's not possible then the user will need to come up with a way to copy the temporary certificate out of the lnd server and get it to their client device.
Summary of Changes:
-
Moves the crypto.go file from chanbackup into its own package called lnencrypt. This was needed because we can't directly reference the encryptPayloadToWriter and decryptPayloadFromReader functions from the chanbackup package due to circular dependency issues. Also, I think it's very likely that future enhancements will also want to use these same functions. Thus I've moved them to their own package. (cherry-picked from https://github.com/lightningnetwork/lnd/pull/4458)
-
Creates a new flag (--tlsencryptkey) that tells lnd to encrypt the TLS private key on disk. If set, it will generate temporary TLS certs for the WalletUnlocker RPC. Allows users to prevent spying on API traffic in untrusted environments.
-
Add necessary tests and documentation
Looking into you PR :+1:.
Is this ready for review? The review starts from lnd: refactor getTLSConfig, right?
Saw that an external repo is referenced, are the changes to lnd/cert tracked somewhere?
Thanks @bitromortac! Yes this PR depends on this one: https://github.com/lightningnetwork/lnd/pull/6573, so might be worth giving that one a review first.
(Also noticed both PRs need a rebase. Will do that sometime today.)
@sputn1ck: review reminder @bitromortac: review reminder
Can be rebased and updated to the new version of the cert module, I pushed a new tag cert/v1.2.0.
@sputn1ck: review reminder @bitromortac: review reminder @orbitalturtle, remember to re-request review from reviewers when ready
Also, I'm not sure in general how this feature can be used in a secure way.
I can address this specifically because this is something I thought a lot about several years ago when I first made this PR. We handle this by delivering the ephemeral TLS cert out of band (through our site) for effective validation. Of course in our case the user has to sort of trust that we are giving them the right cert, but that's acceptable in our model. Generally speaking, the cert will always have to be delivered/validated out of band from LND.
I thought about adding in functionality like Webhook notifications or uploading the ephemeral TLS cert to a location into LND, but determined that was out of scope for LND. If a user is leveraging this then it's up to them to validate the ephemeral TLS cert just like they have to do with the persistent TLS cert. So I don't think it degrades security at all, but it does add some additional complexity for the user that will need to be solved for. (can easily be done with automation)
I thought about adding in functionality like Webhook notifications or uploading the ephemeral TLS cert to a location into LND
Right, I now remember some of the discussions around webhooks and uploading the cert somewhere.
Maybe some of the security tradeoffs and considerations (especially the "out of band" part) could be a bit more explicit in the doc. Currently it just mentions "If you aren't able to run lncli on the host lnd is running on, then you'll need to copy the temporary certificate from the host onto whatever device you're using." but doesn't really go into why it's important to verify the ephemeral cert specifically.
@guggero Thanks for the review Oli! Yeah your feedback makes a lot of sense. In my initial refactor I was probably too conservative in trying not to break how things as they currently work, but makes sense that we needed to push the refactoring and clarification further. I pushed up a much deeper refactor, which hopefully makes the code much easier to read. LMK what you think & if this matches sort of what you were talking about in regards to a state machine?
I plan to spend some time running some more tests to make sure it's working as well as before (since there are so many different paths it could take), but let me know if this refactor is looking better.
@bitromortac Yes, you understand that flow right. It is possible to start a wallet with --tlsencryptkey but I don't think it should be the default if that's what you're asking.
I think renaming the key to something different could cause confusion in my opinion. having different or multiple tls.key files depending on settings seems like it'd be hard to follow.
Yes, you understand that flow right. It is possible to start a wallet with --tlsencryptkey but I don't think it should be the default if that's what you're asking.
Thanks @gkrizek.
Currently we error for an empty .lnd with [ERR] LTND: Shutting down because error in main method: unable to load TLS credentials: open /home/user/.lnd/tls.cert: no such file or directory, so that would need to be created first before trying to encrypt it.
Also I'm wondering for which applications we would need an unencrypted tls.key and why it's not a good idea to do this by default (and ensure backward compatibility with unlocking).
@bitromortac
Currently we error for an empty .lnd with [ERR] LTND: Shutting down because error in main method: unable to load TLS credentials: open /home/user/.lnd/tls.cert: no such file or directory, so that would need to be created first before trying to encrypt it.
I'm not sure I understand what you're saying. Can you elaborate here?
Also I'm wondering for which applications we would need an unencrypted tls.key and why it's not a good idea to do this by default (and ensure backward compatibility with unlocking).
I don't think there's really a problem or place where having an encrypted TLS key is bad, however the flow to unlock the node (with the temporary certificates) is a pretty different process than what has existed in LND so far. So I think forcing this new process for unlocking nodes would have negative cascading affects.
Thanks!
Addressed those comments @guggero! Just one question: Looks like the linter wants me to add a timeout to the lets encrypt codes. Would it make sense to use one of the other timeouts in the config.go or should I set a new one?
And uh, I don't know why the GitHub UI hates me, but every time I go to request a review from anyone, it for some reason removes other developers' requests for review (???). So sorry for removing your request for review @sputn1ck.
Looks like the linter wants me to add a timeout to the lets encrypt codes. Would it make sense to use one of the other timeouts in the config.go or should I set a new one?
I'd just add a var at the top of the file for this case and set it to 5 seconds or something like that. The linter is new so it didn't catch existing instances of http.Server that we should probably also address. I'm going to add an issue for that so we can fix it everywhere (REST server, profiling server, let's encrypt).
I'm not sure I understand what you're saying. Can you elaborate here?
@gkrizek Yes, sorry for not being more explicit. The code as in this PR, running lnd for the first time with the --tlsencryptkey flag is not compatible with an uninitialized .lnd home folder, as the flow expects a tls.cert/keypair in the first call of GetConfig to exist and errors with a file not found error. Is that intentional or a deeper reason behind it?
@bitromortac maybe @guggero can help us decide the default behavior. Although this is more secure, I do not think it should be the default behavior because it changes the way nodes are unlocked significantly. I think this new and different way of unlocking would be bad for existing LND users. The preference would be to allow them to upgrade when they are ready.
Would it be too involved to keep tls.cert in sync with the currently active certificate? That way we would keep backward compatibility with unlocking.
I think this is true only if the user is using lncli from the same instance that LND is running on. If they are communicating with the node remotely it wouldn't provide any benefits. You could still need to grab the cert to validate the request. Therefore, I think overwriting the tls.cert file back and forth doesn't make a lot of sense and could cause confusion.
I'm not sure I understand what you're saying. Can you elaborate here?
@gkrizek Yes, sorry for not being more explicit. The code as in this PR, running lnd for the first time with the
--tlsencryptkeyflag is not compatible with an uninitialized.lndhome folder, as the flow expects atls.cert/keypair in the first call ofGetConfigto exist and errors with a file not found error. Is that intentional or a deeper reason behind it?
Oops yup looks like that was a bug. Nice catch. I also addressed all your other comments @bitromortac, when you have a chance to take a look. Good call on encapsulating the lnd.go further into the TLSManager.
@bitromortac maybe @guggero can help us decide the default behavior. Although this is more secure, I do not think it should be the default behavior because it changes the way nodes are unlocked significantly.
I agree with Graham here, we shouldn't make this the default because it will make unlocking through lncli quite different because another cert needs to be used.
I think this is true only if the user is using lncli from the same instance that LND is running on. If they are communicating with the node remotely it wouldn't provide any benefits. You could still need to grab the cert to validate the request. Therefore, I think overwriting the tls.cert file back and forth doesn't make a lot of sense and could cause confusion.
I agree with Graham here, we shouldn't make this the default because it will make unlocking through
lncliquite different because another cert needs to be used.
Yes, that makes sense. I didn't really consider that scenario, usually when I'm restarting my node I unlock it from the same machine, but there is a use case where the node hoster is different from the manager :+1: (which is also part of the motivation of this PR).
@bitromortac Sounds great, made those changes :+1: @guggero & @sputn1ck lmk if there's anything else I can do to improve the PR. I'd re-request a review above, but having trouble with the UI lately.
@orbitalturtle, remember to re-request review from reviewers when ready
Thanks for the reviews @bitromortac and @sputn1ck. @bitromortac I made those changes ~ re: a couple of your comments, I think I had done something funny when rebasing, but should be fixed now :+1:
Fabulous, @bitromortac I made those changes, thanks again for the reviews :)
Great made those changes @guggero, and rebased
Great made those changes @guggero, and rebased
I don't really see the changes. But I also noticed that the same area of the code was replaced in a later commit, so it's fixed indirectly. Not optimal, but given the number of rounds this has gone through, I'm just going to merge anyway.