lnd icon indicating copy to clipboard operation
lnd copied to clipboard

tls: Add ability to encrypt TLS key on disk

Open orbitalturtle opened this issue 3 years ago • 4 comments

This PR replaces #4476 . Since I've been making some of the changes for @gkrizek and it's been a bit tricky to manage making PRs into his PRs... we thought it would be nice to have a fresh start. :)

Depends on #6573

  • Rebased this on top of #6500. The first six commits are redundant... If #6500 goes through first, then this will be just a few commits
  • The other major change in this PR is refactoring the getTLSConfig method

Graham's original description:

Currently LND writes the TLS private key to disk in plaintext. If you are running in an untrusted environment this could be a problem as someone could use the key to snoop on your API traffic and steal credentials. This pull request creates a way to encrypt the TLS key on disk so even if your server is compromised or others have access, they can't spy on your traffic.

In these changes, we add a new flag to lnd called --tlsencryptkey. When this is enabled, lnd encrypts the TLS key using the wallet's seed and writes the encrypted blob to disk. Because the key is encrypted to the wallet's seed, that means we can only use the TLS pair when the wallet is unlocked. This would leave the WalletUnlocker service without TLS. To fix this problem, we create a temporary TLS pair specifically for the WalletUnlocker RPC. This temporary TLS key is only stored in memory and the temporary TLS cert is written to disk at the same path as tlscertpath, but with .tmp appended to the end of the filename. After unlocking, lnd deletes the cert from disk and clears the private key from memory. Then uses the main TLS certificate for all other RPCs.

Of course, this changes the flow of unlocking/init a little bit. If you use --tlsencryptkey on lnd, then you'll need a way to access the temporary TLS cert for verification. The simplest way to do this is to use lncli on the same host lnd is running on and point --tlscertpath at the temporary cert. If that's not possible then the user will need to come up with a way to copy the temporary certificate out of the lnd server and get it to their client device.

Summary of Changes:

  • Moves the crypto.go file from chanbackup into its own package called lnencrypt. This was needed because we can't directly reference the encryptPayloadToWriter and decryptPayloadFromReader functions from the chanbackup package due to circular dependency issues. Also, I think it's very likely that future enhancements will also want to use these same functions. Thus I've moved them to their own package. (cherry-picked from https://github.com/lightningnetwork/lnd/pull/4458)

  • Creates a new flag (--tlsencryptkey) that tells lnd to encrypt the TLS private key on disk. If set, it will generate temporary TLS certs for the WalletUnlocker RPC. Allows users to prevent spying on API traffic in untrusted environments.

  • Add necessary tests and documentation

orbitalturtle avatar May 11 '22 23:05 orbitalturtle

Looking into you PR :+1:.

Is this ready for review? The review starts from lnd: refactor getTLSConfig, right? Saw that an external repo is referenced, are the changes to lnd/cert tracked somewhere?

bitromortac avatar Sep 05 '22 15:09 bitromortac

Thanks @bitromortac! Yes this PR depends on this one: https://github.com/lightningnetwork/lnd/pull/6573, so might be worth giving that one a review first.

(Also noticed both PRs need a rebase. Will do that sometime today.)

orbitalturtle avatar Sep 06 '22 16:09 orbitalturtle

@sputn1ck: review reminder @bitromortac: review reminder

lightninglabs-deploy avatar Sep 13 '22 16:09 lightninglabs-deploy

Can be rebased and updated to the new version of the cert module, I pushed a new tag cert/v1.2.0.

guggero avatar Nov 02 '22 08:11 guggero

@sputn1ck: review reminder @bitromortac: review reminder @orbitalturtle, remember to re-request review from reviewers when ready

lightninglabs-deploy avatar Nov 10 '22 10:11 lightninglabs-deploy

Also, I'm not sure in general how this feature can be used in a secure way.

I can address this specifically because this is something I thought a lot about several years ago when I first made this PR. We handle this by delivering the ephemeral TLS cert out of band (through our site) for effective validation. Of course in our case the user has to sort of trust that we are giving them the right cert, but that's acceptable in our model. Generally speaking, the cert will always have to be delivered/validated out of band from LND.

I thought about adding in functionality like Webhook notifications or uploading the ephemeral TLS cert to a location into LND, but determined that was out of scope for LND. If a user is leveraging this then it's up to them to validate the ephemeral TLS cert just like they have to do with the persistent TLS cert. So I don't think it degrades security at all, but it does add some additional complexity for the user that will need to be solved for. (can easily be done with automation)

gkrizek avatar Nov 15 '22 18:11 gkrizek

I thought about adding in functionality like Webhook notifications or uploading the ephemeral TLS cert to a location into LND

Right, I now remember some of the discussions around webhooks and uploading the cert somewhere. Maybe some of the security tradeoffs and considerations (especially the "out of band" part) could be a bit more explicit in the doc. Currently it just mentions "If you aren't able to run lncli on the host lnd is running on, then you'll need to copy the temporary certificate from the host onto whatever device you're using." but doesn't really go into why it's important to verify the ephemeral cert specifically.

guggero avatar Nov 15 '22 19:11 guggero

@guggero Thanks for the review Oli! Yeah your feedback makes a lot of sense. In my initial refactor I was probably too conservative in trying not to break how things as they currently work, but makes sense that we needed to push the refactoring and clarification further. I pushed up a much deeper refactor, which hopefully makes the code much easier to read. LMK what you think & if this matches sort of what you were talking about in regards to a state machine?

I plan to spend some time running some more tests to make sure it's working as well as before (since there are so many different paths it could take), but let me know if this refactor is looking better.

orbitalturtle avatar Nov 18 '22 05:11 orbitalturtle

@bitromortac Yes, you understand that flow right. It is possible to start a wallet with --tlsencryptkey but I don't think it should be the default if that's what you're asking.

I think renaming the key to something different could cause confusion in my opinion. having different or multiple tls.key files depending on settings seems like it'd be hard to follow.

gkrizek avatar Nov 29 '22 21:11 gkrizek

Yes, you understand that flow right. It is possible to start a wallet with --tlsencryptkey but I don't think it should be the default if that's what you're asking.

Thanks @gkrizek. Currently we error for an empty .lnd with [ERR] LTND: Shutting down because error in main method: unable to load TLS credentials: open /home/user/.lnd/tls.cert: no such file or directory, so that would need to be created first before trying to encrypt it. Also I'm wondering for which applications we would need an unencrypted tls.key and why it's not a good idea to do this by default (and ensure backward compatibility with unlocking).

bitromortac avatar Nov 30 '22 09:11 bitromortac

@bitromortac

Currently we error for an empty .lnd with [ERR] LTND: Shutting down because error in main method: unable to load TLS credentials: open /home/user/.lnd/tls.cert: no such file or directory, so that would need to be created first before trying to encrypt it.

I'm not sure I understand what you're saying. Can you elaborate here?

Also I'm wondering for which applications we would need an unencrypted tls.key and why it's not a good idea to do this by default (and ensure backward compatibility with unlocking).

I don't think there's really a problem or place where having an encrypted TLS key is bad, however the flow to unlock the node (with the temporary certificates) is a pretty different process than what has existed in LND so far. So I think forcing this new process for unlocking nodes would have negative cascading affects.

Thanks!

gkrizek avatar Dec 03 '22 01:12 gkrizek

Addressed those comments @guggero! Just one question: Looks like the linter wants me to add a timeout to the lets encrypt codes. Would it make sense to use one of the other timeouts in the config.go or should I set a new one?

And uh, I don't know why the GitHub UI hates me, but every time I go to request a review from anyone, it for some reason removes other developers' requests for review (???). So sorry for removing your request for review @sputn1ck.

orbitalturtle avatar Dec 03 '22 23:12 orbitalturtle

Looks like the linter wants me to add a timeout to the lets encrypt codes. Would it make sense to use one of the other timeouts in the config.go or should I set a new one?

I'd just add a var at the top of the file for this case and set it to 5 seconds or something like that. The linter is new so it didn't catch existing instances of http.Server that we should probably also address. I'm going to add an issue for that so we can fix it everywhere (REST server, profiling server, let's encrypt).

guggero avatar Dec 05 '22 08:12 guggero

I'm not sure I understand what you're saying. Can you elaborate here?

@gkrizek Yes, sorry for not being more explicit. The code as in this PR, running lnd for the first time with the --tlsencryptkey flag is not compatible with an uninitialized .lnd home folder, as the flow expects a tls.cert/keypair in the first call of GetConfig to exist and errors with a file not found error. Is that intentional or a deeper reason behind it?

bitromortac avatar Dec 05 '22 11:12 bitromortac

@bitromortac maybe @guggero can help us decide the default behavior. Although this is more secure, I do not think it should be the default behavior because it changes the way nodes are unlocked significantly. I think this new and different way of unlocking would be bad for existing LND users. The preference would be to allow them to upgrade when they are ready.

Would it be too involved to keep tls.cert in sync with the currently active certificate? That way we would keep backward compatibility with unlocking.

I think this is true only if the user is using lncli from the same instance that LND is running on. If they are communicating with the node remotely it wouldn't provide any benefits. You could still need to grab the cert to validate the request. Therefore, I think overwriting the tls.cert file back and forth doesn't make a lot of sense and could cause confusion.

gkrizek avatar Dec 06 '22 18:12 gkrizek

I'm not sure I understand what you're saying. Can you elaborate here?

@gkrizek Yes, sorry for not being more explicit. The code as in this PR, running lnd for the first time with the --tlsencryptkey flag is not compatible with an uninitialized .lnd home folder, as the flow expects a tls.cert/keypair in the first call of GetConfig to exist and errors with a file not found error. Is that intentional or a deeper reason behind it?

Oops yup looks like that was a bug. Nice catch. I also addressed all your other comments @bitromortac, when you have a chance to take a look. Good call on encapsulating the lnd.go further into the TLSManager.

orbitalturtle avatar Dec 06 '22 21:12 orbitalturtle

@bitromortac maybe @guggero can help us decide the default behavior. Although this is more secure, I do not think it should be the default behavior because it changes the way nodes are unlocked significantly.

I agree with Graham here, we shouldn't make this the default because it will make unlocking through lncli quite different because another cert needs to be used.

guggero avatar Dec 07 '22 11:12 guggero

I think this is true only if the user is using lncli from the same instance that LND is running on. If they are communicating with the node remotely it wouldn't provide any benefits. You could still need to grab the cert to validate the request. Therefore, I think overwriting the tls.cert file back and forth doesn't make a lot of sense and could cause confusion.

I agree with Graham here, we shouldn't make this the default because it will make unlocking through lncli quite different because another cert needs to be used.

Yes, that makes sense. I didn't really consider that scenario, usually when I'm restarting my node I unlock it from the same machine, but there is a use case where the node hoster is different from the manager :+1: (which is also part of the motivation of this PR).

bitromortac avatar Dec 07 '22 14:12 bitromortac

@bitromortac Sounds great, made those changes :+1: @guggero & @sputn1ck lmk if there's anything else I can do to improve the PR. I'd re-request a review above, but having trouble with the UI lately.

orbitalturtle avatar Dec 16 '22 04:12 orbitalturtle

@orbitalturtle, remember to re-request review from reviewers when ready

lightninglabs-deploy avatar Jan 13 '23 16:01 lightninglabs-deploy

Thanks for the reviews @bitromortac and @sputn1ck. @bitromortac I made those changes ~ re: a couple of your comments, I think I had done something funny when rebasing, but should be fixed now :+1:

orbitalturtle avatar Jan 19 '23 19:01 orbitalturtle

Fabulous, @bitromortac I made those changes, thanks again for the reviews :)

orbitalturtle avatar Jan 25 '23 19:01 orbitalturtle

Great made those changes @guggero, and rebased

orbitalturtle avatar Jan 26 '23 19:01 orbitalturtle

Great made those changes @guggero, and rebased

I don't really see the changes. But I also noticed that the same area of the code was replaced in a later commit, so it's fixed indirectly. Not optimal, but given the number of rounds this has gone through, I'm just going to merge anyway.

guggero avatar Jan 27 '23 13:01 guggero