Feature Request: Support multiple KMS Keys for Auto Unseal

Open justyns opened this issue 6 years ago • 40 comments

Is your feature request related to a problem? Please describe.

We need to plan for a scenario where someone accidentally deletes a KMS key, or KMS itself is inaccessible in a region. I've tried the following as a simple test:

  • Create consul and vault clusters in two regions, region1 and region2
  • Enable Auto Unseal using awskms with a KMS key in region1
  • Take snapshots of consul from region1
  • Stop clusters in region1 and disable/delete KMS key in region1
  • Restore consul snapshot from region1 in region2
  • Attempt to start Vault

This fails because the KMS key is disabled/deleted/inaccessible from the new cluster.

Essentially I want the ability to take a consul snapshot and restore it on a server that does not have access to the KMS key used by the vault cluster with auto-unseal using awskms enabled.

Describe the solution you'd like

I think supporting multiple seal blocks or kms ids in the awskms seal block may be the simplest solution. This could look something like this:

seal "awskms" {
  region      = "us-east-1"
  kms_key_ids = [
    "region1/19ec80b0-dfdd-4d97-8164-c6examplekey",
    "region2/12e4e120-dfdd-4d97-8164-c6examplekey2"
  ]
}

In my example, I would specify KMS keys from two (or more) regions. Vault would then encrypt the master key against every KMS key in the list.

I would then be able to take a consul snapshot in region1, create a consul/vault cluster in region2, restore the snapshot, bring up the vault cluster and have it auto unseal as normal.
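To make the proposal concrete, here is a minimal Python sketch of "encrypt the master key against every KMS key in the list". It is a toy model, not Vault's seal code: the cipher is a throwaway SHA-256 keystream standing in for a KMS call, and the names `wrap_under_all` and `unseal` are hypothetical.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). NOT a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def wrap(wrapping_key: bytes, plaintext: bytes) -> dict:
    """Stand-in for a KMS Encrypt call: returns an authenticated blob."""
    nonce = secrets.token_bytes(16)
    pad = _keystream(wrapping_key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, pad))
    tag = hmac.new(wrapping_key, nonce + body, hashlib.sha256).digest()
    return {"nonce": nonce, "body": body, "tag": tag}


def unwrap(wrapping_key: bytes, blob: dict) -> bytes:
    """Stand-in for a KMS Decrypt call: fails if the key does not match."""
    expected = hmac.new(wrapping_key, blob["nonce"] + blob["body"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong key or corrupted blob")
    pad = _keystream(wrapping_key, blob["nonce"], len(blob["body"]))
    return bytes(a ^ b for a, b in zip(blob["body"], pad))


def wrap_under_all(master_key: bytes, kms_keys: dict) -> dict:
    """Store one wrapped copy of the master key per configured KMS key ID."""
    return {key_id: wrap(key, master_key) for key_id, key in kms_keys.items()}


def unseal(stored: dict, reachable_keys: dict) -> bytes:
    """Recover the master key from the first copy whose KMS key is reachable."""
    for key_id, blob in stored.items():
        if key_id in reachable_keys:
            return unwrap(reachable_keys[key_id], blob)
    raise RuntimeError("no configured KMS key is reachable")
```

With two keys configured, losing region1 would leave the region2 copy usable: `unseal(stored, {"region2/key-b": key_b})` returns the original master key, while an empty `reachable_keys` raises.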

If supporting multiple kms keys wouldn't work for some reason, then I think allowing the original unseal keys (recovery keys) to unseal a vault cluster would also be a reasonable solution.

Describe alternatives you've considered

  • Enterprise DR would essentially solve the issue, but that is a long-term solution and not available when only using the OSS version of Vault.
  • Originally I was under the impression that the "recovery keys" could be used to unseal a vault cluster that had auto unseal enabled. This isn't the case, and makes their name seem misleading unless they serve a purpose other than generating root tokens or removing auto unseal (while auto unseal is still enabled and working.)
  • Another option considered was to import custom key material into a new KMS key. Because the old encrypted data references the original KMS key id, this doesn't work when that key no longer exists.

Explain any additional use-cases

Without this feature, it is impossible afaict to properly back up a Vault cluster with Auto Unseal enabled. Even with Enterprise DR and replication, you wouldn't have a true backup as you can't back up the KMS keys.

Please let me know if I am misunderstanding something or if there is an alternative solution. Thanks!

justyns avatar Jan 15 '19 19:01 justyns

Why can't you point your region2 at your region1 KMS keys? There are certain things that make this unlikely to implement, such as key rotation and Seal Wrap (in enterprise).

chrishoffman avatar Jan 23 '19 13:01 chrishoffman

Even with Enterprise DR and replication, you wouldn't have a true backup as you can't back up the KMS keys.

Enterprise DR can use a different key per cluster, which does address this issue (and in fact is best practice).

jefferai avatar Jan 23 '19 13:01 jefferai

Why can't you point your region2 at your region1 KMS keys? There are certain things that make this unlikely to implement, such as key rotation and Seal Wrap (in enterprise).

The idea was to plan for a DR event that could include the region1 KMS keys being unavailable for whatever reason.

Even with Enterprise DR and replication, you wouldn't have a true backup as you can't back up the KMS keys.

Enterprise DR can use a different key per cluster, which does address this issue (and in fact is best practice).

Using Enterprise DR, is there a way to take a backup of the Vault backend data and restore it somewhere that does not have access to KMS or whichever auto-unseal method is being used? My understanding is that this isn't possible and is what I meant by that statement. You're right though that having an Enterprise DR cluster would address this issue for most companies using Enterprise Vault.

I would still like to see a way of addressing this for the OSS version, or even those using Enterprise but not willing/able to run a DR cluster.

justyns avatar Jan 30 '19 16:01 justyns

+1

xynova avatar Apr 01 '19 21:04 xynova

+1

rohit8925 avatar Sep 04 '19 08:09 rohit8925

I am trying to achieve a similar objective: I enabled backups for the Vault backend MySQL in Region-1 and tried restoring to another MySQL instance in Region-2. If Region-1 is down for some reason and its KMS key ID is unavailable, auto-unseal won't help. In that case, how can Vault be unsealed to restore services? I would also like to understand the purpose of the recovery keys. I agree with @justyns, it would be good to have multiple KMS keys for auto-unseal.

rohit8925 avatar Sep 04 '19 08:09 rohit8925

+1

chris-ng-1987 avatar Sep 04 '19 08:09 chris-ng-1987

Hit this exact same thought process today:

Originally I was under the impression that the "recovery keys" could be used to unseal a vault cluster that had auto unseal enabled. This isn't the case, and makes their name seem misleading unless they serve a purpose other than generating root tokens or removing auto unseal (while auto unseal is still enabled and working.)

I was under the exact same impression, then I went to test it and hit the "invalid key" error trying to unseal with the recovery keys. I then found https://groups.google.com/forum/#!msg/vault-tool/-gdDm-KRlxw/4b6t0QnaAgAJ which confirmed a suspicion I had after my testing, which led me here.

  • Another option considered was to import custom key material into a new KMS key. Because the old encrypted data references the original KMS key id, this doesn't work when that key no longer exists.

I considered this idea, interesting I didn't realise it references the KMS Key ID directly, what a PITA... so you couldn't just import the same key material into a new key in either the same or a different AWS region, and have it unseal I guess...?

CpuID avatar Oct 11 '19 08:10 CpuID

@ncabatoff as a followup to comments in https://github.com/hashicorp/vault/pull/7559 - you are right, would need to store the master key encrypted multiple times, in KMS + somewhere else potentially.

personally I'd prefer a way to use unseal shamir shares as one method for recovery, to allow for "break glass" type situations where you take a last known good backup from your storage backend, load it up locally (laptop or whatever) and unseal it to obtain whatever is required.

using multiple KMS keys is one approach, would at least cover off a single AWS region going away. using shamir shares also would handle the situation of AWS dropping off entirely (global network misconfiguration etc), where you can't get to KMS anywhere.
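For readers unfamiliar with the Shamir fallback being asked for, the idea is that a secret is split into N shares and any K of them reconstruct it, with no external service involved. A minimal sketch follows, assuming a prime-field construction for readability; Vault's own implementation differs in its details (it works byte-wise over GF(2^8)), and the function names here are hypothetical.

```python
import secrets

# A Mersenne prime comfortably larger than any 256-bit secret.
PRIME = 2**521 - 1


def split(secret: int, threshold: int, shares: int) -> list:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points: list) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

In a "break glass" scenario like the one described, three of five key holders could run `combine` on a laptop with no access to AWS at all.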

CpuID avatar Oct 11 '19 13:10 CpuID

Right now we believe our workaround is:

  • backup our consul
  • restore it
  • set seal.awskms.disabled: true
  • launch vault, connected to restored consul
  • run vault operator unseal -migrate multiple times, passing in the recovery keys
  • take another backup of consul

We now have a consul backup that is not sealed using KMS and can be unsealed with the recovery keys.

Then we:

  • Restore consul in new region
  • Spin up vault without awskms disabled
  • Run vault operator unseal -migrate again, passing in the recovery keys again.
  • Restart vault now that KMS unsealing is back on.

Yeah, it's a lot of work, but I don't know of any other workaround at the moment. We simply cannot have our backends KMS sealed, or we cannot unseal them in the event of a KMS or regional outage.

tecnobrat avatar Nov 19 '19 23:11 tecnobrat

@vishalnayak @calvn @jefferai based on community feedback so far above, any chance of this getting some cycles in the near future? :) thanks!

CpuID avatar Jan 23 '20 22:01 CpuID

@tecnobrat I tried the procedure you sent, but during the unseal step on the DR site I get:

  • failed to decrypt encrypted stored keys: error decrypting data encryption key

It still needs access to the KMS.

perorope avatar Feb 17 '20 18:02 perorope

As far as I know, 'vault operator unseal -migrate' works only while KMS is accessible. If you disconnect KMS, -migrate won't work, because Vault needs access to the active KMS even to migrate to a new seal. At the very least there should be a provision to supply unseal keys when KMS is not available. In Kubernetes, pods won't even start if there is any issue with KMS.
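The dependency described here can be pictured with a small toy model: migrating a seal means decrypting the stored keys with the old seal and re-encrypting them with the new one, so the old seal has to be reachable at migration time. This is only a sketch of that reasoning; `ToySeal` and `migrate` are hypothetical names, and the XOR pad is a placeholder rather than real cryptography.

```python
import hashlib


class SealUnavailable(Exception):
    """Raised when the key service behind a seal cannot be reached."""


class ToySeal:
    def __init__(self, name: str, key: bytes, reachable: bool = True):
        self.name = name
        self._key = key
        self.reachable = reachable

    def _xor(self, data: bytes) -> bytes:
        # Toy single-block pad, limited to 32 bytes. NOT real cryptography.
        pad = hashlib.sha256(self._key).digest()
        if len(data) > len(pad):
            raise ValueError("toy seal handles at most 32 bytes")
        return bytes(a ^ b for a, b in zip(data, pad))

    def encrypt(self, data: bytes) -> bytes:
        if not self.reachable:
            raise SealUnavailable(self.name)
        return self._xor(data)

    def decrypt(self, blob: bytes) -> bytes:
        if not self.reachable:
            raise SealUnavailable(self.name)
        return self._xor(blob)


def migrate(blob: bytes, old: ToySeal, new: ToySeal) -> bytes:
    """Re-wrap stored keys: decrypt with the old seal, encrypt with the new one."""
    return new.encrypt(old.decrypt(blob))
```

In this model, flipping `old.reachable` to `False` makes `migrate` raise `SealUnavailable` before the new seal is ever used, which mirrors the "failed to decrypt encrypted stored keys" error reported above.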

techs07 avatar May 26 '20 13:05 techs07

While multiple keys would be useful and worth implementing, I too would like to see recovery keys work for unseal if the configured auto-unseal mechanism is not available in an emergency.

personally I'd prefer a way to use unseal shamir shares as one method for recovery, to allow for "break glass" type situations where you take a last known good backup from your storage backend, load it up locally (laptop or whatever) and unseal it to obtain whatever is required.

stevenscg avatar May 26 '20 13:05 stevenscg

I just created a new KMS key and put it in the configuration. Vault unseals like a charm. My backend is MySQL, Vault 1.3.7.

yevgeniyo-ps avatar Aug 06 '20 10:08 yevgeniyo-ps

I also think we should have a way to recover Vault OSS using the recovery Shamir keys. Because they are called recovery keys, the documentation does not make it clear that they can no longer be used for unsealing.

We found out about this limitation after a backup recovery fire drill in a clean, segregated cluster, where Vault expected to interact with the production KMS key.

If anything goes wrong with KMS, we have no way to recover either the cluster or any of its backups, which could be catastrophic. We could work around this by using a user-provided KMS key, but this brings its own set of problems: where to store the original key, who should have access to it, and the downtime incurred when we need to rotate it (KMS migrate -> Shamir migrate -> KMS).

glavoie avatar Aug 18 '20 13:08 glavoie

I just created a new KMS key and put it in the configuration. Vault unseals like a charm. My backend is MySQL, Vault 1.3.7.

@yevgeniyo can you elaborate on this? I tried a restore test and created another AWS KMS key with external key material (so with the same secret) but was unable to unseal Vault with it. I updated the key id and access/secret key in the configuration but it's not working as expected :-(

[WARN] failed to unseal core: error="fetching stored unseal keys failed: failed to encrypt keys for storage: error decrypting data encryption key: AccessDeniedException: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

.... testing with the AWS commandline whether the credentials are working looks ok. Any hint?

akurz avatar Nov 27 '20 16:11 akurz

We could work around this by using a user provided KMS key

@glavoie that was my original assumption too, but no: a different key with the same user-provided key material is NOT capable of decrypting AWS KMS encrypted secrets. Only exactly the same CMK (the key with the same key id) can do that. In other words: there is absolutely no workaround for restoring from backup a Vault instance that used AWS KMS auto-unseal. If you lose the AWS KMS key (through deletion or otherwise), there is absolutely no way to restore from backup, even if you still have the user-provided key material. Theoretically, one could reverse engineer (good luck?) the encryption mechanism AWS KMS uses, but meh.

I'm here from https://discuss.hashicorp.com/t/switching-to-different-aws-kms-key-id-with-the-same-key-material/19116/22 and it was my original assumption as well.
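A toy model may help show why identical key material is not enough. The sketch below assumes, as the behaviour described in this thread suggests, that each ciphertext carries the ID of the key that produced it and that decryption looks the key up by that ID. `ToyKms` is a hypothetical stand-in and does not reproduce AWS KMS internals or its blob format.

```python
import hashlib
import uuid


class ToyKms:
    """Toy key service: every ciphertext is tagged with the ID of its key."""

    def __init__(self):
        self._keys = {}  # key_id -> key material

    def import_key(self, material: bytes) -> str:
        # A new key always gets a new ID, even for identical material.
        key_id = str(uuid.uuid4())
        self._keys[key_id] = material
        return key_id

    def delete_key(self, key_id: str) -> None:
        del self._keys[key_id]

    @staticmethod
    def _xor(material: bytes, data: bytes) -> bytes:
        # Toy single-block pad, limited to 32 bytes. NOT real cryptography.
        pad = hashlib.sha256(material).digest()
        if len(data) > len(pad):
            raise ValueError("toy KMS handles at most 32 bytes")
        return bytes(a ^ b for a, b in zip(data, pad))

    def encrypt(self, key_id: str, plaintext: bytes) -> bytes:
        body = self._xor(self._keys[key_id], plaintext)
        return key_id.encode() + b"|" + body  # key ID travels with the blob

    def decrypt(self, blob: bytes) -> bytes:
        key_id, body = blob.split(b"|", 1)
        material = self._keys.get(key_id.decode())
        if material is None:
            raise KeyError("ciphertext refers to a key that does not exist")
        return self._xor(material, body)
```

In this model, deleting the original key and importing the same material again yields a key with a different ID, so `decrypt` on the old blob raises `KeyError` even though the bytes needed to decrypt are present in the service.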

zerkms avatar Dec 22 '20 21:12 zerkms

Hi,

In the Vault Enterprise DR solution, when a KMS key is used for auto-unseal, is the same KMS key used to encrypt the Vault backend data on both DR sites? If so, is it possible to set different keys for different sites, so that if there is a problem with one KMS key it's still possible to unseal the backend on the secondary site?

tmiroslav avatar Dec 24 '20 23:12 tmiroslav

Any news on this? The ability to have Shamir as a fallback when the KMS key gets destroyed seems like a paramount capability when things go south.

ahjohannessen avatar Feb 05 '21 10:02 ahjohannessen

... to have Shamir as fallback when the KMS key gets destroyed seems like a paramount capability...

Please refrain from profanity. Of course it's important: that's why Enterprise customers have a number of options available.

jlj77 avatar Feb 05 '21 10:02 jlj77

Please refrain from profanity.

Sure, sorry about that.

Of course it's important: that's why Enterprise customers have a number of options available.

Hopefully something like support for Shamir fallback is considered regardless of OSS or Enterprise.

ahjohannessen avatar Feb 05 '21 11:02 ahjohannessen

@jlj77

Of course it's important: that's why Enterprise customers have a number of options available.

Would you mind elaborating a bit more about Enterprise options re auto-unseal when original AWS KMS key is not accessible?

Also, maybe someone can give a clue on how to restore from a snapshot on another cluster that uses its own AWS KMS key to unseal? The issue is that when restoring from a snapshot, Vault attempts to unseal using the AWS KMS key of the original cluster (the one the snapshot was created from) and obviously fails, since that AWS KMS key is not accessible.

yermulnik avatar Feb 17 '21 12:02 yermulnik

Would you mind elaborating a bit more about Enterprise options re auto-unseal when original AWS KMS key is not accessible?

My apologies. I didn't mean to suggest that Enterprise customers had other options related to this specific scenario; only that DR obviates the need for this in many cases.

... The issue is that when restoring from snapshot Vault attempts to unseal using AWS KMS key of the original cluster...

I'm assuming that snapshot-force is no help in this scenario, yes?

jlj77 avatar Feb 17 '21 22:02 jlj77

I'm assuming that snapshot-force is no help in this scenario, yes?

Yep, it just bypasses checks and ignores the warning.

yermulnik avatar Feb 17 '21 22:02 yermulnik

So far I'm reading all this like: "Do NOT use KMS Auto Unseal with Vault OSS unless you are fine with not being able to back up your cluster (which in fact makes it useless in production environments)"

Like some other participants in this issue, I've gone through this path of try-and-fail during DR testing:

  • try using recovery keys for unsealing when KMS key isn't available [fail]
  • try using KMS CMK with external key and recover by disabling old key and creating new one with the same key material [fail]

I wasn't able to find any cautions in the docs saying that these things won't work. The term "recovery keys" is kinda misleading in this situation, and it's not obvious that the KMS key id can't be changed even if we recreate the key with the same key material. Maybe it's worth putting such cautions into the Vault docs?

I'm not a cybersecurity expert. Could anybody explain what the point is of "hardcoding" the KMS key id when in fact we just need the key material to decrypt the master key (if I understand correctly how it works)?

klebediev avatar Mar 21 '21 07:03 klebediev

what's the point in "hardcoding" KMS key id when in fact we just need key material to decrypt master key (if I understand correctly how it works)?

It's not done by Vault, it's a native AWS Envelope Encryption, so hashicorp engineers just use primitives provided by the AWS KMS.

zerkms avatar Mar 21 '21 08:03 zerkms

Another problem: Vault can't detect that KMS key deletion has been scheduled unless it restarts. With AWS KMS the waiting period is 7 to 30 days, during which the deletion can still be cancelled. So if key deletion is scheduled and Vault doesn't restart for 30 days, the next restart brings a surprise.

It would be helpful if Vault continuously checked whether the key is available and responded to unavailability in some way, ranging from reflecting it in a metrics value (/v1/sys/metrics) to sealing the node (this could be added as an option to the seal "awskms" block; personally I'd prefer the latter).
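Until something like that exists in Vault, an external check could approximate it. The sketch below assumes the key metadata is fetched separately, for example with boto3's `describe_key(KeyId=...)["KeyMetadata"]`, whose `KeyState` and `DeletionDate` fields it reads; the function name `assess_key` and the returned message strings are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def assess_key(metadata: dict, now: datetime) -> str:
    """Classify a KMS key from DescribeKey-style metadata.

    `metadata` is expected to look like the "KeyMetadata" dict returned by
    DescribeKey; only "KeyState" and "DeletionDate" are inspected.
    """
    state = metadata.get("KeyState")
    if state == "Enabled":
        return "ok"
    if state == "PendingDeletion":
        deletion_date = metadata.get("DeletionDate")
        if deletion_date is None:
            return "critical: key deletion scheduled"
        days_left = (deletion_date - now).days
        return f"critical: key deletion scheduled, {days_left} day(s) left to cancel"
    # Disabled, PendingImport, Unavailable, missing state, and so on.
    return f"warning: key state is {state}"


if __name__ == "__main__":
    now = datetime(2021, 3, 21, tzinfo=timezone.utc)
    sample = {"KeyState": "PendingDeletion", "DeletionDate": now + timedelta(days=7)}
    print(assess_key(sample, now))
```

Run on a schedule, a check like this could feed an alert or a metric well before the next Vault restart discovers the problem.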

klebediev avatar Mar 21 '21 08:03 klebediev

It's not done by Vault, it's a native AWS Envelope Encryption, so hashicorp engineers just use primitives provided by the AWS KMS.

thanks @zerkms ! A little correction: as far as I understand, in the context of KMS it's called ciphertext encryption, not envelope encryption. An illustration of why a key with a different id but the same key material can't be used for decrypting a secret:

$ printf "myMasterPassword123" > masterkey
$ aws kms encrypt \
>     --key-id alias/abc \
>     --plaintext fileb://masterkey \
>     --output text \
>     --query CiphertextBlob | base64 \
>     --decode > masterkey.enc

$ aws kms decrypt \
>     --ciphertext-blob fileb://masterkey.enc \
>     --key-id alias/abc \
>     --output text \
>     --query Plaintext | base64 --decode
myMasterPassword123

$ aws kms decrypt \
>     --ciphertext-blob fileb://masterkey.enc \
>     --key-id alias/abc-restored \
>     --output text \
>     --query Plaintext | base64 --decode
An error occurred (IncorrectKeyException) when calling the Decrypt operation: The key ID in the request does not identify a CMK that can perform this operation.

klebediev avatar Mar 22 '21 07:03 klebediev

Next question: why can't the KMS auto-unseal recovery keys be used for emergency unsealing when the KMS CMK isn't available?

klebediev avatar Mar 22 '21 07:03 klebediev