lakeFS
Introduce support of GCS encryption for both CMEK and CSEK
Closes #(Insert issue number closed by this PR)
Change Description
Background
Add support for GCS encryption with both CMEK (customer-managed encryption keys) and CSEK (customer-supplied encryption keys)
New Feature
Issue link: https://github.com/treeverse/lakeFS/issues/7557
- CMEK configuration is supported
- CSEK configuration is supported
- Generating a PreSignedURL for GCS returns an error when CSEK is configured, since using the URL would require the user to have the key from the configuration file
- Frontend change is required
- If the user must have the key from the configuration, CSEK makes little sense for PreSignedURL
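To illustrate why a PreSignedURL and CSEK combine poorly: GCS expects every request for a CSEK-encrypted object to carry the key itself in request headers, so whoever uses the URL would also need the raw key. The following is a minimal Python sketch for illustration only, not the lakeFS (Go) implementation; `csek_headers` is a hypothetical helper, while the header names are the ones GCS documents for customer-supplied keys.

```python
import base64
import hashlib


def csek_headers(key: bytes) -> dict:
    """Build the headers GCS expects on requests for a CSEK-encrypted object.

    `key` is the raw 32-byte AES-256 key. This helper is hypothetical and
    only shows that the caller must hold the key to make the request.
    """
    if len(key) != 32:
        raise ValueError("CSEK must be a 32-byte AES-256 key")
    return {
        "x-goog-encryption-algorithm": "AES256",
        # The key itself, base64-encoded.
        "x-goog-encryption-key": base64.b64encode(key).decode("ascii"),
        # Base64 of the SHA-256 digest of the raw key.
        "x-goog-encryption-key-sha256": base64.b64encode(
            hashlib.sha256(key).digest()
        ).decode("ascii"),
    }
```

Because these headers must accompany the download, handing out a pre-signed URL alone is not enough for a client to read the object.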
Testing Details
How were the changes tested?
- CMEK and CSEK cannot be configured at the same time; otherwise the server fails to start
- The server fails to start if the CSEK is not a valid 32-byte AES-256 key
- Reading a CSEK-encrypted object fails when CSEK is not configured or the key is wrong
- Reading a CMEK-encrypted object fails when CMEK is not configured or the key is wrong
- Uploading a file to a CMEK-enabled bucket with CMEK configured succeeds
- Checked the content of the CMEK-encrypted file with CMEK configured
- Uploading a file with CSEK succeeds
- Checked the content of the CSEK-encrypted file with CSEK configured
- When CSEK is configured and PreSignedURL is enabled, generating a PreSignedURL for a CSEK-encrypted object fails
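The startup and PreSignedURL checks listed above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code in this PR: the function names, the parameter names, and the idea that the CSEK arrives base64-encoded in the configuration are all hypothetical.

```python
import base64
from typing import Optional


def validate_encryption_config(
    kms_key_name: Optional[str], csek_b64: Optional[str]
) -> Optional[bytes]:
    """Return the decoded CSEK, or None when no CSEK is configured.

    Assumption: the CSEK is supplied base64-encoded; names are hypothetical.
    """
    if kms_key_name and csek_b64:
        raise ValueError("CMEK and CSEK cannot be configured at the same time")
    if not csek_b64:
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet.
        key = base64.b64decode(csek_b64, validate=True)
    except ValueError as err:  # binascii.Error is a ValueError subclass
        raise ValueError("CSEK is not valid base64") from err
    if len(key) != 32:
        raise ValueError("CSEK must decode to a 32-byte AES-256 key")
    return key


def ensure_presign_allowed(csek: Optional[bytes]) -> None:
    """Refuse to generate a pre-signed URL when a CSEK is configured."""
    if csek is not None:
        raise RuntimeError("pre-signed URLs are not supported when CSEK is configured")
```

Failing at startup rather than at first read keeps a misconfigured key from surfacing later as unreadable objects.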
Breaking Change?
Does this change break any existing functionality? (API, CLI, Clients)
No breaking change: no API is changed; only server configuration is required
Additional info
Logs, outputs, screenshots of changes if applicable (CLI / GUI changes)
Contact Details
By GitHub account
Hey @emulatorchen , thank you for your contribution! I should have reviewed sooner; it took me a bit longer since I needed to ramp up my GCP/KMS knowledge. @Jonathan-Rosenberg and I were debating some CSEK/CMEK questions. As neither of us is very familiar with GCP, I wonder if you could help us by answering the questions below:
- What's the flow for key rotation for either of the two settings? We're obviously very worried about users' data becoming inaccessible due to a rotation taking place, but it's not just data. lakeFS stores immutable committed metadata in the object store using the same gs adapter we're modifying, so a rotation that goes wrong could fail lakeFS completely.
- Can the user set CSEK/CMEK after some time of using lakeFS? Will the previously saved objects be accessible?
- Would you say it's a lakeFS-level setting? From what we understand, it's common for different gs buckets to have different encryption keys. A single lakeFS installation can manage multiple repositories across different buckets. What additional value does lakeFS give the user? It seems like we're forcing a single encryption key for all the repos. Isn't the user better off setting this at the bucket level and keeping lakeFS unaware of encryption (again, not sure if that's possible)?
Really appreciate the review! I'm glad to address these critical questions.
- In short, rotating a key in KMS does not affect existing objects:
  a. CMEK (available only at the bucket level)
    - The key version that encrypted the object is used to decrypt it
    - Moving an object to the latest key version requires a new copy of the object; to my understanding this is the same as S3
  b. CSEK (available only at the bucket level)
    - Similar to CMEK, changing the key requires making a new copy
- As explained above, anything that happens at the KMS level is fine. But if the key path or the key value itself must change, the only way is to copy each object to a new one, which the existing implementation cannot do. So for CMEK, automatic and manual rotation in KMS still work. Changing the key path in CMEK (which may still be possible as long as the permissions and the default key are set) or the key value in CSEK is not supported: the old objects will no longer be accessible.
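The rotation behaviour described above can be modelled with a toy sketch: each object remembers the key version that encrypted it, so rotating the key leaves old objects readable, and moving an object to the newest version means writing a new copy. Everything here is hypothetical and for illustration only; `ToyKms` is not a KMS client, and the XOR "cipher" is a placeholder that must never be used for real data.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ToyKms:
    """Toy stand-in for one KMS key with numbered versions."""
    versions: dict = field(default_factory=dict)
    primary: int = 0

    def rotate(self) -> int:
        # Rotation adds a new primary version; older versions stay usable.
        self.primary += 1
        self.versions[self.primary] = os.urandom(32)
        return self.primary


@dataclass
class ToyObject:
    key_version: int  # version that encrypted this object
    ciphertext: bytes


def _xor(data: bytes, key: bytes) -> bytes:
    # Placeholder cipher for the sketch only; not real encryption.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def write(kms: ToyKms, data: bytes) -> ToyObject:
    # New writes always use the current primary version.
    return ToyObject(kms.primary, _xor(data, kms.versions[kms.primary]))


def read(kms: ToyKms, obj: ToyObject) -> bytes:
    # Reads use the version recorded on the object, not the primary.
    return _xor(obj.ciphertext, kms.versions[obj.key_version])


def rewrite(kms: ToyKms, obj: ToyObject) -> ToyObject:
    # Moving to the latest version means producing a new copy.
    return write(kms, read(kms, obj))
```

In this model an object written before a rotation stays readable afterwards, and only `rewrite` changes the key version it depends on.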
- Yes, and we had also thought about this before. There are at least two perspectives on encryption: internal service and multi-tenant.
  - Internal service: all users are internal and most likely on the same team, so the point is to ensure the data is encrypted and access is granted only to specific accounts
  - Multi-tenant: users are mostly external and would like to manage their own encryption

  TBH we are more of a hybrid. So ideally we would like a different key setup for each repository.
The reason I made it a lakeFS-level setting is that there is already an S3 reference at the lakeFS level, so I thought it would be easier for you to accept and would let us meet the internal need first. After that I would like to look for a possible multi-tenant solution. Both should be able to exist in lakeFS; they just cannot be configured at the same time. But we all know that will be more complicated, especially since we provide a universal UI/API/CLI over different storage options. Different keys require not just permissions but also key management in lakeFS itself; otherwise we may hit issues with CSEK (e.g. not being able to preview in the UI, a similar reason to why I dropped PreSignedURL support when CSEK is enabled) or when the bucket's key is not the project's default key.
For now I am implementing CSEK only as an option; we are focusing more on CMEK. The reason is that CMEK can be enabled with almost no implementation in lakeFS, as long as the key is the project's default key and lakeFS is properly authenticated. CSEK, on the other hand, gives the user more control when a risk materializes. So the implementation assumes we want CMEK as internal protection for now, and I am happy to remove CSEK if you think it's risky.
@emulatorchen you would need to sign the CLA in order to contribute...
I think I just screwed up this PR when I tried to fix email on my commits, I will just create another PR for it.