dataall

Support WRITE access for consumer roles

Open voidwisp opened this issue 1 year ago • 1 comments

Is your idea related to a problem? Please describe. Currently data.all grants only READ access to consumer IAM roles. However, organizations like ours also need to manage WRITE access, to define which roles can write to which S3 buckets or which databases. Otherwise we end up managing read-only access with data.all and all write access outside of data.all. We would like to unify and manage both read and write access via data.all.

Describe the solution you'd like: We need to think this through in the context of:

  • S3 bucket sharing
  • Access points
  • Lake Formation
  • Incoming Redshift integration

S3 bucket sharing: Overall a very simple change. The IAM role just needs to be granted PutObject permission plus KMS Encrypt permissions on the key, etc. I don't think we would want to grant DELETE or anything else besides PUT. Perhaps the permissions to grant could be configurable in the config so organizations can decide. Any extra permissions can be granted by users themselves on the IAM role using IaC when needed.
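As a rough illustration of the S3 part, the policy attached to the consumer role might look like the sketch below. The function name, the `DEFAULT_WRITE_ACTIONS` setting, the bucket name and the key ARN are all hypothetical and not taken from the data.all codebase. The KMS actions are an assumption as well: uploads to a bucket that uses SSE-KMS generally need `kms:GenerateDataKey` in addition to `kms:Encrypt`.

```python
import json

# Hypothetical config value: which S3 actions a WRITE share grants.
# The proposal above defaults to PutObject only; a deployment could extend it.
DEFAULT_WRITE_ACTIONS = ["s3:PutObject"]


def build_write_policy(bucket_name, kms_key_arn, s3_actions=None):
    """Return an IAM policy document (as a dict) granting write access.

    This is an illustrative sketch, not the data.all implementation.
    """
    actions = list(s3_actions or DEFAULT_WRITE_ACTIONS)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ConsumerWriteObjects",
                "Effect": "Allow",
                "Action": actions,
                # Object-level actions apply to keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                "Sid": "ConsumerEncryptWithDatasetKey",
                "Effect": "Allow",
                # Assumed set of KMS actions for SSE-KMS uploads.
                "Action": ["kms:Encrypt", "kms:GenerateDataKey*"],
                "Resource": kms_key_arn,
            },
        ],
    }


# Placeholder identifiers, for illustration only.
policy = build_write_policy(
    "example-dataset-bucket",
    "arn:aws:kms:eu-west-1:111122223333:key/example-key-id",
)
print(json.dumps(policy, indent=2))
```

Passing `s3_actions=["s3:PutObject", "s3:DeleteObject"]` would model an organization that opts into deletes through config.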

Access points: I don't know much about these, but I suspect they work similarly to S3 bucket sharing. I hope the team can clarify on this ticket.

Lake Formation: The question is which permissions we should grant here. We could limit WRITE to just the basic permission needed to add new partitions (I don't know which permission controls that at the moment), which is all that most writing roles will ever need. Or we could also grant things like DROP, INSERT, DELETE, CREATE. I think that, to be most useful to everyone, WRITE should grant full write access with all the other permissions that are not currently granted. Though this could be inconsistent with the S3 permissions if, for example, we only grant PUT on S3 but grant DELETE on the DB. Perhaps this could also be configurable, but the default should be consistent for all.
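To make the two options concrete, here is a minimal sketch that builds the keyword arguments for boto3's Lake Formation `grant_permissions` call without actually calling AWS. The helper names and the two permission sets are hypothetical. In particular, using `ALTER` for the "partitions only" option is an assumption that should be checked against the Lake Formation permissions reference, and the `CREATE` mentioned above is modelled as `CREATE_TABLE` on the database.

```python
# Hypothetical permission sets for the two options discussed above.
PARTITIONS_ONLY = ["ALTER"]  # assumption: ALTER covers adding partitions
FULL_WRITE = ["INSERT", "DELETE", "ALTER", "DROP"]


def build_table_grant(principal_arn, database, table, permissions):
    """Build kwargs for lakeformation.grant_permissions on a single table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def build_database_grant(principal_arn, database):
    """Build kwargs granting table creation on the database itself."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Database": {"Name": database}},
        "Permissions": ["CREATE_TABLE"],
    }


# Placeholder role ARN and names, for illustration only.
role_arn = "arn:aws:iam::111122223333:role/example-consumer-role"
table_grant = build_table_grant(role_arn, "example_db", "example_table", FULL_WRITE)
db_grant = build_database_grant(role_arn, "example_db")

# With credentials available, the grants could then be applied with:
#   boto3.client("lakeformation").grant_permissions(**table_grant)
print(table_grant)
print(db_grant)
```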

Redshift: I don't know enough about this to comment.

My proposal would be that WRITE access is defined when creating a share. We don't want to do it when registering a consumer role, because a consumer role could be used both for write access on the same account and for read access on other accounts (e.g. an EMR role). We can define WRITE access either per share item or for the entire share. I think defining it for the entire share makes sense. The default should be READ ONLY. Write access should only be selectable (and validated by the backend) if the consumer role and the dataset belong to the same environment. The only problem I see is that share items currently let you select specific tables, so we could grant write access to those. But how do we grant CREATE access on the DB? Do we just do it implicitly, which could confuse the user? And if the user doesn't select any tables, should we still grant WRITE access to the DB? The only way I can think of solving this is to introduce a new share item for the DATABASE itself.
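The backend check proposed here could be sketched roughly as follows. `ShareAccess`, `ShareRequest` and `validate_share_request` are hypothetical names for illustration and do not correspond to existing data.all classes; environments are represented as plain string identifiers.

```python
from dataclasses import dataclass
from enum import Enum


class ShareAccess(Enum):
    READ_ONLY = "READ_ONLY"
    READ_WRITE = "READ_WRITE"


@dataclass(frozen=True)
class ShareRequest:
    dataset_environment: str
    consumer_role_environment: str
    # Access is defined once for the entire share and defaults to read-only.
    access: ShareAccess = ShareAccess.READ_ONLY


def validate_share_request(request):
    """Reject write access when role and dataset are in different environments."""
    same_environment = (
        request.dataset_environment == request.consumer_role_environment
    )
    if request.access is ShareAccess.READ_WRITE and not same_environment:
        raise ValueError(
            "WRITE access requires the consumer role and the dataset "
            "to belong to the same environment"
        )
    return request


# A cross-environment read-only share passes validation.
ok = validate_share_request(ShareRequest("env-a", "env-b"))

# A cross-environment write share is rejected.
try:
    validate_share_request(ShareRequest("env-a", "env-b", ShareAccess.READ_WRITE))
    rejected = False
except ValueError:
    rejected = True
```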

We must also make sure the share validator / health checker is made aware of the new extra permissions.

voidwisp, Jun 14 '24 12:06

Hi - Thanks for opening the issue, it will be part of the v2.7 release. cc: @dlpzx

anmolsgandhi, Jun 18 '24 14:06

Hi @petrkalos, having a look at the docs, Redshift write permissions for datashares (which is what we are going to use) are currently in preview: https://docs.aws.amazon.com/redshift/latest/dg/multi-warehouse-writes-data-sharing.html

dlpzx, Jul 25 '24 07:07

Hi @zsaltys, please take a look at the following video demonstrating the WRITE access workflow. As you can see, I initially try to PutObject as a scientist and get permission denied, as expected. I then request READ/WRITE/MODIFY access for the Scientist team and try again, which succeeds.

Some comments with regard to your initial thoughts:

  • I split the permissions into WRITE and MODIFY, which will map to the corresponding rules depending on the share type (e.g. for S3, WRITE will have PutObject and MODIFY will have DeleteObject, etc.). I prefer not to make this configurable, as feature flags add complexity to the codebase.
  • "consumer role and dataset belong to the same environment": Since we are implementing this feature, I'd prefer to make it generally available and not add restrictions that are not enforced by any technical limitations. At the end of the day, a request for access is being made and approved.
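A minimal sketch of how the READ/WRITE/MODIFY split might map to S3 actions is shown below. Only the PutObject and DeleteObject entries come from the comment above; the `AccessLevel` enum, the `S3_ACTIONS` table, the `s3_actions_for` helper and the `s3:GetObject` entry for READ are assumptions made for illustration, and other share types would need their own tables.

```python
from enum import Enum


class AccessLevel(Enum):
    READ = "READ"
    WRITE = "WRITE"
    MODIFY = "MODIFY"


# Per-level S3 actions. The READ entry is an assumption; the "etc." in the
# comment above suggests the real lists contain more actions than shown here.
S3_ACTIONS = {
    AccessLevel.READ: ["s3:GetObject"],
    AccessLevel.WRITE: ["s3:PutObject"],
    AccessLevel.MODIFY: ["s3:DeleteObject"],
}


def s3_actions_for(levels):
    """Collect the S3 actions for the requested levels, in enum order."""
    requested = set(levels)
    actions = []
    for level in AccessLevel:
        if level in requested:
            actions.extend(S3_ACTIONS[level])
    return actions


read_write = s3_actions_for([AccessLevel.WRITE, AccessLevel.READ])
everything = s3_actions_for(list(AccessLevel))
print(read_write)
print(everything)
```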

https://github.com/user-attachments/assets/877c8496-5d33-4ee9-beea-063aa9ad17bc

petrkalos, Aug 08 '24 09:08

Closing as complete. To be released in 2.7

dlpzx, Sep 05 '24 12:09