dataall
dataall copied to clipboard
Read permissions in data.all for share Requesters - multiple permission roots
Is your idea related to a problem? Please describe. At the moment when a share request for tables or folders is approved, the requester group gets data.all application permissions to GET_TABLE, PREVIEW_TABLE AND GET_FOLDER respectively. These permissions allow the requester group to get the tables and folder details in the data.all UI.
The challenge is that there is one single permissions created for a group, but a group might be owning multiple share requests (in other environments or with consumption roles) on the same folder/table. Where data.all expects one permission-per share, multiple shares are sharing the same permissions, which can be problematic for clean-up purposes.
This issue is related to https://github.com/data-dot-all/dataall/pull/1163 and to https://github.com/data-dot-all/dataall/issues/1173
Describe the solution you'd like A better way to handle permissions derived from multiple sources
P.S. Don't attach files. Please, prefer add code snippets directly in the message body.
1. Straight-forward solution: we can introduce a field in ResourcePolicy.shareUri
Pros: In this case we will be able to track group permissions for each share independently. Of course, we will have to backfill database with.
Cons: we will have as many permission entries in database for every (group, resource) pair, as many shares they involved in.
2. More complicated solution: In case of granting permissions via share, we can appoint permissions to share principals (e.g. consumptionRole, or environmental group) and then, when we want to check permissions for the group, we will check
- Which share principals this this group has access to?
- Is at least one of these principals has access to the object?
Pros: no extra storage Cons: more complicated logic
@dlpzx ?
I am taking a step back, resource_policies are data.all application permissions that grant groups access to any resource with a unique identifier. Because of that I do not think the principal of a permission should be a consumption role (option2). This are permissions from groups to resources.
I would go for something closer to option 1 but being as generic as possible, we might get into this issue in the future for other shared resources or granted permissions with a new mechanism. Maybe something that indicates the reason/source/origin of the permissions. sourceResourceUri
? Idk about the name, but something generic
class ResourcePolicy(Base):
__tablename__ = 'resource_policy'
sid = Column(String, primary_key=True, default=utils.uuid('resource_policy'))
resourceUri = Column(String, nullable=False, index=True)
resourceType = Column(String, nullable=False, index=True)
principalId = Column(String, nullable=False, index=True)
principalType = Column(DBEnum('USER', 'GROUP', 'SERVICE', name='rp_principal_type'), default='GROUP')
permissions = relationship('ResourcePolicyPermission', uselist=True, backref='resource_policy')
...
@dlpzx I had a closer look and
- Group gets no permissions, if the shareRequest is for ConsumptionRole
- Group cant have more then 1 share with principal type Goup for one dataset. If i first request table and then the whole dataset, we got the same share -- so, again, no multiple permissions
- Dataset can not belong to more then one Env inside Data.all. Even, if it's the same table in AWS Glue, in Data.all it will have different URIs -> different permission entries.
So, right now we can not have a described confusing situation. Of course, we can get it in future, but I guess it's better to deal with it then.
@noah-paige WDYT?
I think assumption 1 is not true, in dataall.modules.dataset_sharing.services.share_object_service.ShareObjectService.attach_dataset_table_read_permission
we are granting permissions not filtering by consumption role/group. If we should grant data.all permissions to the group that owns the CR might be arguable, on the one side, it would really facilitate our lifes if we don't; but without the permissions the users of the group that owns the CR cannot verify the data in data.all UI. From a UX point of view I would grant the permissions
if i understand correctly - this issue describes the case where GroupA is a member of EnvA and EnvB and has access to TableX via share requestos from both GroupA in EnvA and GroupA in EnvB, correct? (+ extended for CR and so on)...
I think what may be the easiest way to handle this issue is to update the delete permissions logic to be the inverse of the attach logic.
- Meaning in approve share object - we create/attach the resource policy + associated permissions if they do not already exist otherwise we just continue
- For revoke share obj items - we can delete resource policy + associated permissions only if no other share objects exist for the respective requestor group, otherwise we just continue
In practice, updating the revoke_share
permission deletion/management would require the following:
- [Pt. 1] Clean up
data_sharing_service.py
- For
approve_share
- unused variableneed_grant_permissions
- For
revoke_share
change both conditions to delete permissions from:
- For
if share.groupUri != dataset.SamlAdminGroupName and share.principalType == PrincipalType.Group.value:
To (or similar):
if share.groupUri != dataset.SamlAdminGroupName and share.groupUri != dataset.stewards:
- [Pt. 2] Add a new method
find_all_other_share_items
(similar tofind_all_share_items
) to filter for all ShareObjectItems for a particular groupUri (a.k.a requestor) and itemUri (i.e all shares from GroupA to Table B - regardless of status or share)- Add logic on revokes to say if no other successful shares (w/ Share_Succeeded) then delete resource policy for the itemUri for said group (via
delete_resource_policy
in eitherdelete_dataset_folder_read_permission()
anddelete_dataset_table_read_permission()
)
- Add logic on revokes to say if no other successful shares (w/ Share_Succeeded) then delete resource policy for the itemUri for said group (via
Alternative solutions:
-
Not sure if I completely follow the introduction of
ResourcePolicy.sourceResourceUri
but I am assuming that it is fair to say that each resource policy / permission grant can be with respect to a given group (i.e. requestor) to get access to a given resource (reosourceUri) in a particular environment (envUri)- In this particular example is
sourceResourceUri
equivalent toenvUri
but making it generic to extend to other use cases? - Note: would need some exceptions for permissions on resources with no parent/source (i.e. organizations or glossaries)
- In this particular example is
-
Introduce a field in
ResourcePolicy.shareUri
--> I am not sure I follow this idea of introducingshareUri
column to ResourcePolicy table? Resource policies are not necessarily only about shares but can be about any data.all resource right? Maybe the idea though is similar to the above alternative?
TLDR - I think the easiest approach would be to update the way we manage deletes to resolve this issue...
If we want to more cautious and "future proof" maybe adding columns to ResourcePolicy and backfilling could be beneficial but not sure it is a requirement as of now / may need more in-depth conversaiton.
I lean towards solution Noah described (find other shares). I will implement it tomorrow and let's create another issue for the rest. Or may be just proceed with this one.
@SofiaSazonova can we close this issue ?
Completed as part of v2.5