dataall icon indicating copy to clipboard operation
dataall copied to clipboard

Read permissions in data.all for share Requesters - multiple permission roots

Open dlpzx opened this issue 10 months ago • 2 comments

Is your idea related to a problem? Please describe. At the moment when a share request for tables or folders is approved, the requester group gets data.all application permissions to GET_TABLE, PREVIEW_TABLE AND GET_FOLDER respectively. These permissions allow the requester group to get the tables and folder details in the data.all UI.

The challenge is that there is one single permissions created for a group, but a group might be owning multiple share requests (in other environments or with consumption roles) on the same folder/table. Where data.all expects one permission-per share, multiple shares are sharing the same permissions, which can be problematic for clean-up purposes.

This issue is related to https://github.com/data-dot-all/dataall/pull/1163 and to https://github.com/data-dot-all/dataall/issues/1173

Describe the solution you'd like A better way to handle permissions derived from multiple sources

P.S. Don't attach files. Please, prefer add code snippets directly in the message body.

dlpzx avatar Apr 15 '24 07:04 dlpzx

1. Straight-forward solution: we can introduce a field in ResourcePolicy.shareUri

Pros: In this case we will be able to track group permissions for each share independently. Of course, we will have to backfill database with.

Cons: we will have as many permission entries in database for every (group, resource) pair, as many shares they involved in.

2. More complicated solution: In case of granting permissions via share, we can appoint permissions to share principals (e.g. consumptionRole, or environmental group) and then, when we want to check permissions for the group, we will check

  1. Which share principals this this group has access to?
  2. Is at least one of these principals has access to the object?

Pros: no extra storage Cons: more complicated logic

@dlpzx ?

SofiaSazonova avatar May 01 '24 15:05 SofiaSazonova

I am taking a step back, resource_policies are data.all application permissions that grant groups access to any resource with a unique identifier. Because of that I do not think the principal of a permission should be a consumption role (option2). This are permissions from groups to resources.

I would go for something closer to option 1 but being as generic as possible, we might get into this issue in the future for other shared resources or granted permissions with a new mechanism. Maybe something that indicates the reason/source/origin of the permissions. sourceResourceUri? Idk about the name, but something generic

class ResourcePolicy(Base):
    __tablename__ = 'resource_policy'
    sid = Column(String, primary_key=True, default=utils.uuid('resource_policy'))
    resourceUri = Column(String, nullable=False, index=True)
    resourceType = Column(String, nullable=False, index=True)
    principalId = Column(String, nullable=False, index=True)
    principalType = Column(DBEnum('USER', 'GROUP', 'SERVICE', name='rp_principal_type'), default='GROUP')
    permissions = relationship('ResourcePolicyPermission', uselist=True, backref='resource_policy')
...

dlpzx avatar May 02 '24 12:05 dlpzx

@dlpzx I had a closer look and

  1. Group gets no permissions, if the shareRequest is for ConsumptionRole
  2. Group cant have more then 1 share with principal type Goup for one dataset. If i first request table and then the whole dataset, we got the same share -- so, again, no multiple permissions
  3. Dataset can not belong to more then one Env inside Data.all. Even, if it's the same table in AWS Glue, in Data.all it will have different URIs -> different permission entries.

So, right now we can not have a described confusing situation. Of course, we can get it in future, but I guess it's better to deal with it then.

@noah-paige WDYT?

SofiaSazonova avatar May 03 '24 16:05 SofiaSazonova

I think assumption 1 is not true, in dataall.modules.dataset_sharing.services.share_object_service.ShareObjectService.attach_dataset_table_read_permission we are granting permissions not filtering by consumption role/group. If we should grant data.all permissions to the group that owns the CR might be arguable, on the one side, it would really facilitate our lifes if we don't; but without the permissions the users of the group that owns the CR cannot verify the data in data.all UI. From a UX point of view I would grant the permissions

dlpzx avatar May 06 '24 06:05 dlpzx

if i understand correctly - this issue describes the case where GroupA is a member of EnvA and EnvB and has access to TableX via share requestos from both GroupA in EnvA and GroupA in EnvB, correct? (+ extended for CR and so on)...

I think what may be the easiest way to handle this issue is to update the delete permissions logic to be the inverse of the attach logic.

  • Meaning in approve share object - we create/attach the resource policy + associated permissions if they do not already exist otherwise we just continue
  • For revoke share obj items - we can delete resource policy + associated permissions only if no other share objects exist for the respective requestor group, otherwise we just continue

In practice, updating the revoke_share permission deletion/management would require the following:

  • [Pt. 1] Clean up data_sharing_service.py
    • For approve_share - unused variable need_grant_permissions
    • For revoke_share change both conditions to delete permissions from:
if share.groupUri != dataset.SamlAdminGroupName and share.principalType == PrincipalType.Group.value:

To (or similar):

if share.groupUri != dataset.SamlAdminGroupName and share.groupUri != dataset.stewards:
  • [Pt. 2] Add a new method find_all_other_share_items (similar to find_all_share_items) to filter for all ShareObjectItems for a particular groupUri (a.k.a requestor) and itemUri (i.e all shares from GroupA to Table B - regardless of status or share)
    • Add logic on revokes to say if no other successful shares (w/ Share_Succeeded) then delete resource policy for the itemUri for said group (via delete_resource_policy in either delete_dataset_folder_read_permission() and delete_dataset_table_read_permission())

Alternative solutions:

  • Not sure if I completely follow the introduction of ResourcePolicy.sourceResourceUri but I am assuming that it is fair to say that each resource policy / permission grant can be with respect to a given group (i.e. requestor) to get access to a given resource (reosourceUri) in a particular environment (envUri)

    • In this particular example is sourceResourceUri equivalent to envUri but making it generic to extend to other use cases?
    • Note: would need some exceptions for permissions on resources with no parent/source (i.e. organizations or glossaries)
  • Introduce a field in ResourcePolicy.shareUri --> I am not sure I follow this idea of introducing shareUri column to ResourcePolicy table? Resource policies are not necessarily only about shares but can be about any data.all resource right? Maybe the idea though is similar to the above alternative?

noah-paige avatar May 06 '24 17:05 noah-paige

TLDR - I think the easiest approach would be to update the way we manage deletes to resolve this issue...

If we want to more cautious and "future proof" maybe adding columns to ResourcePolicy and backfilling could be beneficial but not sure it is a requirement as of now / may need more in-depth conversaiton.

noah-paige avatar May 06 '24 17:05 noah-paige

I lean towards solution Noah described (find other shares). I will implement it tomorrow and let's create another issue for the rest. Or may be just proceed with this one.

SofiaSazonova avatar May 06 '24 18:05 SofiaSazonova

@SofiaSazonova can we close this issue ?

anmolsgandhi avatar Jun 20 '24 02:06 anmolsgandhi

Completed as part of v2.5

anmolsgandhi avatar Jun 21 '24 12:06 anmolsgandhi