feat(accessRequests)
Adds accessRequest object to DataHub as well as graphQL resolvers for adding an accessRequest to Datasets, Containers and Data Products.
This is as described here: https://datahubspace.slack.com/archives/C017W0NTZHR/p1706108642104539
Checklist
- [x] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [x] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
Awesome PR here, this looks really good!!
There are a couple missing pieces needed I think to round out the feature.
- Remove access request - For example, there doesn't appear to be a way to remove an access request from an entity. I can see this being needed when either the access request has been granted or in case the requestor decides that they really don't need access after all. Probably a permissions check to validate that the user is removing their own access request or a user with a specific privilege.
- Arrays of access requests like this need the ability to be patched. A patch template and builder for this specific array allows sending a json patch request which can selectively add/remove objects from the array in an efficient manner. Examples of these can be found here. Implementing a patch builder would then be used in the remote resolver to selectively remove the access request without reading the entire array and performing list operations.
@matthew-coudert-cko @david-leifker @shirshanka a few thoughts while working on the remove resolver...
An Access Request in DataHub represents the logical entity of an Access Request, and the access request is a unit of work to be actioned in another system.
Considering that, there are attributes that would make sense:
- Has it been actioned? (
booleanor['NOT_STARTED', 'IN_PROGRESS', 'COMPLETED']) - Has the request been approved? (
boolean) - Is it on behalf of the actor or someone else?
What do you think about...
- Including these fields?
- Include a unique generated ID per
accessRequest? The current method for removing anaccessRequesthas to useactorandtimefromaccessRequest.created (AuditStamp)as the unique identifier.
I'm tempted to suggest AccessRequest as its own entity, where the accessRequests aspect is a relationship. In large organizations, Data Governance functions may want to search, audit, and browse access requests.
It would also make extending this capability in future easier (acknowledging data usage agreements, data contracts, etc) and give us a unique identifier.
@stevenayers - completely agree that AccessRequest as a separate entity will be better for the long term.
@stevenayers - completely agree that AccessRequest as a separate entity will be better for the long term.
Hello @shirshanka! Your last comment resonates very well with what we have done. 😊
We have a highly thoughtful and complete model design for data-access-management and have already built a promising solution on top of it in our organization. Now we are happy to contribute this design. Please feel free to have a look in our demo PR #10061. Our team are happy to share more insights and details.
We are also open for a demo meeting if you would like to see what our solution looks like, in terms of architecture, workflow and UI/UX.
(Also FYI @david-leifker @stevenayers if you are interested.)
Hi @matthew-coudert-cko. Didn't mean to steal this PR for a competing contribution. But since the Acryl team prefers a separate entity for the feature, please feel free to also check our PR #10061 and see if it also addresses your requirement. We'll be also very happy if you are willing to share insights from your side so we together with the community can make a nice model and bring values to the community. 😊
@pedro93 - pointed out an earlier set of PRs which actually somewhat duplicates this PR. It seems like there is an access aspect which was only added to dataset entity and some UI work as well. This earlier work doesn't track the status but does allow for a link to an external url to be able to request access which is then handled outside of DataHub.
https://github.com/datahub-project/datahub/pull/8267 https://github.com/datahub-project/datahub/pull/8541
Closing this PR due to preference for the implementation in this #10061. Thank you for the PR and happy to work to address any limitations of the mentioned PR.