datahub
datahub copied to clipboard
feat(models) : Joins (Datasets) schema, resolvers and UI
Checklist
- [X] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
We have closed previous PR: https://github.com/datahub-project/datahub/pull/7961 by @rtekal and created this PR with additional changes containing updated resolvers and UI.
Alright gang and I've done at least some manual testing after pulling your branch down! Things are looking swell. Here's a list of suggestions and things I noticed while playing around.
- The “no join available” text looks too small and doesn’t have even padding on top and bottom. I would follow other examples we have where there’s no data yet like how it looks in the Queries tab.
- (minor) You can create joins with the same table. We should probably filter itself out or throw an error on the BE when trying to create
- Why do we execute a search request for joins when typing in the Join input for the Join name? (IN CODE)
- Need to add this permission to Policies.json
- If the user isn’t able to create joins, the whole screen refreshes with no error shown and simply no join results. We do see an error on the GMS side saying the user doesn’t have permission though
- When going to the Relationships tab and there are no joins, it looks like some empty result flashes on the screen for a second and looks weird/confusing
- If there are empty join fields, we could use a better error where the icon and error message are on the same line. Let’s change it to “Please fill out or remove all empty Join fields”. Also this could be a frontend check and not necessitate getting a response from the server (not super picky about this part)
- After successfully creating or editing a join we refresh the whole page. We should simply refetch to get joins and not cause the whole page to reset.
- the dataset name and “View dataset” should be aligned vertically
- Why can’t you edit a join name once it’s created?
- interesting that we have join details that you provide while creating the join as well as separate documentation on the join entity. do we need both of these? or should we just use the main documentation aspect?
- I don’t see a way to delete a join in the UI
- Why do we have “Joins” in blue link text above the join searcher? I feel like that can be removed. I think the top of this tab can/should look more like the Queries tab for consistency since they’re very similar types of components
- (minor) if the Join details section is really long it will just keep going - we should implement a “Read More” type of situation here. You can use
Hi @chriscollins3456 Regarding Joins Tab under Relationships tab: Here Relationship is a broader perspective but Join is one of them there could be some other type of relations as well like PrimaryKey/FK relationships so currently we have started with Joins tab under Relationships. Thats why you can see Joins in Blue colour over Searchbox.
(minor) You can create joins with the same table. We should probably filter itself out or throw an error on the BE when trying to create
We did this intensionally, since we can have self-joins. https://www.w3schools.com/sql/sql_join_self.asp
Why do we execute a search request for joins when typing in the Join input for the Join name? (IN CODE) Here we wanted to check that join name is unique while creating a join (technical name).
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
docs-website | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Oct 3, 2023 2:36pm |
@chriscollins3456 - Please review the PR and let us know. Thank you.
These are all the cases we verified.
Join Tests
JOIN_FEATURE_FLAG works
when not present; the feature is not enabled
when false; the feature is not enabled
when true; the feature is enabled and Relationships tab is visible in the dataset
Permissions Validation
Owner can create
Somebody in the Policy can create
Admin can create.
Create Join
Who can create a join? - see permissions
Search for Datasets to Join
Select a second Dataset
Validations
Name should be present
At least one pair should be present
Cancel works
Submit works
Update Join
Cancel functionality works
Owner can update
Additional owners of a join can update
Join name can be changed, Join URN does not change, Display name shows
Description can be updated
View Join From Datasets
Visible from either Dataset A or Dataset B
Shows Join Name
Shows Join Description
View Join Entity
Same Display Name is visible
Join Description shows in About
Shows Join Fields
Able to click Edit
After Create, validate View Join from Datasets and View Join Entity
After Update, validate View Join from Datasets and View Join Entity
Navigation from and to Join/Datasets
Adding Glossary
Adding Links
Adding Tags
@chriscollins3456 We can have a quick chat to review the current changes. I do not want this PR to become stale and get closed. Thank you.
@chriscollins3456, Few days ago we had a meeting in which, @shirshanka recommended us to use 'erModelRelation' (ER ModelRelation) in place of 'Join', so that it is more meaningful for the general community.
I will do these changes, merge with the latest (since there are conflicts), do local testing and request PR review again. I don't think there will be any problems with the UI display of 'ER Model Relation', but will keep you posted. Thank you.
Sorry, corrected.... ER Model Relation
I thought I did all the replace, and while starting datahub I am seeing this error.
I will continue to check and once tested will request a review. Thanks.
Field _ermodelrelationService in com.linkedin.gms.factory.graphql.GraphQLEngineFactory required a bean of type 'com.linkedin.metadata.service.ERModelRelationService' that could not be found.
I tested my changes after the change of name from 'Join' to 'ERModelRelation' Please review. Thanks.
@rtekal / @chriscollins3456 -- jumping on late here, but looking to understand what are the outstanding items to get this PR to merge and complete this workstream? Thank you.
Ensured again that there are no conflicts with master. Tested the functionality with and without the feature flag: ERMODELRELATION_FEATURE_ENABLED
Hi @chriscollins3456, @shirshanka, Could you please provide your inputs in case need to update anything to merge this PR? We have completed all the changes and tested well from our end. Thank you!
Synced with master.
there's a few outstanding model changes I want to make sure we're on the same page with before merging. My biggest thinking is that we have these models in acryl (but were going to bring them to OSS anyways hence you guys taking that on here) and I don't want to have clashing between them.
the main differences i'm seeing in the PDL modeling are:
- name of the new entity:
ermodelrelation
->erModelRelationship
- name of the main aspect:
eRModelRelationProperties
->erModelRelationshipProperties
-
editableERModelRelationProperties
->editableERModelRelationshipProperties
(just being consistent with "relationship" instead of "relation" - the fieldMappings piece:
ermodelrelationFieldMapping: ERModelRelationFieldMapping
->relationshipFieldMappings: array[RelationshipFieldMapping]
(I describe this change in the comments above)
then there are some differences in the graphql API
-
ermodelrelation(urn: String!): ERModelRelation
-> erModelRelationship(urn: String!): ERModelRelationship` - field mappings under
ERModelRelationshipProperties
:ermodelrelationFieldMapping: ERModelRelationFieldMapping!
->relationshipFieldMappings: [RelationshipFieldMapping!]
where we would then match the shape of the models with this field
I think once we get on the same page with model changes and API changes then we're good to merge this!
Yep Chris. Took care of all your review comments. Please let me know. Thanks.
but it actually looks like you recently merged in master and the CI is looking solid! so really we should only have to worry about the casing request in that final comment and then we're merging this bad boy
I went ahead and merged in master one more time to make sure we're up to date
Sure Chris. Thank you. I went ahead and did the last review changes. If you find issues with 'tests and linting' please let me know. Tomorrow I will ask Poorvi to add you as collaborator, in case you commit any changes to accelerate things faster.
Thank you again for all your help and extreme patience with our contribution.