datahub icon indicating copy to clipboard operation
datahub copied to clipboard

feat(models) : Joins (Datasets) schema, resolvers and UI

Open poorvi767 opened this issue 1 year ago • 15 comments

Checklist

  • [X] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [ ] Links to related issues (if applicable)
  • [ ] Tests for the changes have been added/updated (if applicable)
  • [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

poorvi767 avatar Jun 28 '23 15:06 poorvi767

We have closed previous PR: https://github.com/datahub-project/datahub/pull/7961 by @rtekal and created this PR with additional changes containing updated resolvers and UI.

poorvi767 avatar Jun 28 '23 15:06 poorvi767

Alright gang and I've done at least some manual testing after pulling your branch down! Things are looking swell. Here's a list of suggestions and things I noticed while playing around.

  • The “no join available” text looks too small and doesn’t have even padding on top and bottom. I would follow other examples we have where there’s no data yet like how it looks in the Queries tab.
image
  • (minor) You can create joins with the same table. We should probably filter itself out or throw an error on the BE when trying to create
  • Why do we execute a search request for joins when typing in the Join input for the Join name? (IN CODE)
  • Need to add this permission to Policies.json
  • If the user isn’t able to create joins, the whole screen refreshes with no error shown and simply no join results. We do see an error on the GMS side saying the user doesn’t have permission though
  • When going to the Relationships tab and there are no joins, it looks like some empty result flashes on the screen for a second and looks weird/confusing
  • If there are empty join fields, we could use a better error where the icon and error message are on the same line. Let’s change it to “Please fill out or remove all empty Join fields”. Also this could be a frontend check and not necessitate getting a response from the server (not super picky about this part)
image
  • After successfully creating or editing a join we refresh the whole page. We should simply refetch to get joins and not cause the whole page to reset.
  • the dataset name and “View dataset” should be aligned vertically
image
  • Why can’t you edit a join name once it’s created?
  • interesting that we have join details that you provide while creating the join as well as separate documentation on the join entity. do we need both of these? or should we just use the main documentation aspect?
  • I don’t see a way to delete a join in the UI
  • Why do we have “Joins” in blue link text above the join searcher? I feel like that can be removed. I think the top of this tab can/should look more like the Queries tab for consistency since they’re very similar types of components
  • (minor) if the Join details section is really long it will just keep going - we should implement a “Read More” type of situation here. You can use

chriscollins3456 avatar Aug 21 '23 17:08 chriscollins3456

Hi @chriscollins3456 Regarding Joins Tab under Relationships tab: Here Relationship is a broader perspective but Join is one of them there could be some other type of relations as well like PrimaryKey/FK relationships so currently we have started with Joins tab under Relationships. Thats why you can see Joins in Blue colour over Searchbox.

poorvi767 avatar Sep 06 '23 12:09 poorvi767

(minor) You can create joins with the same table. We should probably filter itself out or throw an error on the BE when trying to create

We did this intensionally, since we can have self-joins. https://www.w3schools.com/sql/sql_join_self.asp

rtekal avatar Sep 06 '23 16:09 rtekal

Why do we execute a search request for joins when typing in the Join input for the Join name? (IN CODE) Here we wanted to check that join name is unique while creating a join (technical name).

poorvi767 avatar Oct 03 '23 08:10 poorvi767

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
docs-website ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 3, 2023 2:36pm

vercel[bot] avatar Oct 03 '23 14:10 vercel[bot]

@chriscollins3456 - Please review the PR and let us know. Thank you.

These are all the cases we verified.

Join Tests

JOIN_FEATURE_FLAG works
	when not present; the feature is not enabled
	when false; the feature is not enabled
	when true; the feature is enabled and Relationships tab is visible in the dataset

Permissions Validation
  Owner can create
   Somebody in the Policy can create
  Admin can create.

Create Join
	Who can create a join? - see permissions

	Search for Datasets to Join
	Select a second Dataset

	Validations
		Name should be present
		At least one pair should be present

	Cancel works
	Submit works

Update Join
	Cancel functionality works
	Owner can update
	Additional owners of a join can update
	Join name can be changed, Join URN does not change, Display name shows
	Description can be updated


View Join From Datasets
	Visible from either Dataset A or Dataset B
	Shows Join Name
	Shows Join Description
	
View Join Entity
	Same Display Name is visible
	Join Description shows in About
	Shows Join Fields
	Able to click Edit

After Create, validate View Join from Datasets and View Join Entity
After Update,  validate View Join from Datasets and View Join Entity
Navigation from and to Join/Datasets

Adding Glossary
Adding Links
Adding Tags

rtekal avatar Oct 06 '23 15:10 rtekal

@chriscollins3456 We can have a quick chat to review the current changes. I do not want this PR to become stale and get closed. Thank you.

rtekal avatar Nov 14 '23 22:11 rtekal

@chriscollins3456, Few days ago we had a meeting in which, @shirshanka recommended us to use 'erModelRelation' (ER ModelRelation) in place of 'Join', so that it is more meaningful for the general community.

I will do these changes, merge with the latest (since there are conflicts), do local testing and request PR review again. I don't think there will be any problems with the UI display of 'ER Model Relation', but will keep you posted. Thank you.

rtekal avatar Dec 14 '23 14:12 rtekal

Sorry, corrected.... ER Model Relation

rtekal avatar Dec 14 '23 18:12 rtekal

I thought I did all the replace, and while starting datahub I am seeing this error.

I will continue to check and once tested will request a review. Thanks.

Field _ermodelrelationService in com.linkedin.gms.factory.graphql.GraphQLEngineFactory required a bean of type 'com.linkedin.metadata.service.ERModelRelationService' that could not be found.

rtekal avatar Jan 18 '24 19:01 rtekal

I tested my changes after the change of name from 'Join' to 'ERModelRelation' Please review. Thanks.

rtekal avatar Jan 24 '24 16:01 rtekal

@rtekal / @chriscollins3456 -- jumping on late here, but looking to understand what are the outstanding items to get this PR to merge and complete this workstream? Thank you.

sgm44 avatar Jan 31 '24 14:01 sgm44

Ensured again that there are no conflicts with master. Tested the functionality with and without the feature flag: ERMODELRELATION_FEATURE_ENABLED

rtekal avatar Feb 09 '24 17:02 rtekal

Hi @chriscollins3456, @shirshanka, Could you please provide your inputs in case need to update anything to merge this PR? We have completed all the changes and tested well from our end. Thank you!

poorvi767 avatar Feb 16 '24 06:02 poorvi767

Synced with master.

rtekal avatar Mar 05 '24 16:03 rtekal

there's a few outstanding model changes I want to make sure we're on the same page with before merging. My biggest thinking is that we have these models in acryl (but were going to bring them to OSS anyways hence you guys taking that on here) and I don't want to have clashing between them.

the main differences i'm seeing in the PDL modeling are:

  • name of the new entity: ermodelrelation -> erModelRelationship
  • name of the main aspect: eRModelRelationProperties -> erModelRelationshipProperties
  • editableERModelRelationProperties -> editableERModelRelationshipProperties (just being consistent with "relationship" instead of "relation"
  • the fieldMappings piece: ermodelrelationFieldMapping: ERModelRelationFieldMapping -> relationshipFieldMappings: array[RelationshipFieldMapping] (I describe this change in the comments above)

then there are some differences in the graphql API

  • ermodelrelation(urn: String!): ERModelRelation -> erModelRelationship(urn: String!): ERModelRelationship`
  • field mappings under ERModelRelationshipProperties: ermodelrelationFieldMapping: ERModelRelationFieldMapping! -> relationshipFieldMappings: [RelationshipFieldMapping!] where we would then match the shape of the models with this field

I think once we get on the same page with model changes and API changes then we're good to merge this!

chriscollins3456 avatar Mar 12 '24 17:03 chriscollins3456

Yep Chris. Took care of all your review comments. Please let me know. Thanks.

rtekal avatar Mar 19 '24 12:03 rtekal

but it actually looks like you recently merged in master and the CI is looking solid! so really we should only have to worry about the casing request in that final comment and then we're merging this bad boy

I went ahead and merged in master one more time to make sure we're up to date

chriscollins3456 avatar Mar 19 '24 14:03 chriscollins3456

Sure Chris. Thank you. I went ahead and did the last review changes. If you find issues with 'tests and linting' please let me know. Tomorrow I will ask Poorvi to add you as collaborator, in case you commit any changes to accelerate things faster.

Thank you again for all your help and extreme patience with our contribution.

rtekal avatar Mar 19 '24 19:03 rtekal