Neo4j getLineage with multi-hop support

Open mattmatravers opened this issue 2 years ago • 0 comments

Describe the bug

DataHub is not fully integrated with Neo4j: the Neo4j graph implementation does not support multi-hop lineage queries.

We need an implementation of getLineage for Neo4jGraphService, and then we can also set supportsMultiHop() to return true.

To Reproduce

  1. Set up an instance of DataHub with Neo4j as the Graph Index implementation
  2. Ingest some Datasets with lineage to each other.
  3. Open the DataHub UI and navigate to a dataset with some lineage.
  4. Go to the Impact tab (if using Neo4j this will be disabled).

Expected behavior

The Lineage tab and "Impact Analysis" should be functional once the Neo4j multi-hop integration is complete.

The following code needs to be implemented or amended in https://github.com/datahub-project/datahub/blob/master/metadata-io/src/main/java/com/linkedin/metadata/graph/neo4j/Neo4jGraphService.java:

The getLineage method needs to be implemented for multiple hops in Neo4jGraphService: https://github.com/datahub-project/datahub/blob/2dfc166bbde4f5ae6de04482fd1fcca9dcc3cabc/metadata-io/src/main/java/com/linkedin/metadata/graph/GraphService.java#L94
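As a starting point for discussion, here is a minimal sketch of how a multi-hop lineage lookup could be expressed as a single Cypher query using a variable-length relationship pattern. Everything named here is hypothetical and not part of DataHub: the class `LineageCypherSketch`, the `EdgeDirection` enum, and the `buildQuery` signature. It also assumes that each Neo4j node stores its URN in a `urn` property and that the caller binds the `$urn` parameter when running the query.

```java
import java.util.List;

/** Hypothetical helper (not DataHub code): builds a multi-hop lineage Cypher query. */
final class LineageCypherSketch {

  /** Hypothetical stand-in for the direction in which edges are traversed. */
  enum EdgeDirection { OUTGOING, INCOMING }

  private LineageCypherSketch() { }

  /**
   * Builds a Cypher statement returning every node reachable from the node
   * whose urn is bound to $urn within 1..maxHops edges, together with the
   * smallest hop count at which each node is reached.
   */
  static String buildQuery(EdgeDirection direction, List<String> relationshipTypes, int maxHops) {
    if (maxHops < 1) {
      throw new IllegalArgumentException("maxHops must be at least 1");
    }
    // Relationship types cannot be bound as Cypher parameters, so they are
    // validated as plain identifiers before being inlined into the pattern.
    for (String type : relationshipTypes) {
      if (!type.matches("[A-Za-z_][A-Za-z0-9_]*")) {
        throw new IllegalArgumentException("Invalid relationship type: " + type);
      }
    }
    String types = String.join("|", relationshipTypes);
    // Variable-length pattern, e.g. [:DownstreamOf*1..3]; no type filter if the list is empty.
    String rel = "[" + (types.isEmpty() ? "" : ":" + types) + "*1.." + maxHops + "]";
    String pattern = direction == EdgeDirection.OUTGOING
        ? "-" + rel + "->"
        : "<-" + rel + "-";
    return "MATCH p = (src {urn: $urn})" + pattern + "(dst) "
        + "WHERE src <> dst "
        + "RETURN dst.urn AS urn, min(length(p)) AS hops";
  }
}
```

For example, `buildQuery(EdgeDirection.OUTGOING, List.of("DownstreamOf"), 3)` yields a query that follows `DownstreamOf` edges up to three hops away. A real getLineage implementation would still need to map the result rows onto the return type declared in GraphService and apply pagination, which this sketch leaves out.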

The supportsMultiHop method should return true for Neo4j: https://github.com/datahub-project/datahub/blob/2dfc166bbde4f5ae6de04482fd1fcca9dcc3cabc/metadata-io/src/main/java/com/linkedin/metadata/graph/GraphService.java#L188
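The flag change itself should be small. The sketch below uses hypothetical stand-in types (`GraphServiceSketch`, `Neo4jGraphServiceSketch`) rather than the real DataHub classes, and assumes that supportsMultiHop() is declared on the interface as a default method returning false, so that only implementations with multi-hop support need to override it.

```java
/** Hypothetical stand-in for GraphService; assumes a default of "no multi-hop support". */
interface GraphServiceSketch {
  default boolean supportsMultiHop() {
    return false;
  }
}

/** Hypothetical stand-in for Neo4jGraphService once getLineage handles multiple hops. */
final class Neo4jGraphServiceSketch implements GraphServiceSketch {
  @Override
  public boolean supportsMultiHop() {
    // Opt in only after the multi-hop getLineage implementation is in place.
    return true;
  }
}
```

Keeping the default at false means graph backends without a multi-hop implementation are unaffected by the change.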

Screenshots

Example Lineage tab: https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:looker,long_tail_companions.view.instruction_set,PROD)/Lineage?filter_degree=1&page=1

mattmatravers, Sep 20 '22 10:09