datahub Multiple results for the same dataset are shown without schema and Platform Instance information in the UI

Describe the bug

While searching for a particular dataset in the "Manage Upstream Lineage" dialog box, results are not differentiated by schema and Platform Instance, which makes it impossible to select the desired dataset.

To Reproduce

1. Navigate to the dataset for which upstream or downstream lineage needs to be added
2. Click the Lineage tab
3. Click Edit; the "Manage Upstream Lineage" dialog box opens
4. Select the dataset which has multiple instances in different schemas or Platform instance
5. All the datasets are fetched without schema and Platform Instance information

Expected behavior

Datasets should be fetched with schema and Platform Instance information

Screenshots

Datahub Version 14.1

May 22 '25 12:05 deepgarg760

This is possible in the V2 UI on DataHub 1.0. We know it's a big switch, but we recommend swapping over to the new UI as it is being actively developed.

Jun 07 '25 20:06 asikowitz

Thanks for the update @asikowitz

Jun 09 '25 05:06 deepgarg760

I am trying to reproduce the issue with the current UI. However, it appears that you need data sources to reproduce it. The initial Docker setup does not include the data sources. What is the easiest way to add the appropriate data sources to reproduce the issue? And once the data sources are added, is it possible to have a direct link to the bug to try to reproduce it?

Jun 19 '25 06:06 MaciekRakowski

You can ingest some sample data by running python -m datahub docker ingest-sample-data. If you are running DataHub locally with authentication on (the default), you'll have to generate a token and then pass it: python -m datahub docker ingest-sample-data --token <token>.

To reproduce the bug, make sure you're on the V1 UI (you may have to go to settings -> appearance -> unselect "Try DataHub 2.0 (beta)"), then go to any dataset entity (e.g. http://localhost:9002/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)/Columns), go to the lineage tab, and click to edit upstream or downstream lineage. To fully reproduce, you'll need to add data that has a platform instance, which is missing from our sample data. But if you're able to get this to display the database / schema, I think that is good enough.
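
For reference, the example link above is just the base URL, the dataset URN, and a tab name joined together. A small sketch of how such a link could be assembled for other datasets follows; the helper names buildDatasetUrn and buildDatasetPageUrl are made up for illustration and are not part of DataHub.

```typescript
// Hypothetical helpers for illustration only; not DataHub APIs.

// Builds a dataset URN in the same shape as the example link in this thread.
function buildDatasetUrn(platform: string, name: string, env: string): string {
    return `urn:li:dataset:(urn:li:dataPlatform:${platform},${name},${env})`;
}

// Joins base URL, URN, and tab name, mirroring the example link's layout.
function buildDatasetPageUrl(baseUrl: string, urn: string, tab: string): string {
    const trimmedBase = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
    return `${trimmedBase}/dataset/${urn}/${tab}`;
}

const sampleUrn = buildDatasetUrn('hive', 'SampleHiveDataset', 'PROD');
console.log(buildDatasetPageUrl('http://localhost:9002', sampleUrn, 'Columns'));
// prints "http://localhost:9002/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)/Columns"
```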

Jun 20 '25 18:06 asikowitz

How do I generate a token? When I go to the tokens page, it shows this below. Is there anything I should do to be able to create a token? Or do I generate it differently?

Jun 21 '25 23:06 MaciekRakowski

Ah if it's disabled then you don't need a token at all, you can just run python -m datahub docker ingest-sample-data

Jun 23 '25 23:06 asikowitz

It looks like my issue with not seeing the same screen as shown in the reproduction steps was the default UI version of 2.0. I ran the data ingestion and changed it to 1.0 and am able to see the screen. Can I know which specific dataset I can choose from the dropdown to reproduce the issue? On step 4 it says to choose a dataset that "has multiple instances in different schemas or Platform instance".

Jun 24 '25 06:06 MaciekRakowski

There isn't an issue with a specific dataset. Rather, the complaint here is that entities with the same platform, entity type, and entity name are indistinguishable, and the request is to display extra information: platform instance and database / schema. We don't have such entities in the seed data, but you can still work on displaying this extra information in two places: (i) on the list of related entities and (ii) on the search cards when searching to add more related entities. We should display this like we do on the entity header, i.e. in your screenshot, towards the top left, we describe the entity as: "Dataset | Hive > datahub_db > datahub_schema".
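
A minimal sketch of the kind of label being asked for, assuming the platform name, an optional platform instance, and the container path (database, then schema) are already available on the entity. The EntitySummary shape and the formatEntityContext name are hypothetical and do not reflect DataHub's actual GraphQL types or components; placing the platform instance right after the platform is also an assumption.

```typescript
// Hypothetical shape for illustration; not DataHub's real entity types.
interface EntitySummary {
    entityType: string;        // e.g. "Dataset"
    platformName: string;      // e.g. "Hive"
    platformInstance?: string; // absent for entities in the sample data
    containerPath: string[];   // outermost first, e.g. ["datahub_db", "datahub_schema"]
}

// Builds a one-line context label such as
// "Dataset | Hive > datahub_db > datahub_schema".
// Missing or empty parts (for example, no platform instance) are skipped.
function formatEntityContext(entity: EntitySummary): string {
    const parts = [entity.platformName, entity.platformInstance, ...entity.containerPath]
        .filter((part): part is string => Boolean(part));
    return `${entity.entityType} | ${parts.join(' > ')}`;
}

console.log(
    formatEntityContext({
        entityType: 'Dataset',
        platformName: 'Hive',
        containerPath: ['datahub_db', 'datahub_schema'],
    }),
);
// prints "Dataset | Hive > datahub_db > datahub_schema"
```

The same label could then be rendered both in the list of related entities and on the search cards, so that two datasets with the same name but different schemas or platform instances can be told apart.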

Jun 24 '25 16:06 asikowitz

I created a fix for this issue and have a PR for it. I tested it locally and it seems to work.

Here it is: https://github.com/datahub-project/datahub/pull/13856

Feel free to leave any comments.

Jun 25 '25 06:06 MaciekRakowski

Some of the checks are failing, including lint. This time, it does not give a specific lint error. I ran lint locally on the file I changed and I got no errors. I'm not sure how to fix the pipeline errors.

Jun 25 '25 07:06 MaciekRakowski

Hi @asikowitz ,

I submitted a PR a few days ago. It shows that the checks do not pass, but when I look closer, the one that is failing is the deployment task. The specific error shows this:

https://github.com/datahub-project/datahub/actions/runs/15961135460/job/45013841641?pr=13856

Run cloudflare/pages-action@1
  with:
    projectName: datahub-project-web-react
    workingDirectory: datahub-web-react
    directory: dist
    gitHubToken: ***
    wranglerVersion: 2
  env:
    JAVA_HOME: /opt/hostedtoolcache/Java_Zulu_jdk/17.0.15-6/x64
    JAVA_HOME_17_X64: /opt/hostedtoolcache/Java_Zulu_jdk/17.0.15-6/x64
    GRADLE_BUILD_ACTION_SETUP_COMPLETED: true
    GRADLE_BUILD_ACTION_CACHE_RESTORED: true
    DEVELOCITY_INJECTION_INIT_SCRIPT_NAME: gradle-actions.inject-develocity.init.gradle
    DEVELOCITY_AUTO_INJECTION_CUSTOM_VALUE: gradle-actions
    GITHUB_DEPENDENCY_GRAPH_ENABLED: false
Error: Input required and not supplied: apiToken

However, all areas within my control, such as linting, unit tests, and unit test coverage, pass. I believe the deployment issue is outside of my control. It may be because I'm creating a PR from my forked branch.

Are you or someone from your team able to review my PR? https://github.com/datahub-project/datahub/pull/13856

Jun 30 '25 01:06 MaciekRakowski