Datahub UI: MLModel neither shows Features nor mlModelTrainingData datasets

Open jazzblue opened this issue 2 years ago • 9 comments

Describe the bug After ingesting (via the REST emitter) features, datasets, and a model that references them in mlModelProperties/mlFeatures and mlModelTrainingData/trainingData respectively, the model page in the UI shows neither the features (the Features tab shows "No Data") nor the mlModelTrainingData/trainingData datasets (under the Summary tab).

To Reproduce Steps to reproduce the behavior:

  1. Follow these instructions to spin up Datahub locally (quickstart) and install Python dependencies.
  2. Run the following Python code to ingest a dataset, a feature, and a model that references that feature and dataset in mlModelProperties/mlFeatures and mlModelTrainingData/trainingData respectively:
import datahub.metadata.schema_classes as models
import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter


rest_emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

dataset_urn = builder.make_dataset_urn(
    platform="myplatform", name="my-dataset", env="PROD"
)
feature_urn = builder.make_ml_feature_urn(
    feature_table_name="my-feature-table", feature_name="my-feature"
)
model_urn = builder.make_ml_model_urn(
    platform="myplatform", model_name="mymodel", env="PROD"
)

# Create dataset
metadata_change_proposal = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=models.ChangeTypeClass.UPSERT,
    entityUrn=dataset_urn,
    aspectName="datasetProperties",
    aspect=models.DatasetPropertiesClass(description="my dataset"),
)
rest_emitter.emit_mcp(metadata_change_proposal)

# Create feature
metadata_change_proposal = MetadataChangeProposalWrapper(
    entityType="mlFeature",
    changeType=models.ChangeTypeClass.UPSERT,
    entityUrn=feature_urn,
    aspectName="mlFeatureProperties",
    aspect=models.MLFeaturePropertiesClass(
        description="my feature",
        sources=[dataset_urn],
    ),
)
rest_emitter.emit_mcp(metadata_change_proposal)

# Create model
metadata_change_proposal = MetadataChangeProposalWrapper(
    entityType="mlModel",
    changeType=models.ChangeTypeClass.UPSERT,
    entityUrn=model_urn,
    aspectName="mlModelProperties",
    aspect=models.MLModelPropertiesClass(
        description="My model",
        mlFeatures=[feature_urn],
    ),
)
rest_emitter.emit_mcp(metadata_change_proposal)

# Attach training data to the model
metadata_change_proposal = MetadataChangeProposalWrapper(
    entityType="mlModel",
    changeType=models.ChangeTypeClass.UPSERT,
    entityUrn=model_urn,
    aspectName="mlModelTrainingData",
    aspect=models.TrainingDataClass(
        trainingData=[models.BaseDataClass(dataset=dataset_urn)]
    ),
)
rest_emitter.emit_mcp(metadata_change_proposal)
  3. In the UI, search for the ingested model and navigate to it.
  4. Click on Summary.
  5. Observe that no training datasets are shown. Expected: the datasets stored under mlModelTrainingData/trainingData.

Screenshots Screen Shot 2022-08-03 at 1 52 17 PM

Desktop (please complete the following information):

  • OS: MacOS

jazzblue avatar Aug 02 '22 18:08 jazzblue

Hi @jazzblue,

Do you mind pasting your emitter snippets? This will help us debug your situation further.

Thanks John

jjoyce0510 avatar Aug 02 '22 21:08 jjoyce0510

Hi @jjoyce0510, @gabe-lyons, Thanks for the prompt response. I have pasted the Python code snippet in the description above. Best regards, Greg

jazzblue avatar Aug 03 '22 17:08 jazzblue

@jazzblue One thing I notice immediately is that the entity types are incorrect for the following entities in your code:

  1. ML Feature
  2. ML Model

It seems to me that the final 3 aspects should have failed to be ingested by DataHub (since the entity type is malformed). Try using the following as the entity type strings:

mlfeature -> mlFeature
mlmodel -> mlModel

Thanks

John

jjoyce0510 avatar Aug 15 '22 22:08 jjoyce0510

For future reference, you can find all "official" DataHub entity type names here: https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/resources/entity-registry.yml

jjoyce0510 avatar Aug 15 '22 22:08 jjoyce0510
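
The casing mismatch described above (mlfeature vs. mlFeature) can be caught before anything is emitted. The following is a minimal sketch, not part of the DataHub client: the helper name canonical_entity_type and the KNOWN_ENTITY_TYPES tuple are hypothetical, and the tuple only lists the three entity types used in this thread. The full list lives in the entity-registry.yml file linked above.

```python
# Hypothetical helper, not a DataHub API: normalizes entity type casing.
# Only the entity types mentioned in this thread are listed here.
KNOWN_ENTITY_TYPES = ("dataset", "mlFeature", "mlModel")


def canonical_entity_type(name: str) -> str:
    """Return the canonical casing for an entity type, or raise ValueError."""
    by_lower = {known.lower(): known for known in KNOWN_ENTITY_TYPES}
    try:
        return by_lower[name.lower()]
    except KeyError:
        raise ValueError(f"unknown entity type: {name!r}") from None


if __name__ == "__main__":
    for candidate in ("mlfeature", "mlmodel", "dataset"):
        print(candidate, "->", canonical_entity_type(candidate))
```

Passing the result as entityType when building each proposal would rule out casing as the cause; it does not address the UI behavior itself.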

Hi @jjoyce0510, Thanks for the entity type names reference. I have made the change you suggested, but it did not change the result. Could you, maybe, run yourself and check?

Thanks! Greg

jazzblue avatar Aug 19 '22 00:08 jazzblue

I wonder if there has been any development here?

jazzblue avatar Sep 21 '22 20:09 jazzblue

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Oct 22 '22 02:10 github-actions[bot]

This seems to be unattended. The issue still appears in the latest version (v0.9.0): there is no lineage/traceability from the ML model to its training dataset in the UI. Could anyone take a look at it? @jjoyce0510

jazzblue avatar Oct 28 '22 21:10 jazzblue

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Dec 02 '22 02:12 github-actions[bot]

@jazzblue would you mind showing us what these entities/aspects look like in your MySQL database?

aditya-radhakrishnan avatar Dec 20 '22 17:12 aditya-radhakrishnan
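
For anyone who wants to run the backend check requested above, here is a rough sketch that builds the lookup queries for the three entities from the reproduction script. It rests on assumptions that are not confirmed in this thread: that the quickstart MySQL instance stores aspects in a table named metadata_aspect_v2 with urn, aspect, version, and metadata columns, that version 0 is the latest version, and that the URN layouts below match what the builder functions produce. The helper names are hypothetical.

```python
# Sketch only: table name, column names, and URN layouts are assumptions.

def dataset_urn(platform: str, name: str, env: str) -> str:
    # Assumed layout of the URN returned by builder.make_dataset_urn
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def ml_feature_urn(feature_table_name: str, feature_name: str) -> str:
    # Assumed layout of the URN returned by builder.make_ml_feature_urn
    return f"urn:li:mlFeature:({feature_table_name},{feature_name})"


def ml_model_urn(platform: str, model_name: str, env: str) -> str:
    # Assumed layout of the URN returned by builder.make_ml_model_urn
    return f"urn:li:mlModel:(urn:li:dataPlatform:{platform},{model_name},{env})"


def aspect_query(urn: str) -> str:
    """Build a SELECT for the latest stored aspects of one entity."""
    escaped = urn.replace("'", "''")  # escape single quotes for the SQL literal
    return (
        "SELECT urn, aspect, version, metadata "
        "FROM metadata_aspect_v2 "
        f"WHERE urn = '{escaped}' AND version = 0;"
    )


if __name__ == "__main__":
    urns = [
        dataset_urn("myplatform", "my-dataset", "PROD"),
        ml_feature_urn("my-feature-table", "my-feature"),
        ml_model_urn("myplatform", "mymodel", "PROD"),
    ]
    for urn in urns:
        print(aspect_query(urn))
```

If the printed queries return rows for mlModelProperties and mlModelTrainingData, the aspects were stored and the problem is more likely on the read or UI side.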

@aditya-radhakrishnan Thanks for responding. This issue has been open for quite a long time. A while ago, @jjoyce0510 asked me for a code snippet that reproduces the issue, which I provided; please see above. If you would like to help, you could spin up DataHub in quickstart mode, run that code, and check any backend data you need. Thanks!

jazzblue avatar Dec 20 '22 22:12 jazzblue

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Jan 20 '23 02:01 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] avatar Feb 19 '23 02:02 github-actions[bot]

Is there any update on this? @jazzblue Did you manage to find a way to get this to work?

blaze225 avatar Oct 24 '23 07:10 blaze225