
MLFlow.data DataSource not stored correctly into MLFlow for GCS

Open lvijnck opened this issue 5 months ago • 1 comment

Issues Policy acknowledgement

  • [X] I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

  • Client: 2.15.0
  • Tracking server: latest

System information

Docker/Mac/Python

Describe the problem

I'm trying to use MLflow to log my dataset while pointing to the data stored in GCS. Logging seems to work fine, and printing the dataset's source on the client side gives the expected result.

ds = mlflow.data.from_pandas(data, name="foo", source="gs://path_to_foo")
mlflow.log_input(ds, context=self._context)

Printing type(ds.source) gives GCSArtifactDatasetSource.

When I then try to load the dataset through the MLflow client, it states that it's a local dataset, i.e. "mlflow.source.type": "LOCAL".

from mlflow.tracking import MlflowClient

client: MlflowClient = MlflowClient(tracking_uri="http://localhost:5001")

run = client.get_run("run-id")
ds = run.inputs.dataset_inputs
print(ds)

I would expect this to be of the GCS type so I can load it from the client.
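To make the comparison concrete, here is a minimal sketch of how the source details recorded on each logged dataset could be listed. It assumes each entry of `run.inputs.dataset_inputs` exposes a `.dataset` with `name`, `source_type` and `source` attributes, where `source` is a JSON string. These fields belong to the dataset itself and are separate from run-level tags. The helper name `summarize_dataset_inputs` and the stand-in objects are hypothetical, and the `"gs"` value is only an illustrative assumption, not output from a real tracking server.

```python
import json
from types import SimpleNamespace


def summarize_dataset_inputs(dataset_inputs):
    """Return one dict per logged dataset with its recorded source details.

    Expects objects shaped like the entries of `run.inputs.dataset_inputs`:
    each has a `.dataset` with `name`, `source_type` and `source` attributes,
    where `source` is assumed to be a JSON string.
    """
    summaries = []
    for dataset_input in dataset_inputs:
        dataset = dataset_input.dataset
        try:
            source = json.loads(dataset.source)
        except (TypeError, ValueError):
            # Keep the raw value if it is missing or not valid JSON.
            source = dataset.source
        summaries.append(
            {
                "name": dataset.name,
                "source_type": dataset.source_type,
                "source": source,
            }
        )
    return summaries


# Hypothetical stand-in for one entry of `run.inputs.dataset_inputs`.
example_inputs = [
    SimpleNamespace(
        dataset=SimpleNamespace(
            name="foo",
            source_type="gs",  # assumed value, for illustration only
            source=json.dumps({"uri": "gs://path_to_foo"}),
        )
    )
]

for summary in summarize_dataset_inputs(example_inputs):
    print(summary)
```

Against a live tracking server, the same helper could be called as `summarize_dataset_inputs(client.get_run("run-id").inputs.dataset_inputs)`, which would show whether the stored `source_type` and `source` match what was logged. MLflow also documents `mlflow.data.get_source(...)` for resolving a logged dataset back to a source object, which may be worth trying on the same entries.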

Any inputs?

Tracking information

(Pdb) print("MLflow version:", mlflow.__version__)
MLflow version: 2.15.0
(Pdb) print("Tracking URI:", mlflow.get_tracking_uri())
Tracking URI: http://127.0.0.1:5001
(Pdb) print("Artifact URI:", mlflow.get_artifact_uri())
Artifact URI: mlflow-artifacts:/0/27655bc8d24f42c1b4a806ac68bee870/artifacts
(Pdb) 
Artifact URI: mlflow-artifacts:/0/27655bc8d24f42c1b4a806ac68bee870/artifacts
(Pdb) # MLflow >= 2.0
*** SyntaxError: invalid syntax
(Pdb) mlflow.doctor()
System information: Darwin Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:25 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6030
Python version: 3.11.9
MLflow version: 2.15.0
MLflow module location: /Users/laurens/Documents/projects/matrix/pipelines/matrix/.venv/lib/python3.11/site-packages/mlflow/__init__.py
Tracking URI: http://127.0.0.1:5001
Registry URI: http://127.0.0.1:5001
Active experiment ID: 0
Active run ID: 27655bc8d24f42c1b4a806ac68bee870
Active run artifact URI: mlflow-artifacts:/0/27655bc8d24f42c1b4a806ac68bee870/artifacts
MLflow dependencies: 
  Flask: 3.0.3
  Jinja2: 3.1.4
  aiohttp: 3.10.5
  alembic: 1.13.2
  botocore: 1.34.162
  docker: 7.1.0
  fastapi: 0.112.1
  google-cloud-storage: 2.18.2
  graphene: 3.3
  gunicorn: 22.0.0
  markdown: 3.7
  matplotlib: 3.9.2
  mlflow-skinny: 2.15.0
  numpy: 1.23.5
  pandas: 1.5.3
  pyarrow: 15.0.2
  pydantic: 2.8.2
  querystring-parser: 1.2.4
  scikit-learn: 1.4.0
  scipy: 1.14.1
  sqlalchemy: 2.0.32
  uvicorn: 0.30.6
  virtualenv: 20.26.3

Code to reproduce issue

REPLACE_ME

Stack trace

REPLACE_ME

Other info / logs

REPLACE_ME

What component(s) does this bug affect?

  • [X] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [X] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

lvijnck, Aug 28 '24 11:08