mlflow.data DataSource not stored correctly in MLflow for GCS
Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the issues policy
Where did you encounter this bug?
Local machine
Willingness to contribute
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
MLflow version
- Client: 2.15.0
- Tracking server: latest
System information
Docker/Mac/Python
Describe the problem
I'm trying to use MLflow to log my data while pointing to the data stored in GCS. Logging appears to work, and printing the source of the dataset gives the expected result.
ds = mlflow.data.from_pandas(data, name="foo", source="gs://path_to_foo")
mlflow.log_input(ds, context=self._context)
When printing type(ds.source), I get GCSArtifactDatasetSource.
When I then load the dataset through the MLflow client, it states that it's a local dataset, i.e. "mlflow.source.type": "LOCAL":
from mlflow.tracking import MlflowClient
client: MlflowClient = MlflowClient(tracking_uri="http://localhost:5001")
run = client.get_run("run-id")
ds = run.inputs.dataset_inputs
print(ds)
I would expect this to be of the GCS type so I can load it from the client.
Any inputs?
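To make it easier to see what the tracking server actually stored, a small helper along these lines could summarize each logged dataset input. This is only a sketch: `summarize_dataset_inputs` is a hypothetical name, and the stand-in objects and their values are made up so the snippet runs without a tracking server. It relies only on each dataset input exposing `dataset.name`, `dataset.source_type` and `dataset.source`, with `source` being a JSON string.

```python
import json
from types import SimpleNamespace


def summarize_dataset_inputs(dataset_inputs):
    """Return name, source_type and parsed source for each logged dataset input.

    Hypothetical helper: expects objects shaped like the entries of
    run.inputs.dataset_inputs (each with a .dataset attribute).
    """
    summary = []
    for dataset_input in dataset_inputs:
        dataset = dataset_input.dataset
        try:
            # The source is assumed to be stored as a JSON string.
            source = json.loads(dataset.source)
        except (TypeError, ValueError):
            # Fall back to the raw value if it is not valid JSON.
            source = dataset.source
        summary.append(
            {
                "name": dataset.name,
                "source_type": dataset.source_type,
                "source": source,
            }
        )
    return summary


# Stand-in for run.inputs.dataset_inputs; the values are illustrative only
# and not copied from a real tracking server response.
fake_inputs = [
    SimpleNamespace(
        dataset=SimpleNamespace(
            name="foo",
            source_type="local",
            source='{"uri": "gs://path_to_foo"}',
        )
    )
]

result = summarize_dataset_inputs(fake_inputs)
print(result)
```

Against a real server, the helper would be called with run.inputs.dataset_inputs from the client snippet above. Comparing the reported source_type with the URI inside source would show whether only the type label is off or the GCS path itself was lost.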
Tracking information
(Pdb) print("MLflow version:", mlflow.__version__)
MLflow version: 2.15.0
(Pdb) print("Tracking URI:", mlflow.get_tracking_uri())
Tracking URI: http://127.0.0.1:5001
(Pdb) print("Artifact URI:", mlflow.get_artifact_uri())
Artifact URI: mlflow-artifacts:/0/27655bc8d24f42c1b4a806ac68bee870/artifacts
(Pdb)
Artifact URI: mlflow-artifacts:/0/27655bc8d24f42c1b4a806ac68bee870/artifacts
(Pdb) # MLflow >= 2.0
*** SyntaxError: invalid syntax
(Pdb) mlflow.doctor()
System information: Darwin Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:25 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6030
Python version: 3.11.9
MLflow version: 2.15.0
MLflow module location: /Users/laurens/Documents/projects/matrix/pipelines/matrix/.venv/lib/python3.11/site-packages/mlflow/__init__.py
Tracking URI: http://127.0.0.1:5001
Registry URI: http://127.0.0.1:5001
Active experiment ID: 0
Active run ID: 27655bc8d24f42c1b4a806ac68bee870
Active run artifact URI: mlflow-artifacts:/0/27655bc8d24f42c1b4a806ac68bee870/artifacts
MLflow dependencies:
Flask: 3.0.3
Jinja2: 3.1.4
aiohttp: 3.10.5
alembic: 1.13.2
botocore: 1.34.162
docker: 7.1.0
fastapi: 0.112.1
google-cloud-storage: 2.18.2
graphene: 3.3
gunicorn: 22.0.0
markdown: 3.7
matplotlib: 3.9.2
mlflow-skinny: 2.15.0
numpy: 1.23.5
pandas: 1.5.3
pyarrow: 15.0.2
pydantic: 2.8.2
querystring-parser: 1.2.4
scikit-learn: 1.4.0
scipy: 1.14.1
sqlalchemy: 2.0.32
uvicorn: 0.30.6
virtualenv: 20.26.3
Code to reproduce issue
REPLACE_ME
Stack trace
REPLACE_ME
Other info / logs
REPLACE_ME
What component(s) does this bug affect?
- [X] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
What language(s) does this bug affect?
- [ ] language/r: R APIs and clients
- [X] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations