[BUG] log_artifact fails when tracking uri scheme is 'file'

Open yossibiton opened this issue 2 years ago • 20 comments

Issues Policy acknowledgement

  • [X] I have read and agree to submit bug reports in accordance with the issues policy

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

  • Client: 2.1.1

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04
  • Python version: 3.9.12

Describe the problem

I'm using MLflow on localhost (as described here). When I call log_artifact during a live run, I get the following error: mlflow.exceptions.MlflowException: The configured tracking uri scheme: 'file' is invalid for use with the proxy mlflow-artifact scheme. The allowed tracking schemes are: {'https', 'http'}

Tracking information

MLflow module location: /home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/__init__.py
Tracking URI: file:///efs/mlflow/mlruns
Registry URI: file:///efs/mlflow/mlruns
MLflow environment variables: 
  MLFLOW_EXPERIMENT_NAME: object_detection
MLflow dependencies: 
  Flask: 2.2.2
  Jinja2: 3.1.2
  alembic: 1.9.3
  click: 8.1.3
  cloudpickle: 2.2.1
  databricks-cli: 0.17.4
  docker: 6.0.1
  entrypoints: 0.4
  gitpython: 3.1.30
  gunicorn: 20.1.0
  importlib-metadata: 5.0.0
  markdown: 3.4.1
  matplotlib: 3.6.2
  numpy: 1.23.4
  packaging: 21.3
  pandas: 1.5.1
  protobuf: 3.20.3
  pyarrow: 10.0.1
  pytz: 2022.6
  pyyaml: 6.0
  querystring-parser: 1.2.4
  requests: 2.27.1
  scikit-learn: 1.2.1
  scipy: 1.9.3
  shap: 0.41.0
  sqlalchemy: 1.4.46
  sqlparse: 0.4.3

Code to reproduce issue

import os

import mlflow
mlflow_exp_name = 'object_detection'
mlflow.set_tracking_uri('file:///efs/mlflow/mlruns')
os.environ['MLFLOW_EXPERIMENT_NAME'] = mlflow_exp_name

path_html = 'example.html'
with open(path_html, "w") as f:
    f.write('')
with mlflow.start_run():
    mlflow.log_artifact(path_html)

Stack trace

Traceback (most recent call last):
  File "/efs/demo.py", line 36, in <module>
    mlflow.log_artifact(path_html)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 776, in log_artifact
    MlflowClient().log_artifact(run_id, local_path, artifact_path)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/tracking/client.py", line 1002, in log_artifact
    self._tracking_client.log_artifact(run_id, local_path, artifact_path)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 431, in log_artifact
    artifact_repo = self._get_artifact_repo(run_id)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 416, in _get_artifact_repo
    artifact_repo = get_artifact_repository(artifact_uri)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 106, in get_artifact_repository
    return _artifact_repository_registry.get_artifact_repository(artifact_uri)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 72, in get_artifact_repository
    return repository(artifact_uri)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/store/artifact/mlflow_artifacts_repo.py", line 46, in __init__
    super().__init__(self.resolve_uri(artifact_uri, get_tracking_uri()))
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/store/artifact/mlflow_artifacts_repo.py", line 61, in resolve_uri
    _validate_uri_scheme(track_parse.scheme)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/mlflow/store/artifact/mlflow_artifacts_repo.py", line 35, in _validate_uri_scheme
    raise MlflowException(
mlflow.exceptions.MlflowException: The configured tracking uri scheme: 'file' is invalid for use with the proxy mlflow-artifact scheme. The allowed tracking schemes are: {'http', 'https'}

Process finished with exit code 1

Other info / logs

REPLACE_ME

What component(s) does this bug affect?

  • [X] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [X] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

yossibiton avatar Feb 12 '23 12:02 yossibiton

Hi @yossibiton, thank you for raising this issue. The problem appears to be that your MLflow experiment with name object_detection was created using an HTTP request to mlflow server or was created by manually specifying an mlflow-artifacts:// URI as the artifact_location. In order to log artifacts to this experiment, you'll need to run your mlflow server and set the MLflow Tracking URI to communicate with the MLflow server, e.g. something like mlflow.set_tracking_uri("http://127.0.0.1:5000").

Thank you for using MLflow!
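
For readers wondering why a file tracking URI triggers this error, the check at the bottom of the stack trace can be approximated with the standard library alone. This is a simplified illustration, not MLflow's actual source: the function name check_tracking_uri_for_proxy and the use of ValueError are assumptions made for the sketch.

```python
from urllib.parse import urlparse

# Schemes named in the error message as valid for proxied artifact access.
ALLOWED_TRACKING_SCHEMES = {"http", "https"}


def check_tracking_uri_for_proxy(artifact_uri: str, tracking_uri: str) -> None:
    """Raise ValueError if a proxied artifact URI is paired with a non-HTTP tracking URI.

    Hypothetical helper: it mirrors the behaviour described in this thread,
    not the real MLflow implementation.
    """
    if urlparse(artifact_uri).scheme != "mlflow-artifacts":
        return  # not a proxied artifact location, nothing to validate
    scheme = urlparse(tracking_uri).scheme
    if scheme not in ALLOWED_TRACKING_SCHEMES:
        raise ValueError(
            f"tracking URI scheme {scheme!r} cannot be used with "
            f"mlflow-artifacts; allowed: {sorted(ALLOWED_TRACKING_SCHEMES)}"
        )


# An experiment with a proxied artifact location works with an HTTP tracking URI...
check_tracking_uri_for_proxy("mlflow-artifacts:/1", "http://127.0.0.1:5000")

# ...but fails with the file-based tracking URI from the report.
try:
    check_tracking_uri_for_proxy("mlflow-artifacts:/1", "file:///efs/mlflow/mlruns")
except ValueError as exc:
    print(exc)
```

In other words, the error depends on the combination of the experiment's stored artifact location and the client's tracking URI, not on the tracking URI alone.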

dbczumar avatar Feb 13 '23 23:02 dbczumar

Thank you for your response! However, I still can't solve the issue. I'm not using mlflow server; I'm just storing the files on a local path (/efs/mlflow), as you can see in my code example.

To create the experiment, I ran:

cd /efs/mlflow
mlflow ui

Then I created a new experiment in the UI: [image]

Can you please explain what I did wrong? Do I have to set "Artifact Location"?

yossibiton avatar Feb 14 '23 07:02 yossibiton

Hi @yossibiton, thank you for clarifying. I think this is a bug in mlflow ui. Because mlflow ui shares the same code as mlflow server, it uses a default artifact location of the form mlflow-artifacts:/ for newly created experiments. Instead, it should use the local filesystem.

For now, you can circumvent the problem by specifying artifact_location and passing in a local filesystem path. I've reopened this issue, and we'll get it addressed in the next release.
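
A minimal sketch of that workaround, assuming the experiment is created from Python rather than the UI. The helper names to_file_uri and create_local_experiment, and the idea of keeping artifacts in a directory separate from the tracking directory, are illustrative choices and not something stated in this thread. mlflow.create_experiment does accept an artifact_location argument; MLflow is imported lazily so the URI helper can be used on its own.

```python
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Turn a local path into an absolute file:// URI."""
    return Path(path).resolve().as_uri()


def create_local_experiment(name: str, tracking_dir: str, artifact_dir: str) -> str:
    """Create an experiment whose artifacts live on the local filesystem.

    Hypothetical helper. Returns the new experiment ID. MLflow rejects
    duplicate experiment names, so `name` must not already be in use.
    """
    import mlflow  # lazy import: only needed when actually creating the experiment

    mlflow.set_tracking_uri(to_file_uri(tracking_dir))
    return mlflow.create_experiment(name, artifact_location=to_file_uri(artifact_dir))


# Example of the URI that would be passed as artifact_location.
print(to_file_uri("/efs/mlflow/artifacts/object_detection"))
```

A call such as create_local_experiment("object_detection_v2", "/efs/mlflow/mlruns", "/efs/mlflow/artifacts/object_detection_v2") would then give log_artifact a local destination; the experiment name and paths here are placeholders.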

dbczumar avatar Feb 14 '23 08:02 dbczumar

@yossibiton Any updates here? If you're working on a PR, please link it to this issue.

mlflow-automation avatar Mar 01 '23 00:03 mlflow-automation

This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 35 days.

mlflow-automation avatar Mar 15 '23 01:03 mlflow-automation

I have a similar issue: I set the tracking URI directly from the tracking_uri parameter of MLFlowLogger to "http://localhost:5050/". I also set the log_model parameter to True.

However, when the training is complete, I receive the following error: mlflow.exceptions.MlflowException: The configured tracking uri scheme: 'file' is invalid for use with the proxy mlflow-artifact scheme. The allowed tracking schemes are: {'http', 'https'}

I checked the tracking_uri value of MLFlow, but it seems that it switched back to the default value. Does this mean that we also need to set the tracking URI manually with native MLFlow, or can we still rely on the logger?

lcaquot94 avatar Mar 15 '23 14:03 lcaquot94

I have a similar problem, with the same stack trace from .../artifact/... . I ran:

$ mlflow gc --backend-store-uri=sqlite:///mlflow.db

and I get the following error:

# mlflow gc --older-than=1h  --backend-store-uri=sqlite:///mlflow.db 
2023/03/20 04:19:31 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2023/03/20 04:19:31 INFO mlflow.store.db.utils: Updating database tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Run with ID b27d67938f88468a8430407955b9ebaa has been permanently deleted.
Run with ID 02bd3de0e9a348a1ae7938a7851891ac has been permanently deleted.
Run with ID cc5565e682f14fc89c52291891ccf95b has been permanently deleted.
Run with ID 732d05c7b13b4c08a9bc8715bcd75e58 has been permanently deleted.
Traceback (most recent call last):
  File "/usr/local/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/mlflow/cli.py", line 573, in gc
    artifact_repo = get_artifact_repository(run.info.artifact_uri)
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 106, in get_artifact_repository
    return _artifact_repository_registry.get_artifact_repository(artifact_uri)
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 72, in get_artifact_repository
    return repository(artifact_uri)
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/artifact/mlflow_artifacts_repo.py", line 45, in __init__
    super().__init__(self.resolve_uri(artifact_uri, get_tracking_uri()))
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/artifact/mlflow_artifacts_repo.py", line 59, in resolve_uri
    _validate_uri_scheme(track_parse.scheme)
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/artifact/mlflow_artifacts_repo.py", line 35, in _validate_uri_scheme
    raise MlflowException(
mlflow.exceptions.MlflowException: The configured tracking uri scheme: 'file' is invalid for use with the proxy mlflow-artifact scheme. The allowed tracking schemes are: {'http', 'https'}

hkang-vuno avatar Mar 20 '23 04:03 hkang-vuno

@dbczumar I have raised a PR that would potentially resolve this issue. I was wondering if you could take a look and provide some feedback. Thanks

Joel-hanson avatar Mar 24 '23 12:03 Joel-hanson

Hi everyone, I recently hit the same issue while starting mlflow ui from my local Docker container.

I was following one of the official MLflow tutorials. The code is as follows:

import mlflow
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)
predictions = rf.predict(X_test)

This produces the following output:

(base) root@85d11d68ad08:/workspace/cv_dev/MLOps# python test1.py 
2023/05/06 18:04:31 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2023/05/06 18:04:31 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'e116db062f054f0c9584041c92426f81', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
2023/05/06 18:04:31 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during sklearn autologging: The configured tracking uri scheme: 'file' is invalid for use with the proxy mlflow-artifact scheme. The allowed tracking schemes are: {'https', 'http'}

Importantly, when I repeat this scenario with mlflow ui running from a standard terminal (not a Docker container), everything works fine.

beybars1 avatar May 06 '23 18:05 beybars1

  File "/usr/local/lib/python3.11/site-packages/mlflow/store/artifact/mlflow_artifacts_repo.py", line 35, in _validate_uri_scheme
    raise MlflowException(
mlflow.exceptions.MlflowException: The configured tracking uri scheme: 'file' is invalid for use with the proxy mlflow-artifact scheme. The allowed tracking schemes are: {'http', 'https'}
exited with code 1

Same here. I also don't get any error when running locally, only when starting mlflow ui or mlflow server with the 0.0.0.0 host in Docker.

alejandroniculescu avatar Aug 07 '23 00:08 alejandroniculescu

Any update? I'm trying to serve an MLflow model using MLServer with mlflow models serve -m runs:/df738f66448047aca9a6a2e8c6982ed9/model --enable-mlserver, and I'm getting the exact same error. My server is running with this config:

mlflow server \
   --backend-store-uri  mysql+pymysql://root@localhost/mlflow_tracking_database \
   --default-artifact-root  file:/./mlruns \
   -h 0.0.0.0 -p 5000

dean-sh avatar Aug 17 '23 10:08 dean-sh

This bug is causing me a lot of pain. What's the recommended workaround?

ej159 avatar Aug 22 '23 17:08 ej159

@dean-sh has the right workaround. Essentially, make sure the tracking database points to something other than its default location, e.g. --backend-store-uri mysql+pymysql://root@localhost/mlflow_tracking_database. Otherwise there seems to be a conflict over that route being used.

alejandroniculescu avatar Aug 22 '23 18:08 alejandroniculescu

This worked for me. If you are using local files, either of the solutions below should work. Add this before mlflow.start_run() in your code.

import mlflow

# Set the tracking URI to a local directory
mlflow.set_tracking_uri("file:/path/to/your/local/directory")

# OR
mlflow.set_tracking_uri("http://127.0.0.1:5000")

shayan-nikoo avatar Aug 31 '23 10:08 shayan-nikoo

hey 👋🏻

I was following this article, https://mlflow.org/docs/latest/quickstart_mlops.html, and hit the same issue. Running export MLFLOW_TRACKING_URI=http://127.0.0.1:5002 resolved the problem.

snowron avatar Sep 13 '23 12:09 snowron

This issue can be solved by exporting the environment variable MLFLOW_TRACKING_URI! :+1:

export MLFLOW_TRACKING_URI=http://127.0.0.1:<PORT NUMBER>
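
When the shell environment is not under your control (a notebook, for instance), the same variable can be set from Python before any MLflow call is made. This is a sketch under assumptions: the helper name use_tracking_server is made up for illustration, and the default port 5000 is only a guess at what your server listens on, so pass the real port.

```python
import os


def use_tracking_server(host: str = "127.0.0.1", port: int = 5000) -> str:
    """Point MLflow clients in this process at an HTTP tracking server.

    Hypothetical helper: it only sets MLFLOW_TRACKING_URI, which MLflow is
    expected to consult when no tracking URI has been set explicitly.
    """
    uri = f"http://{host}:{port}"
    os.environ["MLFLOW_TRACKING_URI"] = uri
    return uri


# Call this before starting a run or logging artifacts.
print(use_tracking_server(port=5002))
```

Calling mlflow.set_tracking_uri with the same URL, as shown earlier in the thread, is the explicit alternative.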

Saugatkafley avatar Dec 14 '23 15:12 Saugatkafley

I've been using MLflow for 30 minutes and this was quite confusing. I launched mlflow ui locally, pasted the lines suggested by the +New run button (mostly the same as https://github.com/mlflow/mlflow/issues/7819#issuecomment-1537193986), and then observed the warning.

Exporting the MLFLOW_TRACKING_URI environment variable as explained above seems to address the issue, and now I see a mlartifacts directory pop up in the working directory of mlflow ui.

astrojuanlu avatar Apr 30 '24 10:04 astrojuanlu

@yossibiton I had a similar problem. I solved it by creating the experiment without the optional "Artifact Location" and then updating the experiment's meta.yaml file from:

artifact_location: mlflow-artifacts:/948861870279215421
creation_time: 1715846749592
experiment_id: "948861870279215421"
last_update_time: 1715846749592
lifecycle_stage: active
name: error_replication

to:

artifact_location: <path_to_your_local_mlruns_folder>/948861870279215421
creation_time: 1715846749592
experiment_id: "948861870279215421"
last_update_time: 1715846749592
lifecycle_stage: active
name: error_replication

You can get <path_to_your_local_mlruns_folder> from mlflow.get_tracking_uri(), executed in a Python/PySpark console running in the same location as mlflow ui or mlflow server.
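
The meta.yaml edit described above can also be scripted. The sketch below uses only the standard library and assumes the file has the flat "key: value" layout shown in this comment; the function name rewrite_artifact_location and the ".bak" backup are illustrative additions. It changes only the experiment's own metadata, so runs created earlier may still carry the old location.

```python
from pathlib import Path


def rewrite_artifact_location(meta_path: str, new_location: str) -> bool:
    """Replace the artifact_location entry in an experiment's meta.yaml.

    Hypothetical helper. Edits the file line by line so every other key is
    preserved as written, and saves the original next to it as meta.yaml.bak.
    Returns True if a line was changed.
    """
    path = Path(meta_path)
    original = path.read_text()
    lines = original.splitlines()
    changed = False
    for i, line in enumerate(lines):
        if line.startswith("artifact_location:"):
            lines[i] = f"artifact_location: {new_location}"
            changed = True
    if changed:
        # Keep a backup before overwriting the metadata file.
        path.with_name(path.name + ".bak").write_text(original)
        path.write_text("\n".join(lines) + "\n")
    return changed
```

A call might look like rewrite_artifact_location("mlruns/948861870279215421/meta.yaml", "<path_to_your_local_mlruns_folder>/948861870279215421"), with the placeholder replaced by your own path. Stop mlflow ui or mlflow server before editing, and restart it afterwards.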

grzegorz-karas avatar May 16 '24 08:05 grzegorz-karas

I found that the problem was that I had created the experiment in the MLflow UI after running mlflow ui. It was solved by creating the experiment in the program with mlflow.create_experiment.

notoookay avatar May 17 '24 02:05 notoookay

After encountering mlflow.exceptions.MlflowException errors while serving MLflow models, I found a solution that worked for me. By adding --no-conda to the mlflow models serve command, I bypassed Conda environment setup issues and successfully served my model. This might be useful if your setup doesn't require Conda or if you're facing compatibility challenges. For example mlflow models serve --model-uri models:/YourModeName/1 --host 0.0.0.0 --port 5001 --no-conda

Prasadchaskar avatar Jun 26 '24 17:06 Prasadchaskar