[BUG] Artifacts not showing up in UI using minio bucket
Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the issues policy
Willingness to contribute
No. I cannot contribute a bug fix at this time.
MLflow version
- Client: 1.x.y
- Tracking server: 2.7.1
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Python version:
- yarn version, if running the dev UI:
Describe the problem
Hi Team,
I am using a MinIO bucket to store artifacts, launched with the mlflow server command below. With this command, the MLflow UI shows the error below. Can someone please help?
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri mysql+pymysql://${MYSQL_USERNAME}:${MYSQL_PASSWORD}@mlflow-mysql.mlflow-namespace.svc.cluster.local:3306/auto*** --gunicorn-opts '--log-level debug' --workers 2 --default-artifact-root mlflow-artifacts:/ --artifacts-destination s3://auto-artifacts/ --serve-artifacts
error: "Unable to list artifacts stored under {artifactUri} for the current run. Please contact your tracking server administrator to notify them of this error, which can happen when the tracking server lacks permission to list artifacts under the current run's root artifact directory."
Tracking information
REPLACE_ME
Code to reproduce issue
REPLACE_ME
Stack trace
REPLACE_ME
Other info / logs
REPLACE_ME
What component(s) does this bug affect?
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/gateway: AI Gateway service, Gateway client APIs, third-party Gateway integrations
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
What language(s) does this bug affect?
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations
@kavita1205 are there any tracking server logs?
@harupy Yes, I ran the code below, and here is the log it produced:
code:
import mlflow
from time import time
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
import numpy as np
from sklearn.metrics import accuracy_score
import joblib


def run_model():
    db = load_diabetes()
    X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

    ### TRAIN MODEL
    trained_model = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
    trained_model.fit(X_train, y_train)

    ### MLFLOW - LOG METRIC
    accuracy = trained_model.score(X_test, y_test)
    print("accuracy:", accuracy)
    mlflow.log_metric("mean-accuracy", float(accuracy))

    ### MLFLOW - LOG MODEL
    mlflow.sklearn.log_model(
        trained_model, "random_forest"
    )  ### <- The second param is an arbitrary param


TMSTP = round(time() * 1000)

#### MLFLOW CONNECTION TEST
TRACK_URI = "https://tracking-server-autosense.corp.****.com/auto***/"
EXPERIMENT_NAME = f"test_run_{TMSTP}"  ### <- DO NOT CHANGE THIS PART OF THE CODE.

# The below script creates a new experiment using the above variable and uses the
# returned experiment id to submit the training job.
if not mlflow.is_tracking_uri_set():
    # set tracking uri
    mlflow.set_tracking_uri(TRACK_URI)
print("mlflow tracking uri:", mlflow.get_tracking_uri())

# Create the experiment if it does not exist and capture the experiment id
EXPERIMENT_ID = mlflow.create_experiment(EXPERIMENT_NAME)
# set as active experiment
experiment = mlflow.set_experiment(EXPERIMENT_NAME)
print(
    f"Mlflow Active Experiment:{EXPERIMENT_NAME}\nMlflow Experiment ID:{EXPERIMENT_ID}"
)

### MLFLOW
# Set a batch of tags
tags = {
    "engineering": "MLFlow Test",
    "tstmp": str(TMSTP),
}
with mlflow.start_run(
    experiment_id=EXPERIMENT_ID,
    run_name=EXPERIMENT_NAME,
    tags=tags,
    description=EXPERIMENT_NAME,
):
    run_model()
logs
[2023-10-20 08:14:09 +0000] [25] [DEBUG] GET /ajax-api/2.0/mlflow/artifacts/list
2023/10/20 08:14:11 ERROR mlflow.server: Exception on /ajax-api/2.0/mlflow/artifacts/list [GET]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2190, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1486, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 476, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 517, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 953, in _list_artifacts
artifact_entities = _get_artifact_repo(run).list_artifacts(path)
File "/usr/local/lib/python3.10/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 187, in list_artifacts
for result in results:
File "/usr/local/lib/python3.10/site-packages/botocore/paginate.py", line 269, in __iter__
response = self._make_request(current_kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/paginate.py", line 357, in _make_request
return self._method(**current_kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 535, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 963, in _make_api_call
http, parsed_response = self._make_request(
File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 986, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 119, in make_request
return self._send_request(request_dict, operation_model)
File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 198, in _send_request
request = self.create_request(request_dict, operation_model)
File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 134, in create_request
self._event_emitter.emit(
File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 412, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 256, in emit
return self._emit(event_name, kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 239, in _emit
response = handler(**kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/signers.py", line 105, in handler
return self.sign(operation_name, request)
File "/usr/local/lib/python3.10/site-packages/botocore/signers.py", line 189, in sign
auth.add_auth(request)
File "/usr/local/lib/python3.10/site-packages/botocore/auth.py", line 418, in add_auth
raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Looks like you're missing credentials. Can you try uploading/downloading artifacts using only boto3 without mlflow? Does it work or not?
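A minimal sketch of that boto3-only check (all endpoint, key, and bucket values below are placeholders, not taken from this issue; substitute your MinIO values):

```python
# Sketch: exercise the same S3 calls the tracking server makes, using boto3
# directly instead of mlflow. If this fails too, the problem is the MinIO
# connection or credentials, not mlflow itself.
import io


def check_minio(endpoint_url, access_key, secret_key, bucket):
    """Upload a test object to a MinIO bucket, then list it back."""
    import boto3  # local import so the sketch is readable without boto3 installed

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    s3.upload_fileobj(io.BytesIO(b"connectivity test"), bucket, "connectivity/test.txt")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix="connectivity/")
    return [obj["Key"] for obj in resp.get("Contents", [])]


if __name__ == "__main__":
    # e.g. check_minio("http://minio:9000", "my-pub-key", "my-priv-key", "my-bucket")
    pass
```

If the upload and list both succeed here but fail through the tracking server, the credentials are likely not visible to the server process itself.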
I am already passing credentials, via the command below:
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri mysql+pymysql://auto***:au123@mlflow-autos-mysql.mlflow-namespace.svc.cluster.local:3306/autose*** --gunicorn-opts '--log-level debug' --workers 2 --default-artifact-root s3://auto***-artifacts/ --serve-artifacts
Where are credentials for minio?
We are deploying MLflow via a Helm chart into a Kubernetes cluster. In that chart, we pass the MinIO credentials via secrets.
kubectl get secrets mlflow-env-secret -o yaml

apiVersion: v1
data:
  ARTIFACTORY_API_KEY: SWtGTFEzQTRjRkZpYTI1cGFtSjBhMlZvUkhOaldrMWhZblUyUjFSVmRXczVibTFGZDFwcFpYSnBNWEZHZUhKNU1reFpVSEpxUmpoV1lWbHhVVFZCYVhwVmJreFJhWFkwVWt3aQ==
  MINIO_ACCESS_KEY_ID: OGQwbHkwTHE3U0JJZkJVeA==
  MINIO_ROOT_PASSWORD: YXV0b21vdGl2ZS1hcnRpZmFjdHM=
  MINIO_ROOT_USER: YXV0b21vdGl2ZS1hcnRpZmFjdHMtdXNlcg==
  MINIO_SECRET_ACCESS_KEY: b0ZPdFFiZkRwTjFod2ZtMDFIcUsyemo4REhueW5rQUk=
  MYSQL_PASSWORD: YXV0b3NlbnNlXzEyMw==
  MYSQL_USERNAME: YXV0b3NlbnNlMQ==
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: mlflow-auto****
    meta.helm.sh/release-namespace: mlflow-namespace
  creationTimestamp: "2023-10-20T08:09:58Z"
  labels:
    app: mlflow
    app.kubernetes.io/managed-by: Helm
    chart: mlflow-0.7.20
    heritage: Helm
    release: mlflow-aut****
  name: mlflow-auto****-env-secret
  namespace: mlflow-namespace
  resourceVersion: "317686863"
  uid: 9371d9cd-6cc6-4440-a82b-748831b1a1e6
type: Opaque
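One thing worth double-checking here: boto3's default credential chain reads `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` (and MLflow reads `MLFLOW_S3_ENDPOINT_URL` for the endpoint); names like `MINIO_ACCESS_KEY_ID` are ignored. If the deployment injects the secret only under the `MINIO_*` names, the server process has no credentials as far as boto3 is concerned, which would match the `NoCredentialsError` above. A sketch of one way to bridge the names at process start (the function name is illustrative, not part of MLflow):

```python
import os


def map_minio_env():
    """Mirror MINIO_* credential variables into the AWS_* names boto3 reads.

    boto3 resolves environment credentials from AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY; MINIO_* names are not part of its default chain.
    """
    mapping = {
        "MINIO_ACCESS_KEY_ID": "AWS_ACCESS_KEY_ID",
        "MINIO_SECRET_ACCESS_KEY": "AWS_SECRET_ACCESS_KEY",
    }
    for src, dst in mapping.items():
        # Only fill in the AWS_* name if it is not already set.
        if src in os.environ and dst not in os.environ:
            os.environ[dst] = os.environ[src]
```

In a Kubernetes deployment the cleaner equivalent is to expose the same secret values directly under the `AWS_*` names (plus `MLFLOW_S3_ENDPOINT_URL` pointing at the MinIO service) in the container's `env` section.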
Got it.
Can you try uploading/downloading artifacts using only boto3 without mlflow? Does it work or not?
Can you check this 👆?
let me check
@harupy Yes, I tested via boto3 and it is able to upload artifacts to MinIO, but when I try with mlflow it throws the same error mentioned above.
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
@harupy Can you please help me here? As you suggested, I tested via boto3 and it is able to upload artifacts to MinIO, but when I try with mlflow it throws the same error mentioned above.
I am passing the command below in the deployment.yaml file: mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri mysql+pymysql://auto***:auto***@mlflow-auto***-mysql.mlflow-namespace.svc.cluster.local:3306/auto**** --gunicorn-opts '--log-level debug' --workers 2 --default-artifact-root https://mlm***.corp..com:9000/auto-artifacts/ --serve-artifacts
@BenWilson2 can you guys help me here.
I've also been struggling with this same issue for about 6 months now, through multiple versions of MLFlow. To add context, here is the state of the issue for me using the latest MLFlow (v2.9.2):
Platform Configuration
OS: Ubuntu 22.04.2 LTS
Python Version: 3.11.5
MLFlow Version: 2.9.2
Environment: Python Virtual Environment or in the 3.11.5 Docker Image (I've tried running the mlflow server in both and the results are the same)
Installation Method: pip install mlflow awscli boto3[crt]
MinIO Setup
I have my MinIO server running in a Docker container on the same machine; it is started using the following command:
docker run -it --rm -d \
-u 1000:1000 \
-p 9000:9000 \
-p 9090:9090 \
--name "minio-service" \
-v /path/to/my/data:/data \
-e "MINIO_ROOT_USER=myuser" \
-e "MINIO_ROOT_PASSWORD=mypassword" \
minio/minio:RELEASE.2023-01-12T02-06-16Z \
server /data --console-address ":9090"
For this MinIO server I've also generated a public/private access key pair. For the sake of this post, we will call them the following:
- Public Key: my-pub-key
- Private Key: my-priv-key
The bucket in MinIO that holds all of my artifacts is called mlflow-artifacts. Its permissions are set wide open for read/write access.
Environment Variable Setup
Before starting up the MLFlow tracking server, I set the following environment variables as described here:
export MLFLOW_S3_ENDPOINT_URL=http://<my-machine-ipv4>:9000
export AWS_ACCESS_KEY_ID=myuser
export AWS_SECRET_ACCESS_KEY=mypassword
Note that the docs imply you should use the username and password of the MinIO default user; however, I have also attempted to use the generated public/private key pair in place of the last two variables:
export MLFLOW_S3_ENDPOINT_URL=http://<my-machine-ipv4>:9000
export AWS_ACCESS_KEY_ID=my-pub-key
export AWS_SECRET_ACCESS_KEY=my-priv-key
However, the errors below still persist in both cases.
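As a sanity check on this step, it can help to confirm that boto3's credential chain actually resolves those variables in the same environment that later launches `mlflow server`. A sketch (requires boto3; the function name is illustrative):

```python
def resolve_credentials():
    """Report which credentials boto3's default chain resolves, if any."""
    import boto3

    creds = boto3.Session().get_credentials()
    if creds is None:
        print("no credentials resolved -- the server would raise NoCredentialsError")
        return None
    frozen = creds.get_frozen_credentials()
    # Print only a prefix so the key is not leaked into logs.
    print("resolved access key:", frozen.access_key[:4] + "...")
    return frozen
```

Calling `resolve_credentials()` in the shell that starts the tracking server should print the expected key prefix if the exports above are actually visible to the server process.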
MLFlow Startup Config
I've tried many combinations of mlflow server args to try and get this to work, but here is generally how MLFlow is started:
mlflow server \
--dev \
--host="<my-machine-ipv4>" \
--port=5000 \
--backend-store-uri="/path/to/file/based/storage" \
--artifacts-destination="s3://mlflow-artifacts" \
--serve-artifacts
With this everything appears to work just fine:
- Browsing experiment metrics
- Pushing new runs and experiments via the mlflow python client
- Manipulating metric graphs in the UI
- Even pushing new artifacts from the python client!!
- I even see all of the artifacts in the MinIO bucket browser organized appropriately by the experiment and run IDs!
However, when clicking on a specific run, I can view all tags, metrics, and parameters, but not the artifacts. I see a similar error code each time:
I've attempted various other mlflow server argument configurations:
- Such as not proxying artifacts via --no-serve-artifacts instead
- Or also setting --default-artifact-root=s3://mlflow-artifacts as well
But nothing appears to be working. Each time the error in the UI pops up and the MLFlow Server logs show the following:
[2024-01-15 19:46:54 +0000] [25] [DEBUG] GET /ajax-api/2.0/mlflow/artifacts/list
[2024-01-15 19:47:24 +0000] [22] [CRITICAL] WORKER TIMEOUT (pid:25)
[2024-01-15 19:47:24 +0000] [25] [INFO] Worker exiting (pid: 25)
[2024-01-15 19:47:25 +0000] [22] [ERROR] Worker (pid:25) exited with code 1
[2024-01-15 19:47:25 +0000] [22] [ERROR] Worker (pid:25) exited with code 1.
[2024-01-15 19:47:25 +0000] [91] [INFO] Booting worker with pid: 91
It appears to break on the GET call to the /ajax-api/2.0/mlflow/artifacts/list endpoint.
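A worker timeout here (rather than a fast `NoCredentialsError`) suggests the underlying `ListObjectsV2` request is hanging, e.g. the gunicorn worker cannot reach the address in `MLFLOW_S3_ENDPOINT_URL` from where it runs. Running the same call outside gunicorn with short botocore timeouts can make that failure mode visible; endpoint and bucket values below are placeholders:

```python
def list_with_timeout(endpoint_url, bucket, prefix=""):
    """List objects with aggressive timeouts so a network hang fails fast."""
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        # Fail within seconds instead of hanging past gunicorn's worker timeout.
        config=Config(connect_timeout=5, read_timeout=5, retries={"max_attempts": 1}),
    )
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

If this call times out when run from the host (or container) serving MLflow, the UI error is a connectivity problem between the tracking server and MinIO rather than a permissions one.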
I'd love for this issue to be fixed; it would really unlock a lot of potential for users who want or need to keep all data on local servers.
Did you solve it @AndrewSpittlemeister ?