[BUG] Artifacts not showing up in UI using minio bucket
Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the issues policy
Willingness to contribute
No. I cannot contribute a bug fix at this time.
MLflow version
- Client: 1.x.y
- Tracking server: 2.7.1
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Python version:
- yarn version, if running the dev UI:
Describe the problem
Hi Team,
I am using a MinIO bucket to store artifacts, launched with the mlflow server command below. With this command, the MLflow UI shows the error below. Can someone please help?
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri mysql+pymysql://${MYSQL_USERNAME}:${MYSQL_PASSWORD}@mlflow-mysql.mlflow-namespace.svc.cluster.local:3306/auto*** --gunicorn-opts '--log-level debug' --workers 2 --default-artifact-root mlflow-artifacts:/ --artifacts-destination s3://auto-artifacts/ --serve-artifacts
error: "Unable to list artifacts stored under {artifactUri} for the current run. Please contact your tracking server administrator to notify them of this error, which can happen when the tracking server lacks permission to list artifacts under the current run's root artifact directory."
Tracking information
REPLACE_ME
Code to reproduce issue
REPLACE_ME
Stack trace
REPLACE_ME
Other info / logs
REPLACE_ME
What component(s) does this bug affect?
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/gateway: AI Gateway service, Gateway client APIs, third-party Gateway integrations
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
What language(s) does this bug affect?
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations
@kavita1205 are there any tracking server logs?
@harupy Yes, I ran the code below, and here is the log it produced:
code:
import mlflow
from time import time
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
import numpy as np
from sklearn.metrics import accuracy_score
import joblib


def run_model():
    db = load_diabetes()
    X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

    ### TRAIN MODEL
    trained_model = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
    trained_model.fit(X_train, y_train)

    ### MLFLOW - LOG METRIC
    accuracy = trained_model.score(X_test, y_test)
    print("accuracy:", accuracy)
    mlflow.log_metric("mean-accuracy", float(accuracy))

    ### MLFLOW - LOG MODEL
    mlflow.sklearn.log_model(
        trained_model, "random_forest"
    )  ### <- The second param is an arbitrary param


TMSTP = round(time() * 1000)

#### MLFLOW CONNECTION TEST
TRACK_URI = "https://tracking-server-autosense.corp.****.com/auto***/"
EXPERIMENT_NAME = f"test_run_{TMSTP}"  ### <- DO NOT CHANGE THIS PART OF THE CODE.

# The below script creates a new experiment using the above variable and uses the
# returned experiment id to submit the training job.
if not mlflow.is_tracking_uri_set():
    # set tracking uri
    mlflow.set_tracking_uri(TRACK_URI)
print("mlflow tracking uri:", mlflow.get_tracking_uri())

# Create the experiment if it does not exist and capture the experiment id
EXPERIMENT_ID = mlflow.create_experiment(EXPERIMENT_NAME)
# set as active experiment
experiment = mlflow.set_experiment(EXPERIMENT_NAME)
print(
    f"Mlflow Active Experiment:{EXPERIMENT_NAME}\nMlflow Experiment ID:{EXPERIMENT_ID}"
)

### MLFLOW
# Set a batch of tags
tags = {
    "engineering": "MLFlow Test",
    "tstmp": str(TMSTP),
}
with mlflow.start_run(
    experiment_id=EXPERIMENT_ID,
    run_name=EXPERIMENT_NAME,
    tags=tags,
    description=EXPERIMENT_NAME,
):
    run_model()
logs
[2023-10-20 08:14:09 +0000] [25] [DEBUG] GET /ajax-api/2.0/mlflow/artifacts/list
2023/10/20 08:14:11 ERROR mlflow.server: Exception on /ajax-api/2.0/mlflow/artifacts/list [GET]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2190, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1486, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 476, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 517, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 953, in _list_artifacts
artifact_entities = _get_artifact_repo(run).list_artifacts(path)
File "/usr/local/lib/python3.10/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 187, in list_artifacts
for result in results:
File "/usr/local/lib/python3.10/site-packages/botocore/paginate.py", line 269, in __iter__
response = self._make_request(current_kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/paginate.py", line 357, in _make_request
return self._method(**current_kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 535, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 963, in _make_api_call
http, parsed_response = self._make_request(
File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 986, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 119, in make_request
return self._send_request(request_dict, operation_model)
File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 198, in _send_request
request = self.create_request(request_dict, operation_model)
File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 134, in create_request
self._event_emitter.emit(
File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 412, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 256, in emit
return self._emit(event_name, kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 239, in _emit
response = handler(**kwargs)
File "/usr/local/lib/python3.10/site-packages/botocore/signers.py", line 105, in handler
return self.sign(operation_name, request)
File "/usr/local/lib/python3.10/site-packages/botocore/signers.py", line 189, in sign
auth.add_auth(request)
File "/usr/local/lib/python3.10/site-packages/botocore/auth.py", line 418, in add_auth
raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Looks like you're missing credentials. Can you try uploading/downloading artifacts using only boto3 without mlflow? Does it work or not?
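A minimal sketch of that boto3-only check (all endpoint, key, and bucket values below are placeholders, not taken from this issue; substitute your MinIO values):

```python
# Sketch: exercise the same S3 calls the tracking server makes, using boto3
# directly instead of mlflow. If this fails too, the problem is the MinIO
# connection or credentials, not mlflow itself.
import io


def check_minio(endpoint_url, access_key, secret_key, bucket):
    """Upload a test object to a MinIO bucket, then list it back."""
    import boto3  # local import so the sketch is readable without boto3 installed

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    s3.upload_fileobj(io.BytesIO(b"connectivity test"), bucket, "connectivity/test.txt")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix="connectivity/")
    return [obj["Key"] for obj in resp.get("Contents", [])]


if __name__ == "__main__":
    # e.g. check_minio("http://minio:9000", "my-pub-key", "my-priv-key", "my-bucket")
    pass
```

If the upload and list both succeed here but fail through the tracking server, the credentials are likely not visible to the server process itself.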
I am already passing credentials, via the command below:
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri mysql+pymysql://auto***:au123@mlflow-autos-mysql.mlflow-namespace.svc.cluster.local:3306/autose*** --gunicorn-opts '--log-level debug' --workers 2 --default-artifact-root s3://auto***-artifacts/ --serve-artifacts
Where are credentials for minio?
We are deploying MLflow via a Helm chart into a Kubernetes cluster. In that chart, we pass the MinIO credentials via secrets.
kubectl get secrets mlflow-env-secret -o yaml

apiVersion: v1
data:
  ARTIFACTORY_API_KEY: SWtGTFEzQTRjRkZpYTI1cGFtSjBhMlZvUkhOaldrMWhZblUyUjFSVmRXczVibTFGZDFwcFpYSnBNWEZHZUhKNU1reFpVSEpxUmpoV1lWbHhVVFZCYVhwVmJreFJhWFkwVWt3aQ==
  MINIO_ACCESS_KEY_ID: OGQwbHkwTHE3U0JJZkJVeA==
  MINIO_ROOT_PASSWORD: YXV0b21vdGl2ZS1hcnRpZmFjdHM=
  MINIO_ROOT_USER: YXV0b21vdGl2ZS1hcnRpZmFjdHMtdXNlcg==
  MINIO_SECRET_ACCESS_KEY: b0ZPdFFiZkRwTjFod2ZtMDFIcUsyemo4REhueW5rQUk=
  MYSQL_PASSWORD: YXV0b3NlbnNlXzEyMw==
  MYSQL_USERNAME: YXV0b3NlbnNlMQ==
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: mlflow-auto****
    meta.helm.sh/release-namespace: mlflow-namespace
  creationTimestamp: "2023-10-20T08:09:58Z"
  labels:
    app: mlflow
    app.kubernetes.io/managed-by: Helm
    chart: mlflow-0.7.20
    heritage: Helm
    release: mlflow-aut****
  name: mlflow-auto****-env-secret
  namespace: mlflow-namespace
  resourceVersion: "317686863"
  uid: 9371d9cd-6cc6-4440-a82b-748831b1a1e6
type: Opaque
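One thing worth double-checking here: boto3's default credential chain reads `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` (and MLflow reads `MLFLOW_S3_ENDPOINT_URL` for the endpoint); names like `MINIO_ACCESS_KEY_ID` are ignored. If the deployment injects the secret only under the `MINIO_*` names, the server process has no credentials as far as boto3 is concerned, which would match the `NoCredentialsError` above. A sketch of one way to bridge the names at process start (the function name is illustrative, not part of MLflow):

```python
import os


def map_minio_env():
    """Mirror MINIO_* credential variables into the AWS_* names boto3 reads.

    boto3 resolves environment credentials from AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY; MINIO_* names are not part of its default chain.
    """
    mapping = {
        "MINIO_ACCESS_KEY_ID": "AWS_ACCESS_KEY_ID",
        "MINIO_SECRET_ACCESS_KEY": "AWS_SECRET_ACCESS_KEY",
    }
    for src, dst in mapping.items():
        # Only fill in the AWS_* name if it is not already set.
        if src in os.environ and dst not in os.environ:
            os.environ[dst] = os.environ[src]
```

In a Kubernetes deployment the cleaner equivalent is to expose the same secret values directly under the `AWS_*` names (plus `MLFLOW_S3_ENDPOINT_URL` pointing at the MinIO service) in the container's `env` section.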
Got it.
Can you try uploading/downloading artifacts using only boto3 without mlflow? Does it work or not?
Can you check this 👆?
let me check
@harupy Yes, I tested via boto3 and it is able to upload artifacts to MinIO, but when I try with mlflow it throws the same error mentioned above.
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
@harupy Can you please help me here? As you suggested, I tested via boto3 and it is able to upload artifacts to MinIO, but when I try with mlflow it throws the same error mentioned above.
I am passing the command below in the deployment.yaml file: mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri mysql+pymysql://auto***:auto***@mlflow-auto***-mysql.mlflow-namespace.svc.cluster.local:3306/auto**** --gunicorn-opts '--log-level debug' --workers 2 --default-artifact-root https://mlm***.corp..com:9000/auto-artifacts/ --serve-artifacts
@BenWilson2 can you guys help me here.
I've also been struggling with this same issue for about 6 months now, through multiple versions of MLFlow. To add context, here is the state of the issue for me using the latest MLFlow (v2.9.2):
Platform Configuration
OS: Ubuntu 22.04.2 LTS
Python Version: 3.11.5
MLFlow Version: 2.9.2
Environment: Python Virtual Environment or in the 3.11.5 Docker Image (I've tried running the mlflow server in both and the results are the same)
Installation Method: pip install mlflow awscli boto3[crt]
MinIO Setup
I have my MinIO server running in a Docker container on the same machine; it is started using the following command:
docker run -it --rm -d \
-u 1000:1000 \
-p 9000:9000 \
-p 9090:9090 \
--name "minio-service" \
-v /path/to/my/data:/data \
-e "MINIO_ROOT_USER=myuser" \
-e "MINIO_ROOT_PASSWORD=mypassword" \
minio/minio:RELEASE.2023-01-12T02-06-16Z \
server /data --console-address ":9090"
For this MinIO server I've also generated a public/private access key pair. For the sake of this post, we will call them the following:
- Public Key: my-pub-key
- Private Key: my-priv-key
The bucket in MinIO that holds all of my artifacts is called mlflow-artifacts. Its permissions are set wide open for read/write access.
Environment Variable Setup
Before starting up the MLFlow tracking server, I set the following environment variables as described here:
export MLFLOW_S3_ENDPOINT_URL=http://<my-machine-ipv4>:9000
export AWS_ACCESS_KEY_ID=myuser
export AWS_SECRET_ACCESS_KEY=mypassword
Note that the docs imply you should use the username and password of the MinIO default user; however, I have also attempted to use the generated public/private key pair in place of the last two variables:
export MLFLOW_S3_ENDPOINT_URL=http://<my-machine-ipv4>:9000
export AWS_ACCESS_KEY_ID=my-pub-key
export AWS_SECRET_ACCESS_KEY=my-priv-key
However, the errors below still persist in both cases.
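As a sanity check on this step, it can help to confirm that boto3's credential chain actually resolves those variables in the same environment that later launches `mlflow server`. A sketch (requires boto3; the function name is illustrative):

```python
def resolve_credentials():
    """Report which credentials boto3's default chain resolves, if any."""
    import boto3

    creds = boto3.Session().get_credentials()
    if creds is None:
        print("no credentials resolved -- the server would raise NoCredentialsError")
        return None
    frozen = creds.get_frozen_credentials()
    # Print only a prefix so the key is not leaked into logs.
    print("resolved access key:", frozen.access_key[:4] + "...")
    return frozen
```

Calling `resolve_credentials()` in the shell that starts the tracking server should print the expected key prefix if the exports above are actually visible to the server process.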
MLFlow Startup Config
I've tried many combinations of mlflow server args to try and get this to work, but here is generally how MLFlow is started:
mlflow server \
--dev \
--host="<my-machine-ipv4>" \
--port=5000 \
--backend-store-uri="/path/to/file/based/storage" \
--artifacts-destination="s3://mlflow-artifacts" \
--serve-artifacts
With this everything appears to work just fine:
- Browsing experiment metrics
- Pushing new runs and experiments via the mlflow python client
- Manipulating metric graphs in the UI
- Even pushing new artifacts from the python client!!
- I even see all of the artifacts in the MinIO bucket browser organized appropriately by the experiment and run IDs!
However, when clicking on a specific run, I can view all tags, metrics, and parameters, but not the artifacts. I see a similar error code each time:
I've attempted various other mlflow server argument configurations:
- Such as not proxying artifacts via --no-serve-artifacts instead
- Or also setting --default-artifact-root=s3://mlflow-artifacts as well
But nothing appears to be working. Each time the error in the UI pops up and the MLFlow Server logs show the following:
[2024-01-15 19:46:54 +0000] [25] [DEBUG] GET /ajax-api/2.0/mlflow/artifacts/list
[2024-01-15 19:47:24 +0000] [22] [CRITICAL] WORKER TIMEOUT (pid:25)
[2024-01-15 19:47:24 +0000] [25] [INFO] Worker exiting (pid: 25)
[2024-01-15 19:47:25 +0000] [22] [ERROR] Worker (pid:25) exited with code 1
[2024-01-15 19:47:25 +0000] [22] [ERROR] Worker (pid:25) exited with code 1.
[2024-01-15 19:47:25 +0000] [91] [INFO] Booting worker with pid: 91
It appears to break on the GET call to the /ajax-api/2.0/mlflow/artifacts/list endpoint.
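A worker timeout here (rather than a fast `NoCredentialsError`) suggests the underlying `ListObjectsV2` request is hanging, e.g. the gunicorn worker cannot reach the address in `MLFLOW_S3_ENDPOINT_URL` from where it runs. Running the same call outside gunicorn with short botocore timeouts can make that failure mode visible; endpoint and bucket values below are placeholders:

```python
def list_with_timeout(endpoint_url, bucket, prefix=""):
    """List objects with aggressive timeouts so a network hang fails fast."""
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        # Fail within seconds instead of hanging past gunicorn's worker timeout.
        config=Config(connect_timeout=5, read_timeout=5, retries={"max_attempts": 1}),
    )
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

If this call times out when run from the host (or container) serving MLflow, the UI error is a connectivity problem between the tracking server and MinIO rather than a permissions one.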
I'd love for this issue to be fixed; it would really unlock a lot of potential for users who want or need to keep all data on local servers.
Did you solve it @AndrewSpittlemeister ?