charts icon indicating copy to clipboard operation
charts copied to clipboard

[bitnami/mlflow] fix issue 26397 - allow MLFLOW_S3_ENDPOINT without minio

Open dhrp opened this issue 1 year ago • 7 comments

Description of the change

Changes the condition to check whether either Minio OR externalS3 is enabled.

  • [x] Double check if we shouldn't use the condition in _helpers.tpl

[edit] moved setting the variable to inside the already existing check if s3 is enabled. This simplifies it.

Benefits

This fixes a problem / bug in the chart where an external S3 address cannot be set unless Minio is also enabled; which is unwanted / undesirable.

Possible drawbacks

none really

Applicable issues

  • fixes #26397

Additional information

Checklist

  • [x] Chart version bumped in Chart.yaml according to semver. This is not necessary when the changes only affect README.md files.
  • [x] Variables are documented in the values.yaml and added to the README.md using readme-generator-for-helm
  • [x] Title of the pull request follows this pattern [bitnami/<name_of_the_chart>] Descriptive title
  • [x] All commits signed off and in agreement of Developer Certificate of Origin (DCO)

dhrp avatar May 27 '24 16:05 dhrp

Hi @dhrp, thank you very much for contributing to this Helm chart.

This PR is reverting the changes introduced in #25294, sent on an attempt to fix #23959. Wouldn't merging your proposed changes cause regressions for the users affected by #23959?

In that issue, we came to the conclusion that MLFLOW_S3_ENDPOINT_URL was only required when MinIO was enabled (maybe that conclusion is wrong?) and users requiring to set the envar in some other scenarios would always be able to set it via extraEnvVars like this:

$ helm template oci://registry-1.docker.io/bitnamicharts/mlflow --version 1.2.2 --set minio.enabled=false --set tracking.extraEnvVars[0].name=MLFLOW_S3_ENDPOINT_URL,tracking
.extraEnvVars[0].value='https://my-s3-endpoint:443' | grep -A1 MLFLOW_S3_ENDPOINT_URL
            - name: MLFLOW_S3_ENDPOINT_URL
              value: https://my-s3-endpoint:443

Can you share your thoughts on this? Thank you!

andresbono avatar Jun 04 '24 07:06 andresbono

@frittentheke was right; I had actually first fixed it right; and then simplified it wrong. When S3 is enabled and Minio is not, the host does not need to be set when you are using Amazon AWS S3.

I now resolved it by dropping my own last commit.

dhrp avatar Jun 12 '24 10:06 dhrp

I left a suggestion and a concern as an inline comment. Please, check it when you have a chance! Thank you.

andresbono avatar Jun 13 '24 14:06 andresbono

I left a suggestion and a concern as an inline comment. Please, check it when you have a chance! Thank you.

I replied in the thread @andresbono .

Would be nice to get the ability to use external non-AWS S3.

frittentheke avatar Jun 21 '24 18:06 frittentheke

I've resolved the comment regarding the helper and rebased on main.

dhrp avatar Jun 24 '24 07:06 dhrp

@javsalgar thanks for reopening https://github.com/bitnami/charts/issues/26397. Could you please kindly take a peek at this PR then?

frittentheke avatar Jun 24 '24 08:06 frittentheke

Thank you, @dhrp, @frittentheke. I reopened https://github.com/bitnami/charts/pull/26462#discussion_r1650708286. Sharing my concerns there.

andresbono avatar Jun 24 '24 09:06 andresbono

Thank you for the additional information. However, I still have the concern that this PR will introduce a regression and we will face #23959 again. I'm not saying we do not have to address #26397, but I'm not sure if this PR is the proper fix.

I gave this PR a try. Let me share my findings (click on the collapsible scenarios to see the details):

Scenario 1: Store artifacts in external S3-compatible storage (MinIO) ✅

To take the minio subchart out of the equation, I'm deactivating it and explicitly installing it with hardcoded credentials.

  1. Install MinIO

values-minio.yaml customizations:

auth:
  rootPassword: somepassword
defaultBuckets: test-mlflow
provisioning:
  enabled: true
  extraCommands:
    - mc anonymous set download provisioning/test-mlflow
$ helm install minio oci://registry-1.docker.io/bitnamicharts/minio --version 14.6.16 --values /tmp/values-minio.yaml
  1. Install mlflow (with the changes from this PR) values-mlflow-minio.yaml customizations:
run:
  source:
    configMap:
      # Example taken from the MLFlow UI
      test.py: |
        import mlflow
        from sklearn.model_selection import train_test_split
        from sklearn.datasets import load_diabetes
        import sklearn.ensemble

        # set the experiment id
        mlflow.set_experiment(experiment_id="0")

        mlflow.autolog()
        db = load_diabetes()

        X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

        # Create and train models.
        rf = sklearn.ensemble.RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
        rf.fit(X_train, y_train)

        # Use the model to make predictions on the test dataset.
        predictions = rf.predict(X_test)

minio:
  enabled: false

externalS3:
  host: minio.default.svc.cluster.local
  port: 9000
  protocol: http
  bucket: test-mlflow
  accessKeyID: admin
  accessKeySecret: somepassword
$ helm install mlflow-minio ./bitnami/mlflow --values /tmp/values-mlflow-minio.yaml
  1. Create experiment run
$ kubectl exec -it deployment/mlflow-minio-run -- python test.py
2024/07/01 14:19:27 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '7fe8ec68831b4eeea1680fe52753d45a', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
  1. Check artifacts in MinIO ✅
$ kubectl exec -it deployment/minio -- mc ls local/test-mlflow/0
[2024-07-01 14:25:36 UTC]     0B 7fe8ec68831b4eeea1680fe52753d45a/
  1. Check artifacts via UI ✅
$ kubectl get secret --namespace default mlflow-minio-tracking -o jsonpath="{.data.admin-password }" | base64 -d
$ kubectl port-forward svc/mlflow-minio-tracking 8080:80

http://localhost:8080/#/experiments/0/runs/7fe8ec68831b4eeea1680fe52753d45a/artifacts

image

  1. Check mlflow logs ✅
$ kubectl logs deploy/mlflow-minio-tracking
Scenario 2: Store artifacts in AWS S3 ❌
  1. Create an AWS S3 bucket
test-mlflow-ab891a3
  1. Install mlflow (with the changes from this PR)

values-mlflow-s3.yaml customizations:

run:
  source:
    configMap:
      # Example taken from the MLFlow UI
      test.py: |
        import mlflow
        from sklearn.model_selection import train_test_split
        from sklearn.datasets import load_diabetes
        import sklearn.ensemble

        # set the experiment id
        mlflow.set_experiment(experiment_id="0")

        mlflow.autolog()
        db = load_diabetes()

        X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

        # Create and train models.
        rf = sklearn.ensemble.RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
        rf.fit(X_train, y_train)

        # Use the model to make predictions on the test dataset.
        predictions = rf.predict(X_test)

minio:
  enabled: false

externalS3:
  host: test-mlflow-ab891a3.s3.amazonaws.com
  bucket: test-mlflow-ab891a3
  useCredentialsInSecret: false

tracking:
  extraEnvVars:
    - name: AWS_ACCESS_KEY_ID
      value: ASIA***
    - name: AWS_SECRET_ACCESS_KEY
      value: '***'
    - name: AWS_SESSION_TOKEN
      value: '***'
$ helm install mlflow-s3 ./bitnami/mlflow --values /tmp/values-mlflow-s3.yaml
  1. Create experiment run
$ kubectl exec -it deployment/mlflow-s3-run -- python test.py
2024/07/01 15:12:56 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'f9e2c06af11b4ff787ee400e5912d75b', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
  1. Check artifacts in S3 ⚠️
$ aws s3 ls s3://test-mlflow-ab891a3/test-mlflow-ab891a3/0/
                           PRE f9e2c06af11b4ff787ee400e5912d75b/

They exist but the objects are uploaded in a root directory named like the bucket.

  1. Check artifacts via UI ❌
$ kubectl get secret --namespace default mlflow-s3-tracking -o jsonpath="{.data.admin-password }" | base64 -d
$ kubectl port-forward svc/mlflow-s3-tracking 8080:80

http://localhost:8080/#/experiments/0/runs/f9e2c06af11b4ff787ee400e5912d75b/artifacts

image

Artifacts cannot be retrieved.

  1. Check mlflow logs ❌
$ kubectl logs deploy/mlflow-s3-tracking
2024/07/01 15:17:11 ERROR mlflow.server: Exception on /ajax-api/2.0/mlflow/artifacts/list [GET]
Traceback (most recent call last):
  File "/opt/bitnami/python/lib/python3.10/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
...
  File "/opt/bitnami/python/lib/python3.10/site-packages/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the ListObjectsV2 operation: The specified key does not exist.

I guess the error in "Scenario 2" is raised because the objects are not written in the correct path (the root directory named like the bucket I believe). The documentation warns about this: https://mlflow.org/docs/2.14.1/tracking/artifacts-stores.html#setting-bucket-region.

With bitnami/mlflow:1.4.12, "Scenario 2" is working as expected and "Scenario 1" would just need to explicitly set the MLFLOW_S3_ENDPOINT_URL env-var:

tracking:
  extraEnvVars:
    - name: MLFLOW_S3_ENDPOINT_URL
      value: 'http://minio.default.svc.cluster.local:9000'

Let me know your thoughts on this @dhrp, @frittentheke.

andresbono avatar Jul 01 '24 15:07 andresbono

I gave this PR a try. Let me share my findings (click on the collapsible scenarios to see the details): Scenario 1: Store artifacts in external S3-compatible storage (MinIO) ✅ Scenario 2: Store artifacts in AWS S3 ❌

I guess the error in "Scenario 2" is raised because the objects are not written in the correct path (the root directory named like the bucket I believe). The documentation warns about this: https://mlflow.org/docs/2.14.1/tracking/artifacts-stores.html#setting-bucket-region.

With bitnami/mlflow:1.4.12, "Scenario 2" is working as expected and "Scenario 1" would just need to explicitly set the MLFLOW_S3_ENDPOINT_URL env-var

First of all thank you very much for the time and intensive testing @andresbono! Sorry it took me a while to respond.

The issue here is simply the two addressing styles for buckets. There is the VirtualHosted- and the Path-Style.

This is the MLflow code (boto3 AWS S3 client) that is used here: https://github.com/mlflow/mlflow/blob/0ebbefc8c075165b558dfb481cf1e39d5cb53003/mlflow/store/artifact/s3_artifact_repo.py#L72C50-L72C89

The addressing_style parameter is the important bit - see https://boto3.amazonaws.com/v1/documentation/api/1.9.42/guide/s3.html#changing-the-addressing-style

MLflow actually allows to also set the MLFLOW_BOTO_CLIENT_ADDRESSING_STYLE via https://github.com/mlflow/mlflow/blob/0ebbefc8c075165b558dfb481cf1e39d5cb53003/mlflow/environment_variables.py#L605 and defaults to auto.

And auto in case of AWS means that a hostname containing a bucket name in front already addresses the individual bucket. So the first part of the path is NOT the bucket (as with the path-style), but a first "subfolder" - which is exactly what you saw.

If you change the endpoint to a path-style pattern with the bucket name at the end (and NOT as hostname) it should work.

Edit: Maybe it makes sense to also expose the MLFLOW_BOTO_CLIENT_ADDRESSING_STYLE (https://github.com/mlflow/mlflow/blob/0ebbefc8c075165b558dfb481cf1e39d5cb53003/mlflow/environment_variables.py#L605) variable via the Helm chart / values.yaml, to make the externalS3 configuration as flexible as mlFlow itself? @dhrp ?

frittentheke avatar Jul 04 '24 13:07 frittentheke

@andresbono @javsalgar Could you kindly check if my explanation (https://github.com/bitnami/charts/pull/26462#issuecomment-2209054770) holds true and this PR could move forward?

frittentheke avatar Jul 10 '24 07:07 frittentheke

That made the trick, @frittentheke. Thank you so much for looking into it.

For external AWS S3, externalS3.host=s3.amazonaws.com works flawlessly, there is no need to include the bucket name in the host (virtual-hosted–style) as it leads to a malformed endpoint uri.

Edit: Maybe it makes sense to also expose the MLFLOW_BOTO_CLIENT_ADDRESSING_STYLE (https://github.com/mlflow/mlflow/blob/0ebbefc8c075165b558dfb481cf1e39d5cb53003/mlflow/environment_variables.py#L605) variable via the Helm chart / values.yaml, to make the externalS3 configuration as flexible as mlFlow itself? @dhrp ?

A user could pass the MLFLOW_BOTO_CLIENT_ADDRESSING_STYLE via the tracking.extraEnvVars value if needed.

andresbono avatar Jul 15 '24 14:07 andresbono

@andresbono should this auto-merge or are the CI failures expected?

frittentheke avatar Jul 16 '24 08:07 frittentheke

@andresbono should this auto-merge or CI are the failures expected?

No, the CI failures are not expected. Let me look into it.

andresbono avatar Jul 17 '24 09:07 andresbono

Finally merged!! Thank you so much for being collaborative and digging into our suggestions and questions!!

andresbono avatar Jul 17 '24 15:07 andresbono