
[bitnami/mlflow] MLFlow missing package: google-cloud-storage

Open RussellSB opened this issue 1 year ago • 5 comments

Name and Version

bitnami/mlflow

What architecture are you using?

amd64

What steps will reproduce the bug?

Right now we have MLflow set up with GCP, relying on the Bitnami image. Whenever we try to log ML models to the tracking server, it tries to save them to Google Cloud Storage under the hood, but fails due to the missing package google-cloud-storage and its dependencies (google.auth included). To reproduce this without having to set up the whole GCP server:

  1. Run the bitnami/mlflow image.
  2. Start Python.
  3. Run from google.auth.exceptions import DefaultCredentialsError (as in https://github.com/mlflow/mlflow/blob/master/mlflow/store/artifact/gcs_artifact_repo.py)
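The reproduction steps above can be scripted as a quick check to run with the Python interpreter inside the container. This is a minimal sketch: the helper name missing_modules and the module list (google.auth from the traceback, google.cloud.storage from the google-cloud-storage package named in the title) are assumptions for illustration, not part of the Bitnami image or MLflow.

```python
import importlib.util

# Modules the GCS artifact repository needs (assumed list, based on the
# traceback and the package named in this issue; not an official manifest).
GCS_MODULES = ["google.auth", "google.cloud.storage"]


def missing_modules(names):
    """Return the dotted module names that cannot be located."""
    missing = []
    for name in names:
        try:
            # find_spec imports parent packages, so a missing "google"
            # package raises ModuleNotFoundError instead of returning None.
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(GCS_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All GCS artifact dependencies are importable.")
```

On an affected image this would list google.auth (and google.cloud.storage) as missing, matching the ModuleNotFoundError in the traceback.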

What is the expected behavior?

It imports correctly.

What do you see instead?

Traceback (most recent call last):
  File "/opt/bitnami/python/lib/python3.10/site-packages/flask/app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/bitnami/python/lib/python3.10/site-packages/flask/app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/bitnami/python/lib/python3.10/site-packages/flask/app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/bitnami/python/lib/python3.10/site-packages/flask/app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 497, in wrapper
    return func(*args, **kwargs)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 538, in wrapper
    return func(*args, **kwargs)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 951, in _list_artifacts
    artifact_entities = _list_artifacts_for_proxied_run_artifact_root(
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 497, in wrapper
    return func(*args, **kwargs)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 981, in _list_artifacts_for_proxied_run_artifact_root
    artifact_destination_repo = _get_artifact_repo_mlflow_artifacts()
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 175, in _get_artifact_repo_mlflow_artifacts
    _artifact_repo = get_artifact_repository(os.environ[ARTIFACTS_DESTINATION_ENV_VAR])
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 117, in get_artifact_repository
    return _artifact_repository_registry.get_artifact_repository(artifact_uri)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 74, in get_artifact_repository
    return repository(artifact_uri)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/store/artifact/gcs_artifact_repo.py", line 40, in __init__
    from google.auth.exceptions import DefaultCredentialsError
ModuleNotFoundError: No module named 'google.auth' 
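The traceback shows that the failure is lazy: the server only builds the GCS artifact repository, and only then imports google.auth, when it first handles an artifact request against the configured artifact destination. A start-up check could surface the problem before any request arrives. The sketch below is hypothetical: check_artifact_destination and the scheme-to-package mapping are illustrative names, not MLflow's API; only the gs:// scheme for Google Cloud Storage is assumed from how MLflow addresses GCS artifacts.

```python
import importlib.util
from urllib.parse import urlparse

# Hypothetical mapping: URI scheme -> (pip package, modules its artifact
# repository imports). Only the "gs" entry relates to this issue.
REQUIRED_BY_SCHEME = {
    "gs": ("google-cloud-storage", ["google.auth", "google.cloud.storage"]),
}


def _importable(name):
    """True if the dotted module name can be located (parents get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


def check_artifact_destination(uri):
    """Raise RuntimeError if the backend for `uri` lacks its Python modules."""
    scheme = urlparse(uri).scheme
    if scheme not in REQUIRED_BY_SCHEME:
        return  # local path or unmapped scheme: nothing to check here
    package, modules = REQUIRED_BY_SCHEME[scheme]
    missing = [name for name in modules if not _importable(name)]
    if missing:
        raise RuntimeError(
            f"Artifact destination {uri!r} needs {missing}; "
            f"try: pip install {package}"
        )
```

For example, check_artifact_destination("gs://my-bucket/mlflow") (a hypothetical bucket) would raise immediately on an image without google-cloud-storage, instead of the server failing on the first artifact listing.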

Additional information

As a workaround, we install google-cloud-storage on top of the image every time the server is connected to, but it would be good to have it built into the image since it is core functionality. I would open a PR, but I'm not sure where in the repo this missing package should be installed.
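The workaround described above can be made idempotent by only calling pip when the module is actually missing, and by using the running interpreter's pip (sys.executable -m pip) so the package lands in the same environment the server uses. This is a sketch under assumptions: ensure_package and its injectable run parameter are hypothetical, and baking the package into the image, as requested here, makes this step unnecessary.

```python
import importlib.util
import subprocess
import sys


def ensure_package(module, package, run=subprocess.run):
    """Install `package` with pip only if `module` cannot be located.

    Returns True if pip was invoked, False if the module was already present.
    """
    try:
        present = importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        present = False
    if present:
        return False
    # Install into the current interpreter's environment.
    run([sys.executable, "-m", "pip", "install", package], check=True)
    # Let the running process see packages installed after start-up.
    importlib.invalidate_caches()
    return True
```

Usage: ensure_package("google.cloud.storage", "google-cloud-storage") at container start-up, before the tracking server is launched.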

This also seems related: https://github.com/bitnami/charts/issues/22720

RussellSB • Apr 12 '24

Hi!

Thank you so much for reporting. Indeed, these packages are missing. I created a task in our backlog to add these missing pip modules.

javsalgar • Apr 18 '24

Great, thank you! Looking forward to this.

RussellSB • Apr 18 '24

Hey, are there any updates? It would be great to know roughly where this issue sits on the roadmap.

RussellSB • May 24 '24

I've been trying to add this dependency myself, but I'm running into a wall: I have no way to see how the Stacksmith dependencies are built, or to make changes to them.

I tried adding the google-cloud-sdk dependency, but to no avail. From what I can see, pip install is not used anywhere in this repository, so the Python packages must be installed somewhere outside it. Perhaps a maintainer can help me understand how to do this?

I tried the following: https://gist.github.com/dhrp/f5ad291ab9ab583e85da1bf930326d33

but it doesn't install the Python SDK; the import still fails.

[edit] Actually, simply adding:

RUN pip install google-cloud-storage

to the end of the Dockerfile works. Would you be interested in a contribution like this, or should it really go into the Stacksmith part?

Pinging @javsalgar. I'm also planning to pick up https://github.com/bitnami/charts/issues/22720, but this issue is a prerequisite for it.

dhrp • May 24 '24

Hi everyone

Could you please give it a try using the image tag 2.14.1-debian-12-r1? We included the missing Python module in this image revision.
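One hedged way to confirm what the 2.14.1-debian-12-r1 revision ships is to query installed distribution versions with the standard library's importlib.metadata (Python 3.8+), running the script with the image's Python interpreter. The helper name installed_versions is hypothetical; the names passed in are the PyPI distribution names (google-cloud-storage, google-auth), which differ from the import names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None."""
    result = {}
    for dist in distributions:
        try:
            result[dist] = version(dist)
        except PackageNotFoundError:
            result[dist] = None
    return result


if __name__ == "__main__":
    dists = ["mlflow", "google-cloud-storage", "google-auth"]
    for dist, ver in installed_versions(dists).items():
        print(f"{dist}: {ver or 'NOT INSTALLED'}")
```

A None entry for google-cloud-storage or google-auth would indicate the image still lacks the GCS dependencies.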

juan131 • Jun 28 '24

I'm closing this issue since we included the missing pip module in the 2.14.1-debian-12-r1 revision. Please reopen it if you require further assistance.

juan131 • Jul 15 '24