Official MLflow Docker image does not support Postgres
Hello!
I am wondering why the default mlflow/mlflow Docker image in the GitHub Container Registry does not have the requirements needed to run against a Postgres server. Many tutorials for the MLflow server on the official website use postgres://... URIs for the DB backend, and as far as I can see even your tests run against Postgres, yet the image does not support it (the psycopg2 package is missing). Am I wrong? Do I really have to roll my own Dockerfile instead of using one that comes from the official repo?
Thanks!
Benjamin
Hi @benelot, thanks for raising the issue! Installing the Postgres-related requirements would increase the image size, and we're not sure all users need them. But we can keep this issue open to see if more people vote for it :D
Would it be possible to provide a flag that downloads psycopg2-binary, or to install it automatically when Postgres environment variables are detected? You'd likely need some sort of startup script.
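For what it's worth, a minimal sketch of such a startup script could look like this (the BACKEND_STORE_URI variable name is my own assumption, not something the official image defines):

#!/bin/sh
# Hypothetical entrypoint sketch: install the Postgres driver only when
# the backend store URI points at Postgres.
case "$BACKEND_STORE_URI" in
  postgresql://*|postgres://*)
    pip install --no-cache-dir psycopg2-binary
    ;;
esac
exec mlflow server --backend-store-uri "$BACKEND_STORE_URI" "$@"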
@benelot I didn't need to roll my own Dockerfile... passing this as the command works (I'm starting up the mlflow server in a docker compose):
bash -c "python3 -m pip install pip --upgrade && \ python3 -m pip install psycopg2-binary && \ mlflow server ${WHATEVER_FLAGS_YOU_NEED}"
The same goes for MySQL:
2023/09/17 17:54:47 ERROR mlflow.cli: Error initializing backend store
2023/09/17 17:54:47 ERROR mlflow.cli: No module named 'MySQLdb'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/mlflow/cli.py", line 426, in server
initialize_backend_stores(backend_store_uri, registry_store_uri, default_artifact_root)
File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 289, in initialize_backend_stores
_get_tracking_store(backend_store_uri, default_artifact_root)
File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 268, in _get_tracking_store
_tracking_store = _tracking_store_registry.get_store(store_uri, artifact_root)
File "/usr/local/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/registry.py", line 39, in get_store
return self._get_store_with_resolved_uri(resolved_store_uri, artifact_uri)
File "/usr/local/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/registry.py", line 49, in _get_store_with_resolved_uri
return builder(store_uri=resolved_store_uri, artifact_uri=artifact_uri)
File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 129, in _get_sqlalchemy_store
return SqlAlchemyStore(store_uri, artifact_uri)
File "/usr/local/lib/python3.10/site-packages/mlflow/store/tracking/sqlalchemy_store.py", line 150, in __init__
] = mlflow.store.db.utils.create_sqlalchemy_engine_with_retry(db_uri)
File "/usr/local/lib/python3.10/site-packages/mlflow/store/db/utils.py", line 228, in create_sqlalchemy_engine_with_retry
engine = create_sqlalchemy_engine(db_uri)
File "/usr/local/lib/python3.10/site-packages/mlflow/store/db/utils.py", line 284, in create_sqlalchemy_engine
return sqlalchemy.create_engine(db_uri, pool_pre_ping=True, **pool_kwargs)
File "<string>", line 2, in create_engine
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py", line 281, in warned
return fn(*args, **kwargs) # type: ignore[no-any-return]
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 601, in create_engine
dbapi = dbapi_meth(**dbapi_args)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 152, in import_dbapi
return __import__("MySQLdb")
ModuleNotFoundError: No module named 'MySQLdb'
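A lighter workaround for MySQL, analogous to the psycopg2 one above, might be the pure-Python PyMySQL driver, which needs no apt packages; if I'm not mistaken, SQLAlchemy (and therefore MLflow) accepts the mysql+pymysql:// dialect (credentials and host below are placeholders):

# Install a pure-Python MySQL driver at startup, then point MLflow at it
# via the pymysql dialect in the backend store URI.
bash -c "pip install pymysql && \
  mlflow server --backend-store-uri mysql+pymysql://user:pass@host:3306/mlflow_db"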
Forgive my rudeness, but by that logic you could make the image even slimmer by removing MLflow itself. These are features you are stripping from the Docker image. Please include them both in the image.
The solution provided by @shantanu-bbai does not work for me since I'm running this Docker image on a TrueNAS Scale (Kubernetes) and I could not find a way to introduce those extra commands to the app.
If anyone's interested in creating their own Docker image with MySQL and Postgres client support, here's what its Dockerfile looks like:
FROM ghcr.io/mlflow/mlflow:v2.7.0
RUN apt-get -y update && \
apt-get -y install python3-dev default-libmysqlclient-dev build-essential pkg-config && \
pip install --upgrade pip && \
pip install mysqlclient && \
pip install psycopg2-binary
CMD ["bash"]
But I still strongly believe this should be part of the official image.
Somewhat related to useful packages missing from the base image: the Prometheus exporter is absent as well:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/local/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/usr/local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/local/lib/python3.10/site-packages/gunicorn/util.py", line 371, in import_app
mod = importlib.import_module(module)
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/usr/local/lib/python3.10/site-packages/mlflow/server/__init__.py", line 50, in <module>
from mlflow.server.prometheus_exporter import activate_prometheus_exporter
File "/usr/local/lib/python3.10/site-packages/mlflow/server/prometheus_exporter.py", line 2, in <module>
from prometheus_flask_exporter.multiprocess import GunicornInternalPrometheusMetrics
ModuleNotFoundError: No module named 'prometheus_flask_exporter'
And to work around it, you can of course extend the above Dockerfile with:
pip install prometheus-flask-exporter
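With the exporter installed, the server should then start cleanly with metrics enabled; if I remember the flag correctly, it takes a directory for gunicorn's multiprocess metric files:

# Serve metrics on /metrics; the argument is a writable directory for
# the exporter's multiprocess metric files.
mlflow server --host 0.0.0.0 --expose-prometheus /tmp/prometheus_metrics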
If your application supports Postgres, it seems logical to carry that support all the way through to the official Docker image.
Leaving it out is a bad idea: people will start using public images that have the support built in, which leads everyone to think they are using the official Docker image when they most definitely are not.
Case in point: the community Helm chart for MLflow: https://github.com/community-charts/helm-charts/blob/main/charts/mlflow/values.yaml
In this case, they're not using the official MLflow Docker image; they're using https://github.com/burakince/mlflow instead.
This info came from https://artifacthub.io/packages/helm/community-charts/mlflow , which links to this official GitHub repo. If you go to LINKS on the right, you'll see official MLflow links under Homepage and Source. This was as of Nov 15th, 2023.
I realize they are "community charts". But I think there is a case to be made that you SHOULD add support to the official Docker image for the databases and services MLflow uses.
One potential solution: publish a slim version of the image that doesn't include these modules (or the opposite, a full version that has everything). Then our deployments stay aligned with you as the upstream.
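As a sketch of what that could look like upstream, a single Dockerfile with a build arg could produce both tags. Everything here is hypothetical, including picking PyMySQL over mysqlclient to avoid system build dependencies:

# One Dockerfile, two published variants selected at build time.
FROM python:3.10-slim
ARG FULL=false
RUN pip install --no-cache-dir mlflow && \
    if [ "$FULL" = "true" ]; then \
        pip install --no-cache-dir psycopg2-binary pymysql prometheus-flask-exporter; \
    fi
CMD ["mlflow", "server", "--host", "0.0.0.0"]

The full variant would then be built with something like: docker build --build-arg FULL=true -t mlflow:full .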
Just some thoughts, hope it is considered. TY for the support.
(Just sharing this info for future reference as an alternative option. The bitnami/mlflow image supports Postgres and S3 by default ☺️)
I know this is old, but the psycopg2 maintainers advise against depending on the psycopg2-binary package in production. All the more reason, IMO, for an official MLflow image that includes this Python dependency and its necessary system dependencies, so the community doesn't need to solve this problem every time it comes up.
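For reference, here is a sketch of a Dockerfile that builds psycopg2 from source instead of using the binary wheel (package names are Debian's; the base tag matches the one used earlier in this thread):

# libpq-dev provides pg_config; gcc and python3-dev are needed to
# compile the psycopg2 C extension from source.
FROM ghcr.io/mlflow/mlflow:v2.7.0
RUN apt-get -y update && \
    apt-get -y install --no-install-recommends gcc libpq-dev python3-dev && \
    pip install --no-cache-dir psycopg2 && \
    rm -rf /var/lib/apt/lists/*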
Here is a working image until the MLflow team provides one:
https://hub.docker.com/r/imichael/mlflow
Users turn to Docker precisely to avoid the hassle of installing all the prerequisites. I can't understand why the developers consider this feature unnecessary.
I also agree that the official docker image should support Postgres out-of-the-box.
Another surprised user here who just wanted to try MLflow to see if it's a good fit, and was a bit disappointed to instantly have to jump through hoops.
And another... We were using a custom Docker image and wanted to go back to the original, because after three years we had forgotten why we made it...
ModuleNotFoundError: No module named 'psycopg2'
Ok.... Here is the answer...
Just ran into this issue of needing to install psycopg2 manually... I don't understand why MLflow supports PostgreSQL integration but the image doesn't actually come ready to integrate it.
Are there any second thoughts on adding it to the main image? Or even publishing a separate image with it? Or something else?
I also second this request to add PostgreSQL support to the official MLflow image.
Having the same issue here (who would've thought it hadn't been fixed). If you don't want to include extra dependencies in your image, then create a dedicated image with all the dependencies needed to work with Postgres out of the box. Your decision not to include it and to make users install it themselves is, at the very least, silly.
I made it work with Postgres and S3 using the following Dockerfile and docker-compose file:
Dockerfile
FROM ghcr.io/mlflow/mlflow
RUN pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir boto3==1.37.11 psycopg2-binary==2.9.10
docker-compose.yml
services:
  postgres:
    image: postgres:15
    container_name: mlflow-postgres
    environment:
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: mlflow
      POSTGRES_DB: mlflow_db
    volumes:
      - ./data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  mlflow:
    build:
      context: .
      dockerfile: ./Dockerfile
    container_name: mlflow-server
    command: >
      mlflow server
      --host 0.0.0.0
      --port 5000
      --backend-store-uri postgresql://mlflow:mlflow@postgres:5432/mlflow_db
      --default-artifact-root s3://equation-mlflow
    environment:
      - AWS_ACCESS_KEY_ID=xxx
      - AWS_SECRET_ACCESS_KEY=xxx
      - AWS_DEFAULT_REGION=xxx
    ports:
      - "8080:5000"
    depends_on:
      - postgres
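Bringing it up is then a single command; the UI ends up on host port 8080 as mapped above:

# Builds the custom image on first run, then starts both services detached.
docker compose up --build -d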
I didn't need to roll my own Dockerfile... passing this as the command works (I'm starting up the mlflow server in a docker compose):
bash -c "python3 -m pip install pip --upgrade && \ python3 -m pip install psycopg2-binary && \ mlflow server ${WHATEVER_FLAGS_YOU_NEED}"
This is a great solution actually. No need to maintain a custom Docker image. The command line can be made even simpler:
bash -c 'pip install psycopg2-binary && mlflow server ...'
This will take just a second at startup.