yocto-gl [BUG] MLFlow needs to read .dockerignore and .gitignore or have its own .ignore file

Willingness to contribute

[ ] Yes. I can contribute a fix for this bug independently.
[x] Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
[ ] No. I cannot contribute a bug fix at this time.

System information

Have I written custom code (as opposed to using a stock example script provided in MLflow): No.
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS X
MLflow installed from (source or binary): pip
MLflow version (run mlflow --version): 1.12.1
Python version: 3.8
npm version, if running the dev UI: N/A
Exact command to reproduce: mlflow run .

Describe the problem

MLFlow run command copies the entire work directory without taking .dockerignore and .gitignore into account.

This opens it up to copying heavy directories like venv (virtualenv), secrets, environment variables (.e.g stored in .env), etc.

Why the Docker env ignores .dockerignore is an existing issue #2016.

This can be mitigated by adding an ignore=func parameter to shutil.copytree used here. This would need to implement either .dockerignore or .gitignore patterns, or implement its own .mlflowignore file. This would also make MLFlow run faster because it won't need to copy heavy files around (twice with the .dockerignore bug).

Code to reproduce issue

Use any of the example projects. Add any file into the folder and add it to .dockerignore. Build the image and note the file being present in the image.

Other info / logs

N/A

What component(s), interfaces, languages, and integrations does this bug affect?

Components

[ ] area/artifacts: Artifact stores and artifact logging
[x] area/build: Build and test infrastructure for MLflow
[ ] area/docs: MLflow documentation pages
[x] area/examples: Example code
[ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
[ ] area/models: MLmodel format, model serialization/deserialization, flavors
[ ] area/projects: MLproject format, project running backends
[ ] area/scoring: Local serving, model deployment tools, spark UDFs
[ ] area/server-infra: MLflow server, JavaScript dev server
[ ] area/tracking: Tracking Service, tracking client APIs, autologging

Interface

[ ] area/uiux: Front-end, user experience, JavaScript, plotting
[ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
[ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
[ ] area/windows: Windows support

Language

[ ] language/r: R APIs and clients
[ ] language/java: Java APIs and clients
[ ] language/new: Proposals for new client languages

Integrations

[ ] integrations/azure: Azure and Azure ML integrations
[ ] integrations/sagemaker: SageMaker integrations
[ ] integrations/databricks: Databricks integrations

Jan 12 '21 18:01 levkk

@levkk Did your commit fix this issue? If so, how is it used?

Feb 15 '22 10:02 zimonitrome

Hey @zimonitrome , I have stopped using MLFlow since then and haven't had a chance to work on this further. The commit I proposed uses Python's shutil ignore parameter which is very similar to how Docker and git implement this functionality in .dockerignore and .gitignore respectively. Sadly, that commit was never merged.

Feb 15 '22 16:02 levkk

Thanks for the follow-up @levkk. Sad that it never got merged :(

May I ask if you migrated to something you enjoy more than MLFlow?

Feb 15 '22 22:02 zimonitrome

Yup, we just started using Docker and Dockerfiles directly and launch the containers using our existing CI/CD. The model registry MLFlow provided was neat though, so we kept using that just for metadata tracking.

Feb 15 '22 23:02 levkk

This would be very useful, +1

Dec 06 '23 13:12 mattijsdp

another +1 for this. It would be very useful.

Mar 30 '24 00:03 tobiasbartsch

another +1, would be very useful.

Apr 16 '24 16:04 vikingsheadquarter

yocto-gl yocto-gl copied to clipboard

[BUG] MLFlow needs to read .dockerignore and .gitignore or have its own .ignore file

Willingness to contribute

System information

Describe the problem

Code to reproduce issue

Other info / logs

What component(s), interfaces, languages, and integrations does this bug affect?

yocto-gl
yocto-gl copied to clipboard