yocto-gl icon indicating copy to clipboard operation
yocto-gl copied to clipboard

[BUG] MLFlow needs to read .dockerignore and .gitignore or have its own .ignore file

Open levkk opened this issue 4 years ago • 7 comments

Willingness to contribute

  • [ ] Yes. I can contribute a fix for this bug independently.
  • [x] Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • [ ] No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS X
  • MLflow installed from (source or binary): pip
  • MLflow version (run mlflow --version): 1.12.1
  • Python version: 3.8
  • npm version, if running the dev UI: N/A
  • Exact command to reproduce: mlflow run .

Describe the problem

MLFlow run command copies the entire work directory without taking .dockerignore and .gitignore into account.

This opens it up to copying heavy directories like venv (virtualenv), secrets, environment variables (.e.g stored in .env), etc.

Why the Docker env ignores .dockerignore is an existing issue #2016.

This can be mitigated by adding an ignore=func parameter to shutil.copytree used here. This would need to implement either .dockerignore or .gitignore patterns, or implement its own .mlflowignore file. This would also make MLFlow run faster because it won't need to copy heavy files around (twice with the .dockerignore bug).

Code to reproduce issue

Use any of the example projects. Add any file into the folder and add it to .dockerignore. Build the image and note the file being present in the image.

Other info / logs

N/A

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [x] area/build: Build and test infrastructure for MLflow
  • [ ] area/docs: MLflow documentation pages
  • [x] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: Local serving, model deployment tools, spark UDFs
  • [ ] area/server-infra: MLflow server, JavaScript dev server
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • [ ] area/uiux: Front-end, user experience, JavaScript, plotting
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

Language

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

Integrations

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

levkk avatar Jan 12 '21 18:01 levkk

@levkk Did your commit fix this issue? If so, how is it used?

zimonitrome avatar Feb 15 '22 10:02 zimonitrome

Hey @zimonitrome , I have stopped using MLFlow since then and haven't had a chance to work on this further. The commit I proposed uses Python's shutil ignore parameter which is very similar to how Docker and git implement this functionality in .dockerignore and .gitignore respectively. Sadly, that commit was never merged.

levkk avatar Feb 15 '22 16:02 levkk

Thanks for the follow-up @levkk. Sad that it never got merged :(

May I ask if you migrated to something you enjoy more than MLFlow?

zimonitrome avatar Feb 15 '22 22:02 zimonitrome

Yup, we just started using Docker and Dockerfiles directly and launch the containers using our existing CI/CD. The model registry MLFlow provided was neat though, so we kept using that just for metadata tracking.

levkk avatar Feb 15 '22 23:02 levkk

This would be very useful, +1

mattijsdp avatar Dec 06 '23 13:12 mattijsdp

another +1 for this. It would be very useful.

tobiasbartsch avatar Mar 30 '24 00:03 tobiasbartsch

another +1, would be very useful.

vikingsheadquarter avatar Apr 16 '24 16:04 vikingsheadquarter