
[Feature] Provide possibility to disable MLflow integration

Open matthiaspfenninger opened this issue 3 years ago • 4 comments

Expected Behavior

When using DBX in a non-ML context, it would be nice if there were a way for users to deactivate the logging of experiments & artifacts to MLflow, e.g. via a parameter in the deployment config or an additional boolean CLI parameter.
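As a rough illustration of the kind of switch meant here, a minimal sketch follows. Every name in it (`DeployOptions`, `use_mlflow`, `artifact_dir`, `deploy_artifact`, `upload_direct`) is hypothetical and not part of dbx; the MLflow path is passed in as a plain callable so the sketch stays independent of any real MLflow or dbx API, and a local directory stands in for the artifact location.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class DeployOptions:
    """Hypothetical deployment options; dbx 0.4.1 has no such flag."""

    use_mlflow: bool = True          # assumed boolean toggle (config or CLI)
    artifact_dir: str = "artifacts"  # assumed target for the non-MLflow path


def upload_direct(src: Path, options: DeployOptions) -> str:
    """Copy the artifact straight into artifact_dir, bypassing MLflow."""
    dest_dir = Path(options.artifact_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return str(dest)


def deploy_artifact(
    src: Path,
    options: DeployOptions,
    mlflow_upload: Callable[[Path], str],
) -> str:
    """Use the MLflow uploader only when the toggle is on."""
    if options.use_mlflow:
        return mlflow_upload(src)
    return upload_direct(src, options)
```

With `use_mlflow=False`, the `mlflow_upload` callable is never invoked, so no experiment or MLflow artifact would be created for that deployment.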

Current Behavior

DBX is logging experiments and uploading artifacts via MLflow. The user can only choose the location of the experiments & artifacts, but not deactivate the MLflow integration.

Context

MLflow is designed to facilitate the workflow in ML projects, but Databricks is used for a lot of non-ML workloads as well. It would be nice not to couple the use of DBX tightly with MLflow, for example because:

  • If a user is using DBX in a non-ML project context, the MLflow workspace can quickly become cluttered with unwanted experiments, and DBFS space is taken up by unwanted artifacts
  • If the usage of MLflow is not allowed due to company-internal policies, a user cannot use DBX at all at the moment

Environment

  • dbx version used: 0.4.1
  • Databricks Runtime version: 10.4 LTS

matthiaspfenninger avatar Apr 05 '22 13:04 matthiaspfenninger

Do you have a suggested alternative? My understanding is that MLflow is used to provide good versioning semantics across deployments. So if we allow users to not use it (which might make sense), what alternative provides those versioning semantics?

skylarbpayne avatar Apr 05 '22 16:04 skylarbpayne

In my experience, when you don't want MLflow, the versioning usually happens on another level. For example, a Databricks job may be just one piece of a larger pipeline, with ADF on top orchestrating it and other microservices being part of the pipeline as well. Versioning of deployments then has to take all of these parts into account, and does not need to be done by DBX for the Databricks job alone.

So this might be biased by my previous experiences, but: I would not enforce versioning with a particular technology at the dbx level. Providing the option to do so with MLflow definitely makes sense, as it is part of the Databricks ecosystem and fits well with all ML use cases. I can't propose a good alternative for other cases though, as I don't see any commonly used standard there.

Very happy to hear other opinions and experiences regarding this.

matthiaspfenninger avatar Apr 05 '22 16:04 matthiaspfenninger

Hi @matthiaspfenninger, we have a plan to add a non-MLflow deployment capability (via direct upload to DBFS). Stay tuned for implementation updates.

renardeinside avatar Jul 20 '22 22:07 renardeinside

Are there any updates about this?

giovannipapini-agilelab avatar Mar 30 '23 17:03 giovannipapini-agilelab