dbx [Feature] Provide possibility to disable MLflow integration

Expected Behavior

When using DBX in a non-ML context, it would be nice if there were a way for users to deactivate the logging of experiments & artifacts to MLflow, e.g. via a parameter in the deployment config or an additional boolean CLI parameter.
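To illustrate the kind of switch meant here, a minimal sketch in Python. Everything in it is hypothetical and not part of dbx: the flag name `--no-mlflow`, the `DeployOptions` class, and the step names are assumptions chosen for illustration only, and the sketch plans steps without executing anything.

```python
import argparse
from dataclasses import dataclass


@dataclass
class DeployOptions:
    """Hypothetical deployment options; not an actual dbx class."""
    use_mlflow: bool = True


def parse_options(argv):
    """Parse a hypothetical boolean CLI flag that turns MLflow logging off."""
    parser = argparse.ArgumentParser(prog="deploy-sketch")
    # Assumed flag name, for illustration only.
    parser.add_argument(
        "--no-mlflow",
        action="store_true",
        help="skip logging experiments and artifacts to MLflow",
    )
    args = parser.parse_args(argv)
    return DeployOptions(use_mlflow=not args.no_mlflow)


def plan_deployment(options):
    """Return the list of steps a deployment would run, without running them."""
    steps = ["build package"]
    if options.use_mlflow:
        steps.append("log experiment and artifacts to MLflow")
    steps.append("create or update job")
    return steps


if __name__ == "__main__":
    # MLflow logging stays on by default; the flag opts out of it.
    print(plan_deployment(parse_options([])))
    print(plan_deployment(parse_options(["--no-mlflow"])))
```

Keeping MLflow enabled by default would leave existing deployments unchanged, so only users who pass the flag would see different behavior.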

Current Behavior

DBX is logging experiments and uploading artifacts via MLflow. The user can only choose the location of the experiments & artifacts, but not deactivate the MLflow integration.

Context

MLflow is designed to facilitate the workflow in ML projects, but Databricks is used for a lot of non-ML workloads as well. It would be nice not to couple the use of DBX tightly to MLflow, for example because:

If a user is using DBX in a non-ML project context, the MLflow workspace can quickly become cluttered with unwanted experiments, and DBFS space is taken up by unwanted artifacts
If the usage of MLflow is not allowed due to company-internal policies, a user cannot use DBX at all at the moment

Environment

dbx version used: 0.4.1
Databricks Runtime version: 10.4 LTS

Apr 05 '22 13:04 matthiaspfenninger

Do you have a suggested alternative? My understanding is that MLflow is used to provide good versioning semantics across deployments. So if we allow users to not use it (which might make sense), what alternative provides those versioning semantics?

Apr 05 '22 16:04 skylarbpayne

In my experience, when you don't want MLflow, the versioning usually happens at another level. For example, a Databricks job may be just one piece of a larger pipeline, with ADF on top orchestrating it and other microservices around it that are part of the pipeline as well. Versioning of deployments then takes all of these parts into account, and does not need to be done by DBX for the Databricks job alone.

So this might be biased by my previous experiences, but: I would not enforce versioning with a particular technology at the dbx level. Providing the option to do so with MLflow definitely makes sense, as it is part of the Databricks ecosystem and fits well with all ML use cases. I can't propose a good alternative for other cases though, as I don't see any commonly used standard there.

Very happy to hear other opinions and experiences regarding this.

Apr 05 '22 16:04 matthiaspfenninger

Hi @matthiaspfenninger, we plan to add a non-MLflow deployment capability (via direct upload to DBFS). Stay tuned for implementation updates.

Jul 20 '22 22:07 renardeinside

Are there any updates about this?

Mar 30 '23 17:03 giovannipapini-agilelab
