dbx
[Feature] Provide possibility to disable MLflow integration
Expected Behavior
When using DBX in a non-ML context, it would be nice if users had a way to deactivate logging experiments and artifacts to MLflow, e.g. via a parameter in the deployment config or an additional boolean CLI parameter.
Current Behavior
DBX logs experiments and uploads artifacts via MLflow. The user can only choose the location of the experiments and artifacts, but cannot deactivate the MLflow integration.
Context
MLflow is designed to facilitate the workflow in ML projects, but Databricks is used for many non-ML workloads as well. It would be nice if using DBX were not tightly coupled to MLflow, for example because:
- If a user is using DBX in a non-ML project, the MLflow workspace can quickly become cluttered with unwanted experiments, and DBFS space is taken up by unwanted artifacts
- If the use of MLflow is not allowed by company-internal policies, a user currently cannot use DBX at all
Environment
- dbx version used: 0.4.1
- Databricks Runtime version: 10.4 LTS
Do you have a suggested alternative? My understanding is that MLflow is used to provide good versioning semantics across deployments. So if we allow users not to use it (which might make sense), what alternative provides those versioning semantics?
In my experience, when you don't want MLflow, versioning usually happens at another level. For example, a Databricks job may be just one piece of a larger pipeline, with ADF on top orchestrating it and other microservices around it also being part of the pipeline. Versioning of deployments then takes all of these parts into account, and does not need to be done by DBX for the Databricks job alone.
This might be biased by my previous experience, but I would not enforce versioning with a particular technology at the dbx level. Offering the option to do so with MLflow definitely makes sense, as it is part of the Databricks ecosystem and fits all ML use cases well. I can't propose a good alternative for other cases, though, as I don't see any commonly used standard there.
Very happy to hear other opinions and experiences regarding this.
Hi @matthiaspfenninger, we plan to add a non-MLflow deployment capability (via direct upload to DBFS). Stay tuned for implementation updates.
Are there any updates about this?