MLflow hook updates MLflow run state based on all op events, not just the final op
Dagster version
1.7.4
What's the issue?
I have a job that runs multiple ops in serial. The job has the end_mlflow_on_run_finished hook applied.
When the job runs, the early ops succeed and the MLflow integration marks the run as finished before the job itself finishes. For some runs, the later ops then fail and update the MLflow run state again.
While jobs are still running, the MLflow runs are already marked as finished.
Once jobs have finished with a failure, the MLflow runs are marked as failed.
I believe the issue is here: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-mlflow/dagster_mlflow/hooks.py#L25
What did you expect to happen?
The MLflow run status should not be updated until the job finishes.
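To make the difference concrete, here is a minimal sketch in plain Python. It does not use Dagster or MLflow: `FakeMlflowRun`, `run_job`, and the `per_op_hook` flag are hypothetical names invented for illustration, and the status strings are only meant to resemble MLflow's. It models a status update that fires after every op (what I'm seeing) against one that fires once when the job ends (what I expected).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeMlflowRun:
    """Hypothetical stand-in for an MLflow run; records every status change."""

    status: str = "RUNNING"
    history: List[str] = field(default_factory=list)

    def end(self, status: str) -> None:
        self.status = status
        self.history.append(status)


def run_job(ops: List[Callable[[], None]], run: FakeMlflowRun, per_op_hook: bool) -> FakeMlflowRun:
    """Run ops serially and update the run status.

    per_op_hook=True  -> status is updated after every op (reported behaviour).
    per_op_hook=False -> status is updated once, after the last op (expected).
    """
    failed = False
    for op in ops:
        try:
            op()
        except Exception:
            failed = True
            if per_op_hook:
                run.end("FAILED")
            break  # later ops do not run after a failure
        if per_op_hook:
            run.end("FINISHED")  # fires even though later ops are still pending
    if not per_op_hook:
        run.end("FAILED" if failed else "FINISHED")
    return run


def ok() -> None:
    pass


def boom() -> None:
    raise RuntimeError("simulated op failure")


# Two ops succeed, the third fails.
reported = run_job([ok, ok, boom], FakeMlflowRun(), per_op_hook=True)
expected = run_job([ok, ok, boom], FakeMlflowRun(), per_op_hook=False)
print(reported.history)
print(expected.history)
```

With the per-op variant the run is marked finished after the first op and only later flips to failed, which matches what I see in the MLflow UI. With the job-level variant there is a single status change at the end.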
How to reproduce?
- Job with multiple ops in it
- The job has the end_mlflow_on_run_finished hook
- The environment is set up to talk to an MLflow instance
- Run the job and watch the MLflow and Dagster UIs
Deployment type
Local
Deployment details
We're using MLflow in Databricks
Using dagster-mlflow version: 0.23.4
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.