[BUG] mlflow gc does not remove deleted experiments when runs were tracked with datasets
Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the issues policy
Where did you encounter this bug?
Local machine
Willingness to contribute
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
MLflow version
- Client: 2.9.2
- Tracking server: 2.9.2
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04.3 LTS
- Python version: Python 3.10.12
- yarn version, if running the dev UI: -
Describe the problem
The "mlflow gc" command does not remove deleted experiments when their runs were tracked with datasets.
Tracking information
This was the default setup from the "Remote Experiment Tracking with MLflow Tracking Server" scenario: https://mlflow.org/docs/latest/tracking/tutorials/remote-server.html
- Once the setup was running, I ran the sample code (attached below in the "Code to reproduce issue" section) to generate a "test" experiment.
- Then I deleted the experiment in the web UI.
- After that I ran the commands below:
export MLFLOW_TRACKING_URI="http://localhost:5000"
mlflow gc --backend-store-uri="postgresql://user:password@localhost:5432/mlflowdb"
Code to reproduce issue
import mlflow
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("test")
mlflow.autolog()
db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)
# Create and train models.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)
# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)
Stack trace
mlflow gc --backend-store-uri="postgresql://user:password@localhost:5432/mlflowdb"
Run with ID 6fb6bb680a2249388ec8346febc6540c has been permanently deleted.
Traceback (most recent call last):
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1969, in _exec_single_context
self.dialect.do_execute(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 922, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.ForeignKeyViolation: update or delete on table "experiments" violates foreign key constraint "datasets_experiment_id_fkey" on table "datasets"
DETAIL: Key (experiment_id)=(1) is still referenced from table "datasets".
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/mlflow/store/db/utils.py", line 142, in make_managed_session
session.commit()
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1969, in commit
trans.commit(_to_root=True)
File "<string>", line 2, in commit
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
ret_value = fn(self, *arg, **kw)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1256, in commit
self._prepare_impl()
File "<string>", line 2, in _prepare_impl
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
ret_value = fn(self, *arg, **kw)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1231, in _prepare_impl
self.session.flush()
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4312, in flush
self._flush(objects)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4447, in _flush
with util.safe_reraise():
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
raise exc_value.with_traceback(exc_tb)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 4408, in _flush
flush_context.execute()
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
rec.execute(self)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/unitofwork.py", line 679, in execute
util.preloaded.orm_persistence.delete_obj(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 191, in delete_obj
_emit_delete_statements(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 1456, in _emit_delete_statements
c = connection.execute(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1416, in execute
return meth(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 516, in _execute_on_connection
return connection._execute_clauseelement(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1639, in _execute_clauseelement
ret = self._execute_context(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1848, in _execute_context
return self._exec_single_context(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1988, in _exec_single_context
self._handle_dbapi_exception(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2343, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1969, in _exec_single_context
self.dialect.do_execute(
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 922, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) update or delete on table "experiments" violates foreign key constraint "datasets_experiment_id_fkey" on table "datasets"
DETAIL: Key (experiment_id)=(1) is still referenced from table "datasets".
[SQL: DELETE FROM experiments WHERE experiments.experiment_id = %(experiment_id)s]
[parameters: {'experiment_id': 1}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mlokos/dev/mlflow/venv/bin/mlflow", line 8, in <module>
sys.exit(cli())
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/mlflow/cli.py", line 630, in gc
backend_store._hard_delete_experiment(experiment_id)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/mlflow/store/tracking/sqlalchemy_store.py", line 421, in _hard_delete_experiment
with self.ManagedSessionMaker() as session:
File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
next(self.gen)
File "/home/mlokos/dev/mlflow/venv/lib/python3.10/site-packages/mlflow/store/db/utils.py", line 155, in make_managed_session
raise MlflowException(message=e, error_code=BAD_REQUEST)
mlflow.exceptions.MlflowException: (psycopg2.errors.ForeignKeyViolation) update or delete on table "experiments" violates foreign key constraint "datasets_experiment_id_fkey" on table "datasets"
DETAIL: Key (experiment_id)=(1) is still referenced from table "datasets".
[SQL: DELETE FROM experiments WHERE experiments.experiment_id = %(experiment_id)s]
[parameters: {'experiment_id': 1}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
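The constraint failure in the trace above can be illustrated outside MLflow. The following is a minimal sketch using Python's built-in sqlite3 module and a simplified, hypothetical two-table schema (only the columns needed to show the foreign key; the real MLflow tables have more columns, and the helper name try_hard_delete is made up for this example). It shows that deleting the parent row is blocked while a datasets row still references it, and succeeds once that row is gone.

```python
import sqlite3

# Simplified, hypothetical schema: just enough to show the foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    "CREATE TABLE experiments ("
    "experiment_id INTEGER PRIMARY KEY, lifecycle_stage TEXT)"
)
conn.execute(
    "CREATE TABLE datasets ("
    "dataset_uuid TEXT PRIMARY KEY, "
    "experiment_id INTEGER NOT NULL REFERENCES experiments(experiment_id))"
)
conn.execute("INSERT INTO experiments VALUES (1, 'deleted')")
conn.execute("INSERT INTO datasets VALUES ('d1', 1)")


def try_hard_delete(experiment_id):
    """Return True if the experiment row was deleted, False on an FK violation."""
    try:
        conn.execute(
            "DELETE FROM experiments WHERE experiment_id = ?", (experiment_id,)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Blocked: the datasets row still references experiment 1.
blocked = not try_hard_delete(1)

# Remove the referencing row first, then the delete goes through.
conn.execute("DELETE FROM datasets WHERE experiment_id = ?", (1,))
deleted = try_hard_delete(1)

print(blocked, deleted)
```

This only mirrors the mechanism reported by PostgreSQL; it does not show how MLflow itself should order its deletes.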
Other info / logs
The issue is connected solely with the handling of the datasets table. I was able to make the "mlflow gc" command work, but had to manually remove rows from that table.
psql -d mlflowdb -U user --password -h localhost -p 5432
TRUNCATE TABLE datasets; -- inside the psql CLI
After that, the "mlflow gc" command was able to remove data from PostgreSQL and artifacts from MinIO.
export MLFLOW_TRACKING_URI="http://localhost:5000"
mlflow gc --backend-store-uri="postgresql://user:password@localhost:5432/mlflowdb"
Experiment with ID 1 has been permanently deleted.
What component(s) does this bug affect?
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [X] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- [ ] area/docs: MLflow documentation pages
- [X] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [X] area/server-infra: MLflow Tracking server backend
- [X] area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [X] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
What language(s) does this bug affect?
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations
Thanks for the report! I can reproduce this error, will investigate and get back to you!
Hello folks. The same bug was reproduced today on the same remote tracking server setup, but with a MySQL database and GCS bucket storage.
A possible workaround I implemented on my system this morning is to partially flush the "datasets" table before calling mlflow gc, with the following SQL command:
DELETE datasets FROM datasets JOIN experiments ON datasets.experiment_id = experiments.experiment_id WHERE experiments.lifecycle_stage = 'deleted';
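The multi-table DELETE above is MySQL syntax. As a hedged sketch, the same cleanup can be written with a subquery, which should also be accepted by PostgreSQL and SQLite. The example below runs it against an in-memory SQLite database with a simplified, hypothetical schema; the function name remove_datasets_of_deleted_experiments and the 'active' stage value in the demo data are assumptions for illustration. Unlike truncating the whole table, it leaves the dataset rows of experiments that were not deleted. Whether other tables in a real MLflow database also need cleaning is not checked here, so back up the database before trying anything like this.

```python
import sqlite3

# Subquery form of the cleanup instead of MySQL's multi-table DELETE.
CLEANUP_SQL = (
    "DELETE FROM datasets WHERE experiment_id IN ("
    "SELECT experiment_id FROM experiments WHERE lifecycle_stage = 'deleted')"
)


def remove_datasets_of_deleted_experiments(conn):
    """Delete dataset rows of soft-deleted experiments; return how many were removed."""
    cursor = conn.execute(CLEANUP_SQL)
    return cursor.rowcount


# Demo on a simplified, hypothetical schema (not the full MLflow schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiments ("
    "experiment_id INTEGER PRIMARY KEY, lifecycle_stage TEXT)"
)
conn.execute(
    "CREATE TABLE datasets ("
    "dataset_uuid TEXT PRIMARY KEY, "
    "experiment_id INTEGER REFERENCES experiments(experiment_id))"
)
conn.executemany(
    "INSERT INTO experiments VALUES (?, ?)", [(1, "deleted"), (2, "active")]
)
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?)", [("d1", 1), ("d2", 1), ("d3", 2)]
)

removed = remove_datasets_of_deleted_experiments(conn)
remaining = [
    row[0]
    for row in conn.execute("SELECT dataset_uuid FROM datasets ORDER BY dataset_uuid")
]
print(removed, remaining)
```

Against a real backend store, the same SQL string would be run through that database's own client or driver right before calling mlflow gc.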
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
Can we reopen the issue, because the MR is not merged?
@BenWilson2 seems like this wasn't merged yet. Could you please reopen the issue?
Please reopen this.
any updates?