
[feature] How can I delete a run completely, including the records in the cachedb and metadb mysql database?

Open Shuai-Xie opened this issue 3 years ago • 9 comments

Feature Area

/area backend

Currently, I can use the kfp Python SDK to delete runs with the following code.

from kfp_server_api.api.run_service_api import RunServiceApi

run_api = RunServiceApi()

def list_all_runs(sort_by='name'):
    list_runs_rsp = run_api.list_runs(sort_by=sort_by)
    runs = list_runs_rsp.runs

    if runs is None:
        print('no runs')
    else:
        print('list runs')
        for run in runs:
            print(run.id, run.name)

    return runs


def delete_all_runs(runs):
    if runs is not None:
        print('delete runs')
        for run in runs:
            print('delete', run.name)
            run_api.delete_run(id=run.id)


if __name__ == '__main__':
    runs = list_all_runs()
    delete_all_runs(runs)
    list_all_runs()
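Note that `list_runs` returns one page of results at a time, so the script above only deletes the runs on the first page. A minimal sketch of a paginated variant follows, assuming the response exposes `runs` and `next_page_token` as in the `kfp_server_api` v1 client. The helper names `collect_run_ids` and `delete_runs` are hypothetical, and the IDs are collected before anything is deleted so that page tokens are not affected by the deletions.

```python
def collect_run_ids(run_api, page_size=50):
    """Walk every page of list_runs and return all run IDs."""
    run_ids = []
    page_token = ''
    while True:
        rsp = run_api.list_runs(page_token=page_token, page_size=page_size)
        # rsp.runs is None (not an empty list) when a page has no runs.
        for run in rsp.runs or []:
            run_ids.append(run.id)
        page_token = rsp.next_page_token
        if not page_token:
            break
    return run_ids


def delete_runs(run_api, run_ids):
    """Delete each run by ID and return how many deletions were issued."""
    for run_id in run_ids:
        run_api.delete_run(id=run_id)
    return len(run_ids)
```

Usage would be `delete_runs(run_api, collect_run_ids(run_api))` with the same `run_api = RunServiceApi()` object as above. This still only removes run records; it does not touch cachedb or metadb.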

However, many artifacts remain even after I have deleted all the runs, and they can affect runs submitted later under the same name. Suppose we submit two runs, both named add_two_numbers. The components of the later run may reuse the state of the earlier one: a component in the later run can show Succeeded as soon as the run starts, because it picks up the state of the earlier run.


A large number of cache and metadata records are not cleared from the MySQL database.

# exec into the kubeflow mysql pod, and run the following queries.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| cachedb            |			# new
| metadb             |			# new
| mlpipeline         |			# new
| mysql              |
| performance_schema |
| sys                |
+--------------------+
7 rows in set (0.00 sec)

# cache db

mysql> show tables;
+-------------------+
| Tables_in_cachedb |
+-------------------+
| execution_caches  |
+-------------------+
1 row in set (0.00 sec)

mysql> select count(*) from execution_caches;
+----------+
| count(*) |
+----------+
|      204 |		# many cache records
+----------+
1 row in set (0.00 sec)

# meta db

mysql> show tables;
+-------------------+
| Tables_in_metadb  |
+-------------------+
| Artifact          |
| ArtifactProperty  |
| Association       |
| Attribution       |
| Context           |
| ContextProperty   |
| Event             |
| EventPath         |
| Execution         |
| ExecutionProperty |
| MLMDEnv           |
| ParentContext     |
| ParentType        |
| Type              |
| TypeProperty      |
+-------------------+
15 rows in set (0.01 sec)

mysql> select count(*) from Artifact;
+----------+
| count(*) |
+----------+
|      395 |		# many meta records
+----------+
1 row in set (0.01 sec)

What feature would you like to see?

If I delete a run, the corresponding records in the cachedb and metadb MySQL databases should be deleted too.

What is the use case or pain point?

Pain point: Suppose we submit two runs, both named add_two_numbers. The components of the later run may reuse the state of the earlier one, which can prevent the later run from executing normally.
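If the behavior described here comes from KFP step caching (an assumption; the thread does not confirm the cause), one documented option in the v1 SDK is to set a task's `max_cache_staleness` to `"P0D"` so the step is never served from the cache. A minimal sketch, where `disable_caching` is a hypothetical helper name:

```python
def disable_caching(task):
    """Mark a pipeline task so it never reuses a cached result.

    "P0D" is an ISO 8601 duration of zero days, i.e. any cached
    entry is immediately considered stale.
    """
    task.execution_options.caching_strategy.max_cache_staleness = "P0D"
    return task


# Inside a @dsl.pipeline function this might be used as:
#     add_task = disable_caching(add(a, b))
```

This avoids reusing earlier results but does not remove the existing rows in cachedb.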

Is there a workaround currently?

Possibly deleting the records from MySQL directly myself.
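A rough sketch of that manual cleanup for the cache table only, assuming a DB-API connection that is already pointed at the `cachedb` database. The helper name `clear_execution_caches` is hypothetical. This removes every row in `execution_caches`, so all later steps will re-execute; metadb is deliberately left alone because its tables reference each other.

```python
def clear_execution_caches(conn):
    """Delete all rows from execution_caches and return how many existed.

    `conn` is any DB-API 2.0 connection whose current database is cachedb.
    """
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM execution_caches")
    (count,) = cur.fetchone()
    cur.execute("DELETE FROM execution_caches")
    conn.commit()
    cur.close()
    return count


# Example with a MySQL driver such as pymysql (host and credentials are
# placeholders, not values from this thread):
#     conn = pymysql.connect(host="...", user="...", password="...",
#                            database="cachedb")
#     print("removed", clear_execution_caches(conn), "cache records")
```

Back up the database before trying anything like this against a real deployment.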


Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.

Shuai-Xie avatar Jun 03 '21 04:06 Shuai-Xie

I guess the problem may come from how Argo works, because kfp adds a suffix to runs with the same name by default. By the way, is each suffix guaranteed to be unique, taking the previously generated suffixes into account?

I submitted 4 runs with the same RUN_NAME, add_two_numbers. kfp appended a unique suffix to each run.


code

import kfp
import kfp.dsl as dsl
from kfp.components import func_to_container_op
from typing import NamedTuple

EXP_NAME = 'demo'
RUN_NAME = 'add_two_numbers'


@func_to_container_op
def add(a: int, b: int) -> NamedTuple('output', [('func_name', str), ('result', int)]):
    from collections import namedtuple
    output = namedtuple('output', ['func_name', 'result'])
    return output('add_two_numbers', a + b)


@func_to_container_op
def show_result(func_name, result):
    print(f'{func_name} result: {result}')


@dsl.pipeline(name=RUN_NAME)
def pipeline_func(a: int, b: int):
    add_task = add(a, b)
    show_result(add_task.outputs['func_name'], add_task.outputs['result'])


if __name__ == '__main__':
    client = kfp.Client(host='my_k8s_master_ip:8888', namespace='kubeflow')
    client.create_run_from_pipeline_func(pipeline_func,
                                         arguments={
                                             'a': 1,
                                             'b': 100
                                         },
                                         experiment_name=EXP_NAME,
                                         run_name=RUN_NAME)
    print('submit pipeline in pod')

Shuai-Xie avatar Jun 03 '21 04:06 Shuai-Xie

I'm also unable to clear all the artifacts and executions for deleted runs.

RakeshRaj97 avatar Jun 07 '21 07:06 RakeshRaj97

We plan to implement deleting cachedb records when a run record is deleted in V2 compatible mode. However, MLMD currently doesn't support deleting metadb records from the MySQL database; you can file an issue in the MLMD repo.

zijianjoy avatar Jun 11 '21 00:06 zijianjoy

https://github.com/google/ml-metadata

zijianjoy avatar Jun 11 '21 00:06 zijianjoy

Cleaning up MLMD entries requires upstream feature request: https://github.com/google/ml-metadata/issues/69

Bobgy avatar Jul 06 '21 03:07 Bobgy

reassigned it to @chensun

capri-xiyue avatar Jan 19 '22 00:01 capri-xiyue

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 19 '22 07:04 stale[bot]

@stale this is still important

juliusvonkohout avatar Jun 24 '22 09:06 juliusvonkohout

@Shuai-Xie What are you passing to RunServiceApi() to correctly authenticate to your kfp instance?

I'm currently using the following, but I keep getting "None" as the response from list_runs:

run_api = RunServiceApi(kfp_server_api.ApiClient(kfp_server_api.Configuration(
    host = "https://<my_kfp_hostname>"
)))

Any idea what's going on here?

connor-swanson avatar Aug 03 '22 23:08 connor-swanson

Closing this issue. No activity for more than a year.

/close

rimolive avatar Apr 09 '24 12:04 rimolive

@rimolive: Closing this issue.


google-oss-prow[bot] avatar Apr 09 '24 12:04 google-oss-prow[bot]