How to delete a run when an error occurs during tracking, before the program exits?

Open BangBOOM opened this issue 3 years ago • 6 comments

❓Question

I wrote the code like this, but when an error occurs I cannot use delete_run.

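import os

from aim import Repo, Run

run = Run(
    repo=os.path.join(aim_repo),
    experiment="mem_predict"
)

try:
    ...
except:
    # Save the hash, close the run, then try to delete it from the repo.
    run_hash = run.hash
    run.close()
    del run
    repo = Repo.from_path(aim_repo)
    repo.delete_run(run_hash)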

BangBOOM avatar Jul 19 '22 09:07 BangBOOM

Hey @BangBOOM! Are there any errors or warnings when you try to delete the run? It's working fine on my end. Can I also ask what the aim_repo variable represents? You've used it once with os.path.join(aim_repo) and then directly.

mihran113 avatar Jul 19 '22 13:07 mihran113

Thanks for your response. aim_repo is the path where I saved my experiment; its value is ~/aim_repo. The error message:

Error while trying to delete run 'd1094018bb39472bb61db938'. The file lock '/home/xxx/aim_repo/.aim/meta/locks/d1094018bb39472bb61db938' could not be acquired..

And yes, I've used os.path.join(aim_repo).

BangBOOM avatar Jul 19 '22 15:07 BangBOOM
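
Since the repo path is given as ~/aim_repo and passed through os.path.join(aim_repo) in one place and used directly in another, it may help to see what each form resolves to. The sketch below uses only the Python standard library. Whether Aim expands "~" on its own is not established in this thread, and the error message above already shows an absolute /home/... path, so treat this as a sanity check rather than a diagnosis.

```python
import os

raw = "~/aim_repo"  # value reported in this thread

# With a single argument, os.path.join returns the path unchanged;
# it does not expand "~".
joined = os.path.join(raw)

# expanduser replaces the leading "~" with the home directory,
# and abspath normalises the result to an absolute path.
resolved = os.path.abspath(os.path.expanduser(raw))

print(joined)
print(resolved)
```

Using one resolved value for both Run(repo=...) and Repo.from_path(...) would rule out the two calls pointing at different locations.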

That's strange, run.close() should've released all the locks for the run. Could you please provide some system/environment information (Aim version, Python version, pip version, OS)? And are there any parallel processes that may be trying to open the same run in write mode?

mihran113 avatar Jul 19 '22 16:07 mihran113
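
To see which runs still have a lock file on disk, one option is to list the locks directory named in the error message. This is a minimal sketch: the layout `<repo>/.aim/meta/locks/<run_hash>` is inferred only from that message, and the helper names `locks_dir` and `list_lock_files` are hypothetical, not part of Aim. Note that a lock file being present does not by itself prove the lock is currently held by a process.

```python
import os
import tempfile


def locks_dir(repo_root):
    # Assumed layout, taken from the error message: <repo>/.aim/meta/locks
    return os.path.join(os.path.expanduser(repo_root), ".aim", "meta", "locks")


def list_lock_files(repo_root):
    """Return the sorted file names in the locks directory, or [] if it is absent."""
    path = locks_dir(repo_root)
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))


# Demo on a throwaway directory so no real repo is touched.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(locks_dir(tmp))
    open(os.path.join(locks_dir(tmp), "d1094018bb39472bb61db938"), "w").close()
    found = list_lock_files(tmp)
    print(found)
```

Calling list_lock_files("~/aim_repo") right after run.close() would show whether a file for the run's hash is still there when the delete fails.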

Sure, my system info:

Python 3.9.12
Aim v3.11.2
pip 22.1.2
OS info:
-> % cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"

Besides, I use DVC to start the program with the command dvc exp run.

BangBOOM avatar Jul 20 '22 01:07 BangBOOM

I also encountered this problem: I can't delete any runs using aim runs rm or the delete button on the web page. How can I use soft file locks to avoid potential data corruption?

sijeh avatar Aug 17 '22 07:08 sijeh

Hi @sijeh. Could you please provide a bit more information about the setup you're using?

mihran113 avatar Aug 18 '22 17:08 mihran113