aim
How to delete a run when an error occurs during tracking, before the program exits?
❓Question
I wrote the code like this, but when an error occurs I cannot use delete_run.
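import os

from aim import Repo, Run

# aim_repo holds the path to the Aim repository (see below in the thread)
run = Run(
    repo=os.path.join(aim_repo),
    experiment="mem_predict"
)
try:
    ...
except:
    # Remember the hash, close the run, then try to delete it from the repo
    run_hash = run.hash
    run.close()
    del run
    repo = Repo.from_path(aim_repo)
    repo.delete_run(run_hash)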
Hey @BangBOOM! Are there any errors or warnings when you try to delete the run? It's working fine on my end. Can I also ask what the aim_repo variable represents? You've used it once with os.path.join(aim_repo) and then directly.
Thanks for your response. aim_repo is the path where I saved my experiment; its value is ~/aim_repo
the error message:
Error while trying to delete run 'd1094018bb39472bb61db938'. The file lock '/home/xxx/aim_repo/.aim/meta/locks/d1094018bb39472bb61db938' could not be acquired..
And yes, I've used os.path.join(aim_repo)
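To check whether the lock named in the error message is still on disk after run.close(), something like the following might help. This is only a sketch: the .aim/meta/locks/<run_hash> layout is inferred from the error message above rather than from Aim's documentation, and run_lock_path and lock_file_exists are hypothetical helper names, not part of Aim.

```python
from pathlib import Path


def run_lock_path(repo_root, run_hash):
    """Build the lock path seen in the error message.

    Assumed layout (taken from the error text only):
    <repo_root>/.aim/meta/locks/<run_hash>
    """
    return Path(repo_root).expanduser() / ".aim" / "meta" / "locks" / run_hash


def lock_file_exists(repo_root, run_hash):
    """Return True if a file exists at the assumed lock path."""
    return run_lock_path(repo_root, run_hash).exists()
```

Calling lock_file_exists("~/aim_repo", run_hash) right after run.close() would show whether a file is still present at that path. A file being there does not by itself prove that another process holds the lock.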
That's strange, run.close() should've released all the locks for the run.
Could you please provide some system/environment information? (aim version, python version, pip version, os)
And are there any parallel processes that may be trying to open the same run in write mode?
Sure, my system info:
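If another process holds the lock only briefly, retrying the delete a few times could tell a transient lock apart from one that is never released. A minimal sketch, assuming the delete call either returns a falsy value or raises when it fails (the helper handles both); delete_with_retry is a hypothetical name, not an Aim API:

```python
import time


def delete_with_retry(delete_fn, run_hash, attempts=3, delay=1.0):
    """Call delete_fn(run_hash) until it returns a truthy value or attempts run out.

    delete_fn is any callable taking a run hash, e.g. repo.delete_run.
    Returns True on the first successful call, False otherwise.
    """
    for attempt in range(1, attempts + 1):
        try:
            if delete_fn(run_hash):
                return True
        except Exception as exc:
            # Report the failure and keep trying instead of aborting
            print(f"attempt {attempt} raised: {exc}")
        if attempt < attempts:
            time.sleep(delay)
    return False
```

With the code from the question, this would be called as delete_with_retry(repo.delete_run, run_hash) after run.close(). If it still returns False after several seconds, the lock is probably being held for the lifetime of some process rather than briefly.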
Python 3.9.12
Aim v3.11.2
pip 22.1.2
os info:
-> % cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
Besides, I use DVC to start the program with the command dvc exp run.
I also encountered this problem: I can't delete any runs using aim runs rm or the delete button on the web page. How can I use soft file locks to avoid potential data corruption?
Hi @sijeh. Could you please provide a little bit more information about the setup you're using?