IO Error: 'too many open files' when removing many corrupted runs
🐛 Bug: Removal of many corrupted runs in one go
I ran a larger experiment tracking a lot of runs and apparently I had quite a few corrupted runs (in my case 539).
I tried removing them all with aim runs rm --corrupted, but got the error "IO too many open files".
I still could remove single corrupted runs with aim runs rm ${hash}.
I tried increasing the limit with ulimit -n up to 2048, but to no effect.
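For anyone trying to narrow this down, here is a minimal diagnostic sketch. It prints the soft and hard open-file limits of the current shell and counts the descriptors a process currently holds. The helper name count_open_fds is made up for illustration and is not part of Aim, and the /proc lookup only works on Linux. Note that ulimit -n applies to the current shell and the processes started from it, so the limit has to be raised in the same shell session that then runs aim.

```shell
#!/bin/bash
# Show the open-file limits that commands started from this shell inherit.
echo "soft limit: $(ulimit -Sn)"
echo "hard limit: $(ulimit -Hn)"

# Hypothetical helper (not part of Aim): count the file descriptors
# currently open in a process. Linux only, because it reads /proc.
count_open_fds() {
    local pid="${1:-$$}"
    ls "/proc/${pid}/fd" | wc -l
}
```

While a long-running command is active, count_open_fds <pid> can be called from a second terminal to see how close that process gets to the soft limit.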
To reproduce
Accumulate a large number of corrupted runs (539 in my case), then try to remove them all at once with aim runs rm --corrupted.
Expected behavior
Run removal should respect the open-file limit, so that aim runs rm --corrupted also works when there are many corrupted runs.
Environment
- Aim v3.24.0
- Python 3.11.6
- pip 24.0
- OS Ubuntu 22.04.4 LTS
Additional context
As a workaround I wrote a short bash script that removes the corrupted runs one by one, but this is still quite cumbersome.
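#!/bin/bash
# Take the first line of the listing (assumed to hold the tab-separated
# run hashes) and write one hash per line to a temporary file.
aim runs ls --corrupted | head -n 1 | sed 's/\t/\n/g' > corrupted_runs
# Remove the runs one at a time.
while read -r run; do
    echo "Removing corrupted run: ${run}"
    aim runs rm "${run}" -y
done < corrupted_runs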
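A possibly faster variant of the same workaround is to remove the runs in small batches instead of one per call. The sketch below rests on assumptions I have not verified: that aim runs rm accepts several hashes in one invocation, and that a small batch stays under the open-file limit. The function name remove_in_batches and the batch size of 20 are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of Aim): read whitespace-separated run
# hashes from stdin and pass them to a command in batches.
#   $1    maximum number of hashes per invocation
#   $2... command to run; the hashes are appended as trailing arguments
remove_in_batches() {
    local batch_size="$1"
    shift
    # xargs splits stdin on tabs and newlines, so the tab-separated
    # listing needs no sed step. -r skips the call when stdin is empty.
    xargs -r -n "${batch_size}" "$@"
}

# Usage, assuming the first line of the listing holds the hashes:
#   aim runs ls --corrupted | head -n 1 | remove_in_batches 20 aim runs rm -y
```

If a batch still hits the limit, lowering the batch size to 1 gives the same behaviour as the one-by-one loop above.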