automlbenchmark
results.csv error -- no columns to parse from file
Hello, in our usage of the benchmark, we usually launch all tasks from the same directory on a mega cluster at a given moment in time.
My understanding is that all run results are aggregated into the file `results/results.csv`.
I think that in our unlucky scenario, a couple of runs finished at the exact same time and tried to access `results.csv` simultaneously.
Is there some code in place to prevent this from happening? Although it is an unlucky event, we are trying some 8-hour runs, so any failure is painful.
Do you have any suggestions, or is there lock support planned to be added for this file?
Hi @franchuterivera, no, I don't plan to add lock support for this, and here is why.
The global `results/results.csv` is just provided for convenience: it aggregates the results from its subfolders.
Each subfolder corresponds to a benchmark execution, i.e. a call to `python runbenchmark.py ...`, where you can also find a `scores/results.csv` file that aggregates results for this specific execution. This aggregation is safe because the file is accessed by only one process: even in aws mode, there is a dedicated thread+queue in charge of writing to this file.
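The dedicated thread+queue pattern described above can be sketched roughly as follows. This is an illustrative sketch, not automlbenchmark's actual code; the `SingleWriter` class and its methods are hypothetical names:

```python
import csv
import queue
import threading

class SingleWriter:
    """Hypothetical sketch: one thread owns the CSV file, and producers
    send rows through a queue instead of writing to the file concurrently."""

    _SENTINEL = object()  # special marker telling the writer thread to stop

    def __init__(self, path, fieldnames):
        self.path = path
        self.fieldnames = fieldnames
        self.q = queue.Queue()
        self.thread = threading.Thread(target=self._drain, daemon=True)
        self.thread.start()

    def submit(self, row):
        # Safe to call from any worker thread: queue.Queue is thread-safe,
        # and only the writer thread ever touches the file.
        self.q.put(row)

    def close(self):
        self.q.put(self._SENTINEL)
        self.thread.join()

    def _drain(self):
        with open(self.path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=self.fieldnames)
            writer.writeheader()
            while True:
                row = self.q.get()
                if row is self._SENTINEL:
                    return
                writer.writerow(row)
                f.flush()
```

Because exactly one thread writes, rows can never interleave mid-line, which is why the per-execution file is safe without a lock.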
So if I understand your use case correctly, you have multiple calls to `python runbenchmark.py ...` running in parallel, each one with write access to this global `results/results.csv`.
In that case, I would recommend relying only on the per-execution `scores/results.csv` files and concatenating them ex post.
Note that if you don't even trust the `scores/results.csv` aggregations, you can always aggregate all the `output/scores/results.csv` files from each individual run (typically when using parallel runs in aws mode): it's very easy to do with `pandas`.
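That ex-post aggregation could look like the sketch below. The glob pattern is an assumption about where each execution wrote its output; adjust it to your actual layout:

```python
import glob
import pandas as pd

def aggregate_results(pattern="output/*/scores/results.csv"):
    """Concatenate every per-execution results.csv matched by `pattern`
    into one DataFrame, tagging each row with its source file."""
    frames = [pd.read_csv(p).assign(source=p) for p in sorted(glob.glob(pattern))]
    return pd.concat(frames, ignore_index=True)

# Usage (hypothetical layout, one subfolder per runbenchmark.py call):
# aggregate_results().to_csv("all_results.csv", index=False)
```

Since each input file was written by a single process, no lock is needed anywhere in this approach.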
Of course, if you think that something like `filelock` could be useful, feel free to propose a change; your contributions are always appreciated.
I personally rely mainly on the `scores/results.csv` files, so I just don't see the need for it.
@franchuterivera can I close this issue as solved?
Hello, yeah, I am using the suggestion from @sebhrusen, so I will close it. Thanks for the help!
Reopening this: it seems to occur much more often than I thought, so I will try to add a file lock on that aggregate `results.csv` file.
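For what it's worth, a minimal sketch of guarding the append with a lock, using the stdlib `fcntl` advisory locks (POSIX-only; the cross-platform `filelock` package mentioned earlier works on the same principle). The helper name and arguments are illustrative, not the actual automlbenchmark change:

```python
import csv
import fcntl  # POSIX advisory file locks (Linux/macOS); stdlib

def append_result_row(csv_path, row, fieldnames):
    """Append one row to a shared CSV under an exclusive advisory lock,
    so concurrent processes serialize their writes. Writes the header
    only when the file is still empty."""
    with open(csv_path, "a", newline="") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            if f.tell() == 0:  # append mode positions at end, so 0 == empty file
                writer.writeheader()
            writer.writerow(row)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Advisory locks only protect against other processes that also take the lock, which is fine here since every writer would go through the same helper.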
Can confirm I ran into this (at least the symptoms are identical) while trying to distribute the workload over multiple AWS regions with multiple `runbenchmark` calls.