
results.csv error -- no columns to parse from file

Open · franchuterivera opened this issue 4 years ago · 6 comments

Hello, in our usage of the benchmark, we usually launch all tasks at the same moment, from the same directory, on a large cluster.

My understanding is that all run results are aggregated into the file results/results.csv.

I think that, in our unlucky case, a couple of runs finished at exactly the same time and tried to write to results.csv simultaneously.

Is there any mechanism in the code to prevent this from happening? Although it is a rare event, we are doing some 8-hour runs, so any failure is painful.

Could you suggest a workaround, or is there any plan to add lock support for this file?

franchuterivera · May 27 '20 15:05

Hi @franchuterivera, no, I don't plan to add lock support for this, and here is why.

The global "results/results.csv" is only provided for convenience: it aggregates the results from its subfolders. Each subfolder corresponds to one benchmark execution, i.e. one call to python runbenchmark.py ..., and contains a scores/results.csv file that aggregates the results of that specific execution. This aggregation is safe because the file is accessed by only one process: even in aws mode, a dedicated thread+queue is in charge of writing to it.
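To illustrate the single-writer pattern described above, here is a minimal sketch, assuming result rows are plain dicts: worker threads only put rows on a queue.Queue, and one dedicated thread is the only code that ever opens the CSV. The names csv_writer and run_tasks and the column names are hypothetical; this is not automlbenchmark's actual implementation. Note that this only serializes writers inside one process, so it does not protect a file shared by several runbenchmark.py processes.

```python
import csv
import os
import queue
import tempfile
import threading

# Hypothetical sketch: one dedicated writer thread owns the CSV file,
# and every worker only puts result rows on a queue.
_STOP = object()  # sentinel telling the writer thread to finish


def csv_writer(path: str, fieldnames: list, rows: "queue.Queue") -> None:
    """Drain the queue and append each row to `path`; the only code touching the file."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        while True:
            row = rows.get()
            if row is _STOP:
                break
            writer.writerow(row)
            f.flush()  # make each result visible on disk as soon as it arrives


def run_tasks(path: str, n_tasks: int = 4) -> None:
    fieldnames = ["task", "fold", "result"]
    rows: queue.Queue = queue.Queue()
    writer = threading.Thread(target=csv_writer, args=(path, fieldnames, rows))
    writer.start()

    def task(i: int) -> None:
        # Stand-in for a benchmark job: it reports its result via the queue only.
        rows.put({"task": f"task_{i}", "fold": 0, "result": i / 10})

    workers = [threading.Thread(target=task, args=(i,)) for i in range(n_tasks)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    rows.put(_STOP)  # all producers are done, so stop the writer
    writer.join()


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "results.csv")
    run_tasks(out)
    with open(out) as f:
        print(f.read())
```

Because only the writer thread holds the file open, concurrent task completions can never interleave partial lines or race on the header.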

So if I understand your use case correctly, you have multiple calls to python runbenchmark.py ... running in parallel, each one writing to the same global results/results.csv. In that case, I would recommend relying only on the per-execution scores/results.csv files and concatenating them afterwards.

Note that if you don't even trust the scores/results.csv aggregations, you can always aggregate all the output/scores/results.csv files from each individual run yourself (typically when using parallel runs in aws mode): this is very easy to do with pandas.
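A minimal sketch of that ex post aggregation with pandas, assuming each execution folder under results/ holds a scores/results.csv as described above. The function name aggregate_results, the default glob pattern and the added source_file column are illustrative, not part of automlbenchmark. It also skips empty files, for which pandas raises EmptyDataError with the same "No columns to parse from file" message reported in this issue.

```python
import glob
import os

import pandas as pd


def aggregate_results(root: str = "results", pattern: str = "*/scores/results.csv") -> pd.DataFrame:
    """Concatenate every per-execution results file found under `root` (hypothetical helper)."""
    frames = []
    # recursive=True lets callers pass patterns such as "**/output/scores/results.csv"
    for path in sorted(glob.glob(os.path.join(root, pattern), recursive=True)):
        try:
            df = pd.read_csv(path)
        except pd.errors.EmptyDataError:
            # An empty file raises "No columns to parse from file"; skip it.
            continue
        df["source_file"] = path  # keep track of where each row came from
        frames.append(df)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    combined = aggregate_results()
    print(f"{len(combined)} rows aggregated")
```

The combined frame can then be written out with combined.to_csv(...); take care that the pattern does not match both an execution-level aggregate and the individual run files it was built from, or rows will be counted twice.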

sebhrusen · May 27 '20 21:05

Of course, if you think that something like filelock could be useful, feel free to propose a change; your contributions are always appreciated. I personally rely mainly on the scores/results.csv files, so I just don't see the need for it.

sebhrusen · May 27 '20 21:05

@franchuterivera, can I close this issue as solved?

PGijsbers · Oct 06 '20 15:10

Hello, yes, I am using the suggestion from @sebhrusen, so I will close it. Thanks for the help!

franchuterivera · Oct 09 '20 16:10

Reopening this: it seems to occur much more often than I thought, so I will try to add a file lock on that aggregate results.csv file.
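A minimal sketch of such a lock, assuming a POSIX system: it uses the standard-library fcntl.flock on a sidecar .lock file rather than the third-party filelock package (whose FileLock offers a similar context manager and also works on Windows). append_rows_locked is a hypothetical helper, not automlbenchmark's code.

```python
import csv
import fcntl
import os


def append_rows_locked(path: str, fieldnames: list, rows: list) -> None:
    """Append rows to a shared CSV while holding an exclusive advisory lock (hypothetical helper)."""
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock_file:
        # Blocks until no other writer holds the lock on the same lock file.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # The header check happens under the lock, so only the first writer adds it.
            write_header = not os.path.exists(path) or os.path.getsize(path) == 0
            with open(path, "a", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=fieldnames)
                if write_header:
                    writer.writeheader()
                writer.writerows(rows)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The lock is advisory: it only protects the file if every writer goes through the same locking code. flock behaviour on network filesystems such as NFS also varies, which matters when many cluster nodes share one results directory.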

sebhrusen · Mar 15 '21 17:03

I can confirm I ran into this (at least the symptoms are identical) while distributing the workload over multiple AWS regions with multiple runbenchmark calls.

PGijsbers · Nov 12 '21 19:11