automlbenchmark icon indicating copy to clipboard operation
automlbenchmark copied to clipboard

Improve error logging when a subprocess fails

Open PGijsbers opened this issue 3 years ago • 2 comments

For autoxgboost there are several reported errors, e.g. results.csv: openml.org/t/359938 Brazilian_houses 2 autoxgboost 1h8c_gp3 neg_rmse 2389.9 1688457899 CalledProcessError: Command 'Rscript --vanilla -e ".libPaths('/bench/frameworks/autoxgboost/lib'); source('/bench/frameworks/autoxgboost/exec.R'); run('/input/org/openml/www/datasets/42688/dataset_tr…

in the log we read:

Timing stopped at: 1.664e+04 64.02 2386
Execution halted
Warning message:
system call failed: Cannot allocate memory

[ERROR] [amlb.benchmark:12:19:12.052] Command 'Rscript --vanilla -e ".libPaths('/bench/frameworks/autoxgboost/lib'); source('/bench/frameworks/autoxgboost/exec.R'); run('/input/org/openml/www/datasets/42688/dat
aset_train_2.arff', '/input/org/openml/www/datasets/42688/dataset_test_2.arff', target.index = 13, 'regression', '/output/predictions/Brazilian_houses/2/predictions.csv', 8, time.budget = 3600, meta_results_fil
e='/output/meta_results.csv')"' returned non-zero exit status 1.

While I don't expect to be able to store the relevant error message in the subprocess (the first few lines), at least making sure the exit status code is also stored in the info field in results and not truncated, would be helpful to identify which errors occurred without requiring an additional dive into the logs.

PGijsbers avatar Dec 06 '21 17:12 PGijsbers

We can definitely improve the error message with R frameworks.

Note that the run_cmd function raises a subprocess.CalledProcessError if the subprocess failed, this error object has an stderr property that should provide us direct access to the error message raised by the R process. Maybe we could just "tail" it? last 5-10 lines? and use this to raise a more meaningful error?

Regarding the truncated message, this is currently configured to 200 chars:

results:                 # configuration namespace for the results.csv file.
  error_max_length: 200  # the max length of the error message as rendered in the results file.

Just wanted to avoid printing silly things there as we don't control those messages fully, but I guess it could be increased.

sebhrusen avatar Dec 07 '21 20:12 sebhrusen

Thanks for the pointer. I do agree though that just raising the limit itself isn't generally going to be less friendly. Tailing is probably good, maybe we can use the logs to determine a good tail length (hopefully just 1 or 2 lines, otherwise we probably generated too long messages again). Additionally I think we should re-order the error message so that it starts with the exit code.

PGijsbers avatar Dec 08 '21 10:12 PGijsbers