automlbenchmark
automlbenchmark copied to clipboard
Improve error logging when a subprocess fails
For autoxgboost
there are several reported errors, e.g. results.csv
:
openml.org/t/359938 Brazilian_houses 2 autoxgboost 1h8c_gp3 neg_rmse 2389.9 1688457899 CalledProcessError: Command 'Rscript --vanilla -e ".libPaths('/bench/frameworks/autoxgboost/lib'); source('/bench/frameworks/autoxgboost/exec.R'); run('/input/org/openml/www/datasets/42688/dataset_tr…
in the log we read:
Timing stopped at: 1.664e+04 64.02 2386
Execution halted
Warning message:
system call failed: Cannot allocate memory
[ERROR] [amlb.benchmark:12:19:12.052] Command 'Rscript --vanilla -e ".libPaths('/bench/frameworks/autoxgboost/lib'); source('/bench/frameworks/autoxgboost/exec.R'); run('/input/org/openml/www/datasets/42688/dat
aset_train_2.arff', '/input/org/openml/www/datasets/42688/dataset_test_2.arff', target.index = 13, 'regression', '/output/predictions/Brazilian_houses/2/predictions.csv', 8, time.budget = 3600, meta_results_fil
e='/output/meta_results.csv')"' returned non-zero exit status 1.
While I don't expect to be able to store the relevant error message in the subprocess (the first few lines), at least making sure the exit status code is also stored in the info
field in results and not truncated, would be helpful to identify which errors occurred without requiring an additional dive into the logs.
We can definitely improve the error message with R frameworks.
Note that the run_cmd
function raises a subprocess.CalledProcessError
if the subprocess failed, this error object has an stderr
property that should provide us direct access to the error message raised by the R process. Maybe we could just "tail" it? last 5-10 lines? and use this to raise a more meaningful error?
Regarding the truncated message, this is currently configured to 200
chars:
results: # configuration namespace for the results.csv file.
error_max_length: 200 # the max length of the error message as rendered in the results file.
Just wanted to avoid printing silly things there as we don't control those messages fully, but I guess it could be increased.
Thanks for the pointer. I do agree though that just raising the limit itself isn't generally going to be less friendly. Tail
ing is probably good, maybe we can use the logs to determine a good tail
length (hopefully just 1 or 2 lines, otherwise we probably generated too long messages again). Additionally I think we should re-order the error message so that it starts with the exit code.