payu
Saving error logs from a hanging PBS job
While we appear to be saving error logs for crashed jobs into error_logs in archive, it seems that I am losing information from hanging jobs which run indefinitely and are eventually killed by the scheduler.
This is presumably because PBS kills the Python process before the model exits and before our log-saving code gets a chance to run.
We should probably investigate this a little more and also monitor PBS state, if possible. It may not actually be possible to call any code at the Python level after exceeding job time.
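One thing to try, as a rough sketch only: install a SIGTERM handler in the Python process that copies the logs before exiting. This assumes PBS sends SIGTERM to the Python process some time before the final SIGKILL, which I have not verified on our system. The names `save_error_logs` and `install_sigterm_handler`, and the paths passed to them, are hypothetical and not existing payu code.

```python
import os
import shutil
import signal
import sys


def save_error_logs(log_paths, dest_dir):
    """Copy each existing log file into dest_dir and return the new paths."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for path in log_paths:
        # Skip logs the model never got around to writing.
        if os.path.isfile(path):
            copied.append(shutil.copy(path, dest_dir))
    return copied


def install_sigterm_handler(log_paths, dest_dir):
    """Register a SIGTERM handler that saves the logs and then exits.

    Hypothetical helper: it only helps if SIGTERM actually reaches this
    process before SIGKILL does.
    """
    def handler(signum, frame):
        save_error_logs(log_paths, dest_dir)
        # Conventional exit status for "terminated by signal N" is 128 + N.
        sys.exit(128 + signum)

    signal.signal(signal.SIGTERM, handler)
    return handler
```

SIGKILL cannot be caught, so this can only work in the window between the two signals, and the handler has to be registered from the main thread. If the signal never reaches the Python process, we would need to watch the job from outside instead.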