cylc-flow
Log retrieval issues
Several users have reported issues associated with log retrieval (using Cylc 7.8.3 and previous versions). These relate to our HPC which uses PBS. We configure:
retrieve job logs retry delays = PT10S, PT30S, PT3M
This is to deal with the fact that there can be a considerable delay before the job log files (out and err) appear in the log directory.
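To make the retry window concrete, here is a rough sketch that converts the delay list above into seconds and shows when each retrieval attempt would fire. It assumes each delay is counted from the previous attempt and that the attempts themselves take negligible time. The helper names (`duration_to_seconds`, `retry_schedule`) are made up for illustration and are not part of Cylc, and the parser only handles simple time-only durations like the ones we configure.

```python
import re
from itertools import accumulate

# Matches simple time-only ISO 8601 durations, e.g. PT10S, PT30S, PT3M, PT1H30M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_to_seconds(text):
    """Convert a time-only ISO 8601 duration such as 'PT3M' to seconds."""
    match = _DURATION.match(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def retry_schedule(setting):
    """Return (delay, elapsed) pairs in seconds for a comma-separated delay list."""
    delays = [duration_to_seconds(item) for item in setting.split(",")]
    return list(zip(delays, accumulate(delays)))


# Hypothetical usage with the value from our configuration.
for delay, elapsed in retry_schedule("PT10S, PT30S, PT3M"):
    print(f"wait {delay}s, attempt at about {elapsed}s after job completion")
```

Under those assumptions the last attempt happens about 220 seconds after the job completes, so any log file that takes longer than that to appear would be missed.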
The issues are as follows:
1. Files with the wrong permissions (600) or missing. In the example I saw, the out file had the wrong permissions and the err file was missing. My guess is that this can happen if PBS is partway through writing the log files when the retrieval starts. I think we use the existence of the out file to determine whether the retrieval can start. We need to gather evidence to confirm whether this is the cause, and investigate whether there is a better way to confirm that the logs are ready.
2. Other missing log files. It is not clear whether issue 1 accounts for all the reports of missing log files. Another possibility is that the file is too big (we set retrieve job logs max size = 32M). It would help if we could record this in the job-activity.log when it happens (probably not easy, since I think it's implemented via an rsync option?).
3. Log files not available from the GUI. This problem happens when a task fails and the user tries to access the out or err files but finds them unavailable and has to retry several times. Presumably, once the task fails, the GUI expects to find the log files locally rather than accessing the remote system. Ideally the GUI would continue to access the log files remotely until the log file retrieval has completed.
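As a starting point for the first two issues, here is a minimal sketch of a stricter readiness check that could run on the job host before retrieval starts. It is not how Cylc currently decides, and the function name `check_job_logs`, the default file names, and the heuristics are all assumptions for illustration. It treats mode 600 as a sign that the file may still be in progress (which is only my guess above), and it reports files over the size limit so the reason could be written to the job-activity.log. The limit assumes 32M means 32 MiB, which is how rsync interprets an M suffix for `--max-size`.

```python
import os
import stat

# Assumption: "32M" is read as 32 * 1024 * 1024 bytes, as rsync does for --max-size.
MAX_SIZE_BYTES = 32 * 1024 * 1024


def check_job_logs(log_dir, max_size_bytes=MAX_SIZE_BYTES,
                   names=("job.out", "job.err")):
    """Return a list of problems found; an empty list means the logs look ready.

    The default names are assumed to correspond to the out and err files
    discussed above.
    """
    problems = []
    for name in names:
        path = os.path.join(log_dir, name)
        try:
            info = os.stat(path)
        except FileNotFoundError:
            problems.append(f"{name}: missing")
            continue
        # Heuristic only: owner-only permissions may mean the batch system
        # has not finished delivering the file yet.
        if stat.S_IMODE(info.st_mode) == 0o600:
            problems.append(f"{name}: permissions 600, possibly still being written")
        if info.st_size > max_size_bytes:
            problems.append(
                f"{name}: {info.st_size} bytes exceeds limit of {max_size_bytes}"
            )
    return problems
```

If something like this returned a non-empty list, the retrieval could be deferred to the next retry delay and the messages logged, rather than starting as soon as the out file exists.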