Improve error message when binary is not found

Open Kobzol opened this issue 1 year ago • 4 comments

When you start a program that does not exist, HQ shows the error in hq job info last, but nothing is included in the stdout/stderr files of the tasks. Maybe we could add this error also to stderr, to make it easier to figure out what went wrong.

Kobzol avatar Oct 29 '24 10:10 Kobzol

I do not think this is a good idea. HQ commands should be the ground truth for task status. Stdout/stderr is something produced by the task, and we should not interfere with it.

spirali avatar Oct 29 '24 11:10 spirali

We could give a very explicit annotation that the content was generated by HQ, or create a separate file on disk with this error. Sometimes people just take a look at stdout/stderr and expect to see everything there (which is mostly how PBS/Slurm works). In HQ, you also need to examine the job status to see the details.

Kobzol avatar Oct 29 '24 11:10 Kobzol

There are still errors that cannot be solved like this, e.g. a task fails because its dependency failed, or a task fails because the worker cannot create the stderr file due to permissions. So it cannot be universal, and having the semantics "sometimes you will find an error from HQ in stderr and sometimes not" is worse than the current situation.

spirali avatar Oct 29 '24 11:10 spirali

Thinking about it more: if we promise this output solely for this particular error, it should be OK.

spirali avatar Oct 29 '24 11:10 spirali
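
A minimal sketch of the idea the thread converges on: when the program cannot be started because the binary does not exist, write one clearly annotated line into the task's stderr file. This is not HQ's actual implementation (HQ is not written in Python); the function name `run_task`, the `MARKER` prefix, and the message wording are all assumptions made for illustration.

```python
import subprocess

# Hypothetical prefix that makes it explicit the line was written by the
# scheduler and not by the task itself.
MARKER = "[generated-by-scheduler]"


def run_task(argv, stderr_path):
    """Run argv with its stderr redirected to stderr_path.

    Returns the exit code of the program, or None if the program could not
    be started because the binary was not found.
    """
    with open(stderr_path, "w") as stderr_file:
        try:
            return subprocess.run(argv, stderr=stderr_file).returncode
        except FileNotFoundError as error:
            # Only this one error is reported in stderr; other failures
            # (failed dependency, unwritable stderr file) are not covered.
            stderr_file.write(
                f"{MARKER} cannot start task: program {argv[0]!r} not found ({error})\n"
            )
            return None
```

The annotation is limited to the "binary not found" case on purpose, matching the concern above that a promise covering only some HQ errors in stderr would be confusing unless it is scoped to one specific error.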