smartdispatch
An easy-to-use job launcher for supercomputers with a PBS-compatible job manager.
Maybe we could change `DD:HH:MM:SS` to `[[[DD:]HH:]MM:]SS` in the help message. We could also think about a way to make it more user-friendly by allowing one to specify the...
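To make the `[[[DD:]HH:]MM:]SS` notation concrete, here is a minimal sketch of how such a string could be read. The function name `parse_walltime` is hypothetical and is not taken from smartdispatch; the only assumption is that missing leading fields count as zero.

```python
def parse_walltime(text):
    """Convert a `[[[DD:]HH:]MM:]SS` string into a total number of seconds.

    Hypothetical helper for illustration only, not part of smartdispatch.
    """
    fields = text.strip().split(":")
    if not 1 <= len(fields) <= 4 or not all(f.isdigit() for f in fields):
        raise ValueError("expected [[[DD:]HH:]MM:]SS, got %r" % text)
    # Right-align the fields so that a missing leading unit counts as zero.
    values = [0] * (4 - len(fields)) + [int(f) for f in fields]
    days, hours, minutes, seconds = values
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

With this reading, `parse_walltime("02:30")` is interpreted as two minutes and thirty seconds, and `parse_walltime("1:00:00:00")` as one full day.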
Upon execution, smartdispatch should create that folder if it does not exist and then copy the queue file of the appropriate cluster so the user can modify it at will....
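The behaviour described above could look roughly like the following sketch. Both the function name `ensure_queue_file` and its parameters are hypothetical; the real folder and queue file locations are not given here, so they are passed in as arguments. The sketch assumes an existing user copy must never be overwritten, so that manual edits survive.

```python
import shutil
from pathlib import Path


def ensure_queue_file(config_dir, default_queue_file):
    """Create config_dir if needed and copy the default queue file into it.

    Hypothetical helper for illustration only, not part of smartdispatch.
    An existing copy is left untouched so user modifications are preserved.
    """
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    user_copy = config_dir / Path(default_queue_file).name
    if not user_copy.exists():
        shutil.copyfile(str(default_queue_file), str(user_copy))
    return user_copy
```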
Right now we are adding the running command back to the `pending` list unconditionally. This may not be desirable in some cases, e.g. when the termination triggers checkpointing which can potentially...
Attached, an example file from a worker that had a hard time... Further details to come tomorrow... @mgermain [770146_mp2_m_worker_105_e.txt](https://github.com/SMART-Lab/smartdispatch/files/222475/770146_mp2_m_worker_105_e.txt)
Check the proper way of doing unit tests in Python and refactor accordingly. Also add a mocking dependency and use it to test the yes_no_prompt and the launching of jobs.
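As an illustration of the mocking approach, the sketch below uses `unittest.mock` from the Python 3 standard library. The `yes_no_prompt` and `launch_job` functions defined here are simplified, hypothetical stand-ins; the real smartdispatch signatures may differ. The point is only that user input and job submission can both be replaced during a test.

```python
import subprocess
from unittest import mock


def yes_no_prompt(question, default="n"):
    """Hypothetical stand-in: ask a yes/no question, return True for yes."""
    answer = input("%s [y/n] " % question).strip().lower()
    if not answer:
        answer = default
    return answer.startswith("y")


def launch_job(launcher, pbs_filename):
    """Hypothetical stand-in: submit a PBS file and return the job id."""
    output = subprocess.check_output([launcher, pbs_filename])
    return output.decode().strip()


def test_prompt_accepts_yes():
    # Replace input() so the test never blocks waiting for a keyboard.
    with mock.patch("builtins.input", return_value="y") as fake_input:
        assert yes_no_prompt("Launch jobs?") is True
    fake_input.assert_called_once()


def test_prompt_uses_default_on_empty_answer():
    with mock.patch("builtins.input", return_value=""):
        assert yes_no_prompt("Launch jobs?", default="n") is False


def test_launch_job_does_not_submit_for_real():
    # Replace the subprocess call so no job reaches the scheduler.
    with mock.patch("subprocess.check_output", return_value=b"12345\n") as fake:
        assert launch_job("qsub", "job.pbs") == "12345"
    fake.assert_called_once_with(["qsub", "job.pbs"])
```

These functions follow the `test_*` naming convention, so a runner such as pytest would collect them without extra configuration.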
Add support for -M {gigPerNode}, -m {gigPerCommand} and make sure that this is properly handled in the PBS file.
For example, let's say we launch a job on colosse and smartdispatch does not add the `#PBS -A ycy-xxx-xx` line; then msub will warn that the config is not proper but...
There is a `qpeek` command that allows you to stream what is going on with a job. I think it is a very partial solution for monitoring larger batch of long-running...
The logs of my jobs are missing some lines (I print one line every batch, approx 6 out of 7 are missing!) and sometimes the lines are scrambled (the outputs...
It would be great to move the completed logs, possibly into a "completed" dir and a "failed" dir. This could be an easy temporary/first step to go in the direction...
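One possible shape for this first step is sketched below. The name `archive_log` is hypothetical, and the sketch assumes the caller already knows whether the job succeeded; how smartdispatch would determine that is not covered here.

```python
import shutil
from pathlib import Path


def archive_log(log_path, succeeded):
    """Move a finished log into a "completed" or "failed" sibling directory.

    Hypothetical helper for illustration only, not part of smartdispatch.
    """
    log_path = Path(log_path)
    target_dir = log_path.parent / ("completed" if succeeded else "failed")
    target_dir.mkdir(exist_ok=True)
    # shutil.move returns the destination path of the moved file.
    return Path(shutil.move(str(log_path), str(target_dir / log_path.name)))
```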