Ada Böhm
We want to support "everything in HQ", but so far you are right: tasks of up to one node go to HQ, and multi-node tasks go directly into SLURM/PBS.
Second thoughts:
7. What about the same naming scheme as "hq worker"?
   hq auto-alloc info => hq auto-alloc list
   hq auto-alloc allocations X => hq auto-alloc info X
8. What about...
Just noting the obvious: technically it should be min(remaining_worker_time, task_time_limit).
Btw: maybe we can name it "HQ_REMAINING_SECONDS" to make it clear what the units are.
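To make the min(...) rule concrete, here is a minimal Python sketch. It is only an illustration, not HQ's code: the function names are made up, "no limit" is modelled as None, and HQ_REMAINING_SECONDS is just the name proposed above, not an existing variable.

```python
from typing import Dict, Optional


def remaining_seconds(remaining_worker_time: Optional[float],
                      task_time_limit: Optional[float]) -> Optional[float]:
    """Time budget a task can rely on, in seconds (None = unlimited).

    Hypothetical helper: whichever limit is tighter wins.
    """
    limits = [t for t in (remaining_worker_time, task_time_limit) if t is not None]
    return min(limits) if limits else None


def task_env(remaining_worker_time: Optional[float],
             task_time_limit: Optional[float]) -> Dict[str, str]:
    """Build the extra environment for a task (variable name is the proposal above)."""
    budget = remaining_seconds(remaining_worker_time, task_time_limit)
    if budget is None:
        return {}  # no limit at all, so export nothing
    # Round down so the task never sees more time than it really has.
    return {"HQ_REMAINING_SECONDS": str(int(budget))}
```

With a worker that has one hour left and a task limit of ten minutes, the task would see the ten minutes; with no task limit, it would see the worker's remaining hour.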
Yes (but as the requirement comes from HQ, I put it here).
The task launcher was moved into HQ, so it is now an issue for HQ.
It is also connected to #66: each task should be spawned into a process group, and when it is canceled we should clean up all of its processes.
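A rough sketch of the process-group idea in Python, POSIX only. This is not how HQ does it (HQ is written in Rust); the function names and the grace period are assumptions made for the example. The child is started in a new session, so it leads its own process group, and cancellation signals the whole group instead of a single PID.

```python
import os
import signal
import subprocess
from typing import List


def spawn_in_group(cmd: List[str]) -> subprocess.Popen:
    """Start cmd as the leader of a new session and process group."""
    # start_new_session=True calls setsid() in the child before exec,
    # so the child's process group ID equals its PID.
    return subprocess.Popen(cmd, start_new_session=True)


def cancel(proc: subprocess.Popen, grace: float = 5.0) -> None:
    """Terminate the task and everything it spawned in its group."""
    pgid = proc.pid  # valid because the child is its own group leader
    try:
        os.killpg(pgid, signal.SIGTERM)  # polite request to the whole group
    except ProcessLookupError:
        pass  # the group is already gone
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(pgid, signal.SIGKILL)  # force-kill whatever is left
        except ProcessLookupError:
            pass
        proc.wait()
```

Note that a process which deliberately moves itself into a different session or group would escape this cleanup; catching those needs something stronger, such as cgroups.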
I am not able to find any information on whether we can distinguish the loss of a worker because of preemption from a crash. But I would guess that it has to...
For completeness: you can start a worker as follows: ``hq worker start --on-server-lost=finish-running`` and it will finish the currently running jobs when the server is lost. But reporting of these jobs is lost.
It is done slightly on purpose. Can you share your use case? It cannot be switched off in the current version. But it is like 3 lines of code to add...