Mathieu Germain issues

Results: 20 issues of Mathieu Germain

Improve CommandManager

Most of this really needs to be thought through in parallel with the JobManager #91. - [ ] Only give the base path to the command manager and let it handle...

Refactor command line

This will allow us to add features such as #10, #93 and more. The status of a batch can be Running, Done, or Stopped. Here is an idea of what the option tree...

Create a JobManager in the style of the CommandManager that probably includes the functionality of job_generator/job_generator_factory and some parts of scripts/smart_dispatch.py. This might be a bigger task than it seems,...

Queue Management

As alluded to in #86, add the possibility to manage queues. - queue - info (QNAME | All) - add QNAME CORESPERNODE GPUSPERNODE RAMPERNODE MAXWALLTIME DEFAULTMODULES NODESINQ MINPPN - delete...

Change the default behaviour of workers

If the user code does not have built-in resume, the current behaviour will be a problem. The default behaviour of the worker should be to run one command and then...

[Launcher] Check if the controller launched properly before starting the workers

We should check if the controller actually launched before starting the workers. Imagine the case where you have a controller already running on the same port you are trying to use,...

enhancement
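One way to catch the port conflict described in the launcher issue above is to test the port before starting anything. The following is only a minimal sketch, not Platoon's launcher code: it assumes the controller binds a TCP port on the local machine, and `port_is_free` is a hypothetical helper name.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to (host, port).

    Hypothetical helper: tries to bind the port and releases it
    immediately, so it only reports the state at the time of the call.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process
            # (for example an older controller) holds the port.
            return False
    return True
```

A launcher could call such a check before spawning the controller and refuse to start the workers when it returns `False`. Because another process may grab the port between the check and the actual launch, it would complement, not replace, verifying that the controller process is still alive.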

Problem with mini-batches in Controller and fixed number of mini-batches in Worker before sync.

In the case where the Controller manages the mini-batches but the Worker decides when to sync with the global parameters, you can encounter the problem where the Worker is waiting...

bug

[Launcher] Possible collision in workers output files.

If you simply use `gpu` multiple times when using the launcher `platoon-launcher exp gpu gpu gpu`, all the workers will try to write to the same file.

bug
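The output-file collision above goes away if every worker gets a distinct file name even when the same device string is repeated. This is a sketch under assumptions, not the launcher's actual naming scheme: `worker_output_names` and the `worker_<device>_<index>.out` pattern are hypothetical.

```python
from collections import Counter


def worker_output_names(devices):
    """Return one output file name per worker, unique even for repeated devices.

    Hypothetical helper: appends a per-device running index so that
    ["gpu", "gpu"] maps to two different files.
    """
    seen = Counter()
    names = []
    for device in devices:
        names.append(f"worker_{device}_{seen[device]}.out")
        seen[device] += 1
    return names


# Three workers on the generic `gpu` device get three separate files.
print(worker_output_names(["gpu", "gpu", "gpu"]))
```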

[Controller] The mini-batches process dies before the buffer is emptied.

In the controller, the process sending mini-batches terminates as soon as it is done sending batches, destroying the buffer in the process and dropping the last few mini-batches. I tried...

bug

Platoon doesn't work on Windows

It's probably not a real issue, but Platoon will not work on Windows because we are using `posix_ipc`, which is not compatible, and I think the way we use `cffi`...

wontfix