Question: dynamic job memory allocation?
We mainly use Slurm as the scheduler on our HPC cluster. Unfortunately, we have jobs that intermittently encounter a memory allocation spike, leading to an "OUT_OF_MEMORY" error and causing the job to be terminated. With Slurm, the total memory allocated to a job must be defined beforehand, and the only way to tackle this problem is to increase the total amount of memory requested, which wastes memory resources on the cluster. I am wondering whether this problem can be solved elegantly with the Flux framework, by introducing a kind of memory pool that grows only when needed?
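To illustrate how much headroom gets wasted by over-provisioning, here is a minimal sketch that compares a finished job's requested memory (ReqMem) against its peak usage (MaxRSS) via Slurm accounting. It assumes sacct accounting is enabled on the cluster, and the size parsing is deliberately simplified.

```python
#!/usr/bin/env python3
"""Rough sketch: estimate the memory headroom a finished Slurm job never used,
by comparing ReqMem (requested) with MaxRSS (peak usage) from sacct."""
import subprocess
import sys

def sacct_rows(jobid):
    # Query accounting data for the job and its steps.
    out = subprocess.run(
        ["sacct", "-j", jobid, "--parsable2", "--noheader",
         "--format=JobID,ReqMem,MaxRSS"],
        capture_output=True, text=True, check=True).stdout
    return [line.split("|") for line in out.splitlines() if line]

def to_mib(value):
    # Convert sacct size strings like "64Gn", "4096M", "123456K" to MiB
    # (simplified: ReqMem may carry an 'n'/'c' per-node/per-CPU suffix).
    if not value:
        return 0.0
    factors = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024}
    stripped = value.rstrip("nc")
    unit = stripped[-1].upper()
    number = stripped[:-1] if unit in factors else stripped
    return float(number) * factors.get(unit, 1.0)

if __name__ == "__main__":
    rows = sacct_rows(sys.argv[1])
    requested = max(to_mib(r[1]) for r in rows if r[1])
    peak = max((to_mib(r[2]) for r in rows if r[2]), default=0.0)
    print(f"requested ~{requested:.0f} MiB, peak RSS ~{peak:.0f} MiB, "
          f"unused headroom ~{requested - peak:.0f} MiB")
```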
If I understand correctly, you are proposing an evolving job that would request a base amount of memory per task at job submission, but could grow at runtime if memory needs are expected to increase?
Conceptually, Flux does support grow and shrink, but unfortunately this is not yet usable in practice. In fact, production Flux does not yet support memory as a schedulable resource, though the Fluxion scheduler does. So we have a ways yet to go to support this use case.
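For concreteness, here is what the resources section of a canonical Flux jobspec expresses today (a node → slot → core nesting), extended with a hypothetical memory vertex of the kind an "evolving" job might later ask to grow. The memory entry and its count/unit values are assumptions for illustration, not a supported jobspec feature.

```python
#!/usr/bin/env python3
"""Sketch of a canonical Flux jobspec 'resources' section, with a hypothetical
'memory' vertex added. The node/slot/core nesting reflects jobspec version 1;
the memory entry is an assumption, not something production Flux accepts."""
import json

resources = [
    {
        "type": "node",
        "count": 1,
        "with": [
            {
                "type": "slot",
                "count": 4,          # four task slots on the node
                "label": "task",
                "with": [
                    {"type": "core", "count": 2},
                    # Hypothetical: a baseline per-slot memory request that a
                    # grow operation could later increase if the scheduler
                    # tracked memory as a resource.
                    {"type": "memory", "count": 4, "unit": "GB"},
                ],
            }
        ],
    }
]

print(json.dumps({"version": 1, "resources": resources}, indent=2))
```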
In order for this to work, the job would need to know beforehand that the memory allocation spike is imminent; it would need to request an increase in memory from the enclosing instance; and that memory would need to actually be available (unallocated) in a location where it is usable (i.e., at least on the same nodes as the tasks that need it, and ideally in the same NUMA domain). So this is an interesting and challenging problem, and probably an area of active research. A rough sketch of that workflow is below.
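The sketch below walks through that sequence from the application's side: watch the task's own memory footprint and, before a known memory-intensive phase, ask the enclosing instance for more. The request_memory_grow() function is a hypothetical placeholder (no such grow API exists in Flux today), and the thresholds are arbitrary assumptions.

```python
#!/usr/bin/env python3
"""Sketch of the workflow described above: the job anticipates a memory spike
and asks its enclosing instance for more memory before the spike happens.
request_memory_grow() is hypothetical; thresholds are made-up examples."""
import resource

GROW_THRESHOLD_MIB = 6 * 1024   # assumed per-task soft limit
GROW_INCREMENT_MIB = 2 * 1024   # assumed increment to request

def current_rss_mib():
    # ru_maxrss is reported in KiB on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def request_memory_grow(extra_mib):
    # Hypothetical placeholder for a future Flux "grow" request: the job would
    # ask its enclosing instance for more memory, and the request could only
    # succeed if unallocated memory exists on the same nodes (ideally the same
    # NUMA domain) as the tasks that need it.
    print(f"[sketch] would ask enclosing instance for +{extra_mib} MiB")
    return False   # pretend the request was denied / unsupported

def before_memory_intensive_phase():
    # The job must know ahead of time that a spike is imminent and ask early.
    if current_rss_mib() + GROW_INCREMENT_MIB > GROW_THRESHOLD_MIB:
        if not request_memory_grow(GROW_INCREMENT_MIB):
            print("[sketch] grow denied: spill to disk, checkpoint, or abort")

if __name__ == "__main__":
    before_memory_intensive_phase()
```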