HPC
Node packing
Based on a discussion with @diskwarrior: document, or hold a workshop on, strategies for packing multiple jobs onto a node.
I looked into this, but was not able to get multiple jobs to run on one node. As I understand the SLURM documentation, if the partition's OverSubscribe setting is EXCLUSIVE, it takes precedence over a job's own settings (i.e., you can't just submit with options such as -s or --overcommit). If we wanted to allow this easily, we would need to change the partition's OverSubscribe setting.
The SLURM FAQ has an entry that details some of the changes that would need to be made:
https://slurm.schedmd.com/faq.html#sharing
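To confirm what a partition is currently set to, something like the following might help. It is only a rough sketch: it assumes `scontrol show partition` prints an `OverSubscribe=...` field (the field name may differ between SLURM releases), and the function names are made up for illustration.

```python
import re
import subprocess


def oversubscribe_setting(scontrol_output):
    """Return the OverSubscribe value found in the given text, or None if absent.

    Assumes the text contains a field of the form `OverSubscribe=VALUE`.
    """
    match = re.search(r"\bOverSubscribe=(\S+)", scontrol_output)
    return match.group(1) if match else None


def query_partition(partition):
    """Run `scontrol show partition <name>` and extract the OverSubscribe value."""
    result = subprocess.run(
        ["scontrol", "show", "partition", partition],
        capture_output=True,
        text=True,
        check=True,
    )
    return oversubscribe_setting(result.stdout)
```

On a login node, `query_partition("<partition name>")` should then return the setting as a string, or `None` if the field is not in the output.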
There may be other ways, but I suppose one alternative would be to bundle the individual jobs into a single Python script and use MPI to spread them across the cores of one node.
I'm not sure whether there are other approaches, so let me know if anyone has had a different experience.
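A minimal sketch of what that could look like, assuming mpi4py is installed on the cluster. I have not tried this on our system; the command list is a placeholder and `tasks_for_rank` is just an illustrative helper. Each MPI rank takes every size-th task and runs it as a subprocess.

```python
import subprocess


def tasks_for_rank(tasks, rank, size):
    """Round-robin split: rank r gets tasks r, r + size, r + 2*size, ..."""
    return tasks[rank::size]


def main():
    # Imported here so the helper above can be used without MPI available.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Placeholder work list; in practice these would be the real job commands.
    commands = [["echo", f"task {i}"] for i in range(8)]

    for cmd in tasks_for_rank(commands, rank, size):
        subprocess.run(cmd, check=True)
```

The script would need to call `main()` at the end and be started with an MPI launcher inside a single job, with one rank per core you want to use. The split is static, so it works best when the tasks take roughly the same time.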