Iztok Lebar Bajec
Try with

```
export LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_core.so:/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so:/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so:/opt/intel/lib/intel64/libiomp5.so
```
If you are using `--gpus-acct`, try #73.
Judging by the error listed in the first post, this has nothing to do with `gpus.go`; the problem is in `node.go`. Try running the command `sinfo -h -N -O NodeList,AllocMem,Memory,CPUsState,StateLong` on your system...
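To make it easier to spot which line of that output trips the parser, here is a minimal Go sketch that checks each line defensively. It is not the code from `node.go`: the `nodeInfo` struct and `parseNodeLine` function are hypothetical names, the sample line is invented, and it assumes the five fields are whitespace-separated with `CPUsState` in the usual `allocated/idle/other/total` form.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nodeInfo is a hypothetical container for one line of
// `sinfo -h -N -O NodeList,AllocMem,Memory,CPUsState,StateLong`.
type nodeInfo struct {
	Name                                  string
	AllocMem, Memory                      int
	CPUAlloc, CPUIdle, CPUOther, CPUTotal int
	State                                 string
}

// parseNodeLine splits one output line into its five fields and returns an
// error (instead of panicking) when a field is missing or not numeric.
func parseNodeLine(line string) (nodeInfo, error) {
	f := strings.Fields(line)
	if len(f) != 5 {
		return nodeInfo{}, fmt.Errorf("expected 5 fields, got %d: %q", len(f), line)
	}
	// CPUsState is assumed to look like "4/12/0/16".
	cpus := strings.Split(f[3], "/")
	if len(cpus) != 4 {
		return nodeInfo{}, fmt.Errorf("unexpected CPUsState %q", f[3])
	}
	nums := make([]int, 0, 6)
	for _, s := range append([]string{f[1], f[2]}, cpus...) {
		n, err := strconv.Atoi(s)
		if err != nil {
			return nodeInfo{}, fmt.Errorf("non-numeric value %q in %q", s, line)
		}
		nums = append(nums, n)
	}
	return nodeInfo{
		Name:     f[0],
		AllocMem: nums[0], Memory: nums[1],
		CPUAlloc: nums[2], CPUIdle: nums[3], CPUOther: nums[4], CPUTotal: nums[5],
		State: f[4],
	}, nil
}

func main() {
	// Invented sample line; real output depends on the cluster.
	n, err := parseNodeLine("node01 1024 64000 4/12/0/16 mixed")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", n)
}
```

Feeding the real `sinfo` output through a check like this shows which node reports a value the parser does not expect.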
@JaderGiacon I have integrated the fix from @DImuthuUpe, and also patched `gpus.go` as there was an issue when gres does not use gpuType. Could you perhaps test if it works...
Since Slurm v19.05.0rc1, Slurm has provided another way to check for available and active GRES, i.e. via `sinfo -a -h --Format=Nodes,Gres,GresUsed`. I have refactored the `gpus.go` to be based...
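As a rough illustration of how output from that command could be turned into GPU totals, here is a minimal Go sketch. It is not the refactored `gpus.go`: the function names `countGPUs` and `parseGPUs` are hypothetical, the sample lines are invented, and it assumes each line has three whitespace-separated fields (node count, `Gres`, `GresUsed`) with GPU entries shaped like `gpu:4`, `gpu:tesla:4` or `gpu:tesla:2(IDX:0-1)`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// gpuEntry matches one GPU GRES entry and captures its count. The optional
// type is either a plain name ("tesla") or the literal "(null)" that may be
// shown when no gpuType is configured.
var gpuEntry = regexp.MustCompile(`\bgpu:(?:(?:\(null\)|[^:,()]+):)?(\d+)`)

// countGPUs sums the GPU counts found in a single Gres or GresUsed field.
func countGPUs(gres string) int {
	sum := 0
	for _, m := range gpuEntry.FindAllStringSubmatch(gres, -1) {
		n, _ := strconv.Atoi(m[1]) // m[1] contains only digits
		sum += n
	}
	return sum
}

// parseGPUs walks the output line by line. Each line is assumed to describe
// a group of nodes with identical GRES, so the per-node counts are
// multiplied by the node count in the first field.
func parseGPUs(out string) (total, used int) {
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		if len(f) < 3 {
			continue // blank or malformed line
		}
		nodes, err := strconv.Atoi(f[0])
		if err != nil {
			continue
		}
		total += nodes * countGPUs(f[1])
		used += nodes * countGPUs(f[2])
	}
	return total, used
}

func main() {
	// Invented sample output; real output depends on the cluster.
	sample := "3 gpu:tesla:4 gpu:tesla:2(IDX:0-1)\n" +
		"1 gpu:2 gpu:(null):0(IDX:N/A)\n" +
		"2 (null) (null)\n"
	total, used := parseGPUs(sample)
	fmt.Printf("total=%d used=%d\n", total, used)
}
```

Making the type part of the pattern optional is what lets the same code handle GRES definitions with and without a gpuType.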
@martialblog thanks for testing; I did not have the chance to test the `development` branch, but PR #73 has been up and running on SLURM 21.08.5 for a couple of...
Based on @PeganovAnton's suggestion, I'm copying this from https://github.com/NVIDIA/NeMo/pull/4410#discussion_r904058505 to keep as a future reference, especially since supposedly "multiprocessing in P&C is prone to hangups".

> In my experience there...
@titu1994, sorry, but can you be more specific? I mean, do you a) refer to this issue and my last comment regarding distributed cache building, b) PR #4544, which uses...
OK; understood. I may have found a better solution. I'll comment under the appropriate PR.
@blisc @okuchaiev could someone review?