What does this message mean? : [WARNING] Running DLIO with 32 threads for I/O but core available 2 are insufficient and can lead to lower performance.
Why is this warning occurring, and how can I fix it? The core count reported by the "nproc" command is 512 (hyper-threading enabled).
I ran into this message and the accompanying performance problems while running mlpstorage training runs in a small environment. As far as I could find, DLIO wants 4 threads per accelerator by default to drive the workload (more if you override the thread parameters). However, depending on your mlpstorage parameters and node count, the process simulating an accelerator may be limited to 1 core per accelerator. I managed to resolve this by overriding the MPI bind-to and map-by parameters.
@fincherjc, thanks for your answer. Can you share the full run command you used and how you set the parameters? That would be a great help. Thank you.
For my purposes, I was using a single client and the v2.0 branch:
nohup mlpstorage training run --model unet3d --client-host-memory-in-gb <MEM> --exec-type=mpi --num-accelerators <N> --accelerator-type h100 --num-client-hosts <N> --data-dir <DIR> --param reader.odirect=true reader.read_threads=<NPROC> dataset.num_files_train=<DATASIZE> --oversubscribe &
The current benchmark does not specify any MPI bind-to or map-by parameters, so these fall back to the Open MPI defaults, which vary depending on the number of accelerators and clients you define. Adding --oversubscribe changes the bind-to behavior to "none" (meaning all CPUs can be used at the direction of the OS scheduler).
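For reference, the same behavior can be requested directly through Open MPI's launcher options (a sketch only; how mlpstorage forwards extra MPI arguments may differ between versions, and <N> / <dlio_command> are placeholders):
mpirun --oversubscribe --bind-to none --map-by slot -np <N> <dlio_command>
Here --bind-to none leaves thread placement to the OS scheduler instead of pinning each rank to a single core, and --map-by slot distributes ranks by slot rather than by core or socket.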
With this running in the background, you can verify in nohup.out that the warning is no longer present. You should also see in the mpstat -P ALL 5 output that more CPUs are active than in the run that produced the warning.
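A minimal check, assuming the default nohup.out file name and the warning text quoted above:
grep -i "insufficient" nohup.out   # should print nothing once binding is fixed
mpstat -P ALL 5 3                  # sample every CPU at 5-second intervals, 3 samples
Idle cores should be easy to spot in the per-CPU columns of the mpstat output.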
@xdreamcoder the main branch of this repo has been updated with new default MPI bind-to and map-by parameters that should address this. The behavior should mirror what you get when adding --oversubscribe. Can you check and confirm you've gotten past this issue?
@fincherjc, thank you for your quick help and fix. I can confirm that issues #201 and #202, which I reported, are resolved. However, my CPU's thread count is 512 (checked with the nproc command): 512 = 128 cores x 2 sockets x 2 (hyper-threading on). It seems that read_threads can only be allocated up to 128. If that is the case, I think performance would be the same as a single-socket CPU with hyper-threading off. Could you check this problem?
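For completeness, the socket/core/thread breakdown behind that 512 figure can be cross-checked with lscpu (a sketch; field names may vary slightly by distribution):
lscpu | grep -E '^CPU\(s\):|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'
On the node described above, this should report CPU(s): 512, Thread(s) per core: 2, Core(s) per socket: 128, and Socket(s): 2.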
@xdreamcoder Can you file this as a new issue to keep tracking clean?
@fincherjc At your request, I created a new issue (github.com/mlcommons/storage/issues/205)