dlio_benchmark

Any way to profile memory usage in DLIO?

Open · krehm opened this issue 1 year ago · 18 comments

Is there any way to track memory usage in a DLIO benchmark process?

I ask because I have a test node with 256 GB of memory. When running unet3d with 9375 files, I see the MPI process grow from 5.5 GB to 11 GB relatively quickly, and each of its 4 spawned reader processes is 1.5 GB. That works out to about 18 GB per MPI process. With DAOS I can run 12 MPI processes and get full accelerator efficiency, but I can't scale any further: the machine runs out of memory, the processes are too big to swap, and I have to reboot the node to get it working again.
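One way to log that growth over time, without touching DLIO internals, is to read the kernel's own accounting. The sketch below is a minimal example that assumes Linux and uses only the Python standard library. It reads `VmRSS` (current resident set size) and `VmHWM` (peak RSS) from `/proc/<pid>/status` for the current rank and for its direct children, such as the spawned reader workers. The helper names `parse_status`, `read_memory`, `child_pids`, and `snapshot` are hypothetical and are not part of DLIO. The sketch reports GiB, not GB.

```python
import re
from pathlib import Path

# VmRSS = current resident set size, VmHWM = peak RSS ("high water mark").
# The Linux kernel reports both in kB in /proc/<pid>/status.
_FIELD = re.compile(r"^(VmRSS|VmHWM):\s+(\d+)\s+kB$", re.MULTILINE)


def parse_status(text: str) -> dict:
    """Extract VmRSS/VmHWM (converted to bytes) from /proc/<pid>/status text."""
    return {name: int(kb) * 1024 for name, kb in _FIELD.findall(text)}


def read_memory(pid="self") -> dict:
    """Read current and peak RSS for one process (Linux only)."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())


def child_pids(pid="self") -> list:
    """Direct children of `pid` (e.g. spawned reader workers).

    Assumption: relies on /proc/<pid>/task/<tid>/children, which requires
    a kernel built with CONFIG_PROC_CHILDREN.
    """
    pids = []
    for children in Path(f"/proc/{pid}/task").glob("*/children"):
        try:
            pids.extend(int(p) for p in children.read_text().split())
        except OSError:
            continue  # thread exited between listing and reading
    return pids


def snapshot(label: str) -> str:
    """One log line: RSS of this process plus each direct child, in GiB."""
    gib = 1024 ** 3
    own = read_memory()
    parts = [f"{label}: self rss={own.get('VmRSS', 0) / gib:.2f} "
             f"peak={own.get('VmHWM', 0) / gib:.2f}"]
    for cpid in child_pids():
        try:
            rss = read_memory(cpid).get("VmRSS", 0)
        except OSError:
            continue  # child exited between listing and reading
        parts.append(f"child {cpid} rss={rss / gib:.2f}")
    return " | ".join(parts)
```

Each rank could call `snapshot(...)`, for example every N steps or at each epoch boundary, and print the result. Where to hook that call inside DLIO is left open here. Comparing `peak` against the current value shows whether the 11 GB is a steady state or a transient spike.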

My thought was to track down what causes the jumps in the MPI process's memory size; maybe there is a way to reduce the amount of RSS it consumes. But if this has been done before, I'd rather learn from the experts. :-)
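To find what causes the jumps, Python's built-in `tracemalloc` module can diff two heap snapshots taken around a suspect step and rank source lines by how much their allocations grew. The following is a sketch, and `top_growth` is a hypothetical helper. One caveat: `tracemalloc` only sees allocations made through Python's allocators, plus those that extension libraries explicitly report to it. Memory that a framework or a native I/O client allocates directly with `malloc` will not appear. A large gap between the RSS growth and the `tracemalloc` totals therefore points at native code.

```python
import tracemalloc


def top_growth(step, limit=5, nframes=1):
    """Run `step()` and return the `limit` source lines whose Python-level
    allocations changed the most while it ran."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start(nframes)  # nframes = traceback depth per allocation
    try:
        before = tracemalloc.take_snapshot()
        step()
        after = tracemalloc.take_snapshot()
    finally:
        if not was_tracing:
            tracemalloc.stop()  # also frees tracemalloc's own bookkeeping
    # compare_to() returns StatisticDiff objects sorted by absolute size
    # difference, largest first.
    return after.compare_to(before, "lineno")[:limit]
```

Wrapping one training step (or one epoch) of a single rank in `top_growth` and printing each returned `StatisticDiff` shows the file, line, total size, and size difference. Tracing adds noticeable CPU and memory overhead, so it is best tried on one rank or a reduced run.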

krehm · Jan 23 '24 18:01