
MPICH high memory footprint

Open aditya-nishtala opened this issue 1 year ago • 6 comments

We ran a simple hello world MPICH program in which each rank prints its rank id and the hostname it is running on. The program itself allocates no memory; all of the memory allocation comes from whatever MPICH is doing.

We scaled the run from 32 nodes to 768 nodes and measured how much memory is consumed. The MPICH commit tag is 204f8cd. This is happening on Aurora.

Memory consumption is equivalent whether using DDR or HBM. The data below was measured on DDR.

| Node count | Max DDR utilization (GB), mpich/opt/develop-git.204f8cd |
|---|---|
| 32 | 22.53 |
| 64 | 24.58 |
| 128 | 28.16 |
| 256 | 35.33 |
| 512 | 50.18 |
| 768 | 68.10 |

aditya-nishtala avatar Nov 07 '24 03:11 aditya-nishtala

Note that the above data is with PPN 96. The reported memory footprint values are in GB per socket. There is a linear increase in memory overhead, and it persists through the entire program execution.

nsdhaman avatar Nov 07 '24 04:11 nsdhaman

@aditya-nishtala Could you retry the experiment using a debug build with MPIR_CVAR_DEBUG_SUMMARY=1 enabled, and then post one of the logs? That may help identify whether the memory is allocated by MPICH or by one of its dependent libraries.

hzhou avatar Nov 07 '24 22:11 hzhou

Taking the differences, the memory increase is roughly linear in the number of nodes, ~55-65 MB/node. @aditya-nishtala What is the PPN (processes per node)?
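As a rough check of that per-node figure, the sketch below takes successive differences of the table at the top of the thread. It assumes 1 GB = 1000 MB (the thread does not say whether the values are decimal or binary), and the function name `growth_mb_per_node` is made up for illustration.

```python
# Max DDR utilization (GB per socket) by node count, copied from the table above.
usage_gb = {32: 22.53, 64: 24.58, 128: 28.16, 256: 35.33, 512: 50.18, 768: 68.10}


def growth_mb_per_node(usage, mb_per_gb=1000):
    """Return (lo, hi, MB per added node) for each pair of adjacent node counts.

    mb_per_gb is an assumption: use 1024 if the reported GB values are binary.
    """
    nodes = sorted(usage)
    rates = []
    for lo, hi in zip(nodes, nodes[1:]):
        delta_mb = (usage[hi] - usage[lo]) * mb_per_gb
        rates.append((lo, hi, delta_mb / (hi - lo)))
    return rates


for lo, hi, rate in growth_mb_per_node(usage_gb):
    print(f"{lo:>3} -> {hi:>3} nodes: {rate:5.1f} MB per added node")
```

With decimal GB the intervals come out between roughly 56 and 70 MB per added node, so the growth looks close to linear, with the last interval (512 to 768 nodes) at the high end.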

hzhou avatar Nov 07 '24 22:11 hzhou

This is with PPN 96.

nsdhaman avatar Nov 07 '24 22:11 nsdhaman

Thanks @nsdhaman. So that is roughly 6 KB per connection.
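One way to arrive at a figure of this size is sketched below. It assumes that each added node introduces PPN x PPN new process pairs (96 local processes times 96 new remote processes) and that 1 MB = 1,000,000 bytes. Both are assumptions; the reported values are per socket, so the exact divisor may differ. The function name is hypothetical.

```python
PPN = 96  # processes per node, as reported in this thread


def bytes_per_connection(mb_per_node, ppn=PPN, bytes_per_mb=1_000_000):
    """Estimate memory per connection from the per-node growth rate.

    Assumption: adding one node creates ppn * ppn new (local, remote) pairs.
    """
    new_connections = ppn * ppn
    return mb_per_node * bytes_per_mb / new_connections


# Evaluate at both ends of the ~55-65 MB/node range quoted earlier.
for mb in (55, 65):
    print(f"{mb} MB/node -> {bytes_per_connection(mb) / 1000:.1f} KB per connection")
```

Under these assumptions the estimate lands at roughly 6 to 7 KB per connection.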

hzhou avatar Nov 07 '24 23:11 hzhou

Okay, I think the issue is that we are allocating an oversized address table, sized up front for all possible connections. If we assume no application will use multiple VCIs, we could configure with --with-ch4-max-vcis=1, which would cut the memory growth to about 1/64 of its current rate.
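To put numbers on that, here is a back-of-the-envelope sketch. It assumes the per-connection memory scales linearly with the configured maximum number of VCIs and that the measured build allowed 64 VCIs, as the 1/64 figure suggests; 6 KB is taken as 6 * 1024 bytes and 60 MB/node as a midpoint of the earlier estimate. The helper names are illustrative only.

```python
def per_vci_bytes(bytes_per_connection, max_vcis):
    """Bytes attributable to a single VCI slot, assuming linear scaling."""
    return bytes_per_connection / max_vcis


def projected_growth_mb(mb_per_node, old_max_vcis, new_max_vcis):
    """Projected per-node growth after changing the max VCI count.

    Assumption: growth is proportional to the configured max VCI count.
    """
    return mb_per_node * new_max_vcis / old_max_vcis


print(f"{per_vci_bytes(6 * 1024, 64):.0f} bytes per connection per VCI")
print(f"{projected_growth_mb(60, 64, 1):.2f} MB per added node with max VCIs = 1")
```

Under these assumptions each connection costs on the order of 100 bytes per VCI, and the per-node growth would drop from about 60 MB to under 1 MB.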

For a more appropriate fix, we could change the av table to accommodate multi-VCI/NIC entries dynamically rather than statically. I can probably implement something like that.
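The difference between the two approaches can be illustrated with a toy model. This is not MPICH code: the class names, the slot layout, and the per-(VCI, NIC) sub-table scheme are all hypothetical, chosen only to show why allocating on first use avoids paying for unused VCI/NIC combinations.

```python
class StaticAvTable:
    """Toy model: reserves a slot for every (peer, vci, nic) combination up front."""

    def __init__(self, n_peers, max_vcis, max_nics):
        self.slots = [None] * (n_peers * max_vcis * max_nics)

    def allocated_slots(self):
        return len(self.slots)


class DynamicAvTable:
    """Toy model: creates a per-(vci, nic) sub-table only when it is first used."""

    def __init__(self, n_peers):
        self.n_peers = n_peers
        self.subtables = {}  # (vci, nic) -> list of per-peer addresses

    def insert(self, peer, vci, nic, addr):
        # Allocate the sub-table lazily, the first time this (vci, nic) is touched.
        sub = self.subtables.setdefault((vci, nic), [None] * self.n_peers)
        sub[peer] = addr

    def allocated_slots(self):
        return sum(len(sub) for sub in self.subtables.values())


n_peers = 1000
static = StaticAvTable(n_peers, max_vcis=64, max_nics=1)
dynamic = DynamicAvTable(n_peers)
for peer in range(n_peers):
    dynamic.insert(peer, vci=0, nic=0, addr=f"addr-{peer}")  # only VCI 0 is used

print(static.allocated_slots())   # 64000
print(dynamic.allocated_slots())  # 1000
```

In this model an application that only ever uses one VCI and one NIC pays for a single sub-table, while the static layout pays for all 64.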

hzhou avatar Nov 07 '24 23:11 hzhou

This is (presumably) resolved by the changes to MPICH that make the NIC and VCI av tables dynamic, plus the ability to move those tables to DDR. Please reopen if necessary.

raffenet avatar Aug 27 '25 20:08 raffenet