kokkos-tools
Fix memory HWM to also show how much memory the profiler is taking
I've been tracking down a memory issue and found that the memory usage tool itself takes most of the memory in a real run. It would be nice to track that and output it, or exclude it from the RSS.
Sounds like you have many small allocations? We can look at that.
Yes, I use Trilinos, and most of them come from that library. There are tons of these in Tpetra, MueLu, and Ifpack:
871.121941 0x2aaaf6aef500 8 Host Allocate DualView::modified_flags
871.121956 0x2aaaf6aef640 8 Host Allocate DualView::modified_flags
871.121970 0x2aaaf6aef780 8 Host Allocate DualView::modified_flags
871.121986 0x2aaaf407dbc0 -8 Host DeAllocate DualView::modified_flags
871.121993 0x2aaaf407e3c0 -8 Host DeAllocate DualView::modified_flags
871.122003 0x2aaaf3b52500 -8 Host DeAllocate DualView::modified_flags
In the last 100k lines of the log there are about 13k deallocations, and about 11k of them are DualView deallocations, mostly modified_flags.
[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep DeAll | wc
13118 78920 1255978
[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep DeAll | grep DualView |wc
11331 67998 1081020
And the bulk of what's left is
Host DeAllocate MV::normImpl lcl
I realized that deallocations are the wrong thing to look at, since they happen at the end of the run, so I looked at allocations instead.
[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep \ Allo | wc
10434 62706 990837
[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep \ Allo | grep modified_f |wc
9505 57030 912480
[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep \ Allo | grep MV |wc
826 5074 69856
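The per-label counts above can also be gathered in a single pass instead of one grep per label. This is a minimal sketch, assuming the column layout seen in the log excerpt (time, pointer, size, memory space, operation, then a label that may contain spaces). `count_allocs_by_label` is a hypothetical helper name and is not part of kokkos-tools.

```shell
# Hypothetical helper: tally Allocate events per label in a mem_events log.
# Assumed columns: time, pointer, size, memory space, operation, label...
# Prints "count<TAB>total bytes<TAB>label", most frequent label first.
count_allocs_by_label() {
  awk '$5 == "Allocate" {
         # The label may contain spaces, so rejoin fields 6..NF.
         label = $6
         for (i = 7; i <= NF; i++) label = label " " $i
         count[label]++
         bytes[label] += $3
       }
       END {
         for (l in count) printf "%d\t%d\t%s\n", count[l], bytes[l], l
       }' "$@" | sort -t "$(printf '\t')" -k1,1nr -k3,3
}
```

With no file argument the function reads standard input, so it can replace the `grep ... | wc` stage after `tail -100000` on the same .mem_events file, and piping the result through `head` would show the most frequent labels.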
Possible duplicate of #9.