Memory Usage: Add memory transferred between Kokkos Mem Spaces
This PR adds memory transferred between Kokkos Mem Spaces to the Kokkos Tools memory-usage tool library. This is done based on a request in Kokkos Tools Github Issue #50.
This PR adds functionality to the tool so that the size of data transferred in the deep_copy in a Kokkos application program is accumulated during the execution of a Kokkos program. The deep_copy accumulation is done per Kokkos Memory Space dst->src pair (e.g., OpenMP on host CUDA on device). Note that the "dst" and "src" Kokkos memory space being the same means that there was a deep_copy in the same memory space.
When Kokkos is finalized in a Kokkos application program, the finalize callback prints out the deep_copy accumulations per memory space to the file.
Here is output from the stream.cuda in the Kokkos-core benchmarks directory run on Perlmutter on 1 GPU. The number of bytes in the HighWater at the last point in time shown in the first table is equal to the data transferred between host and device.
-----@nid----:~/kks-clean/benchmarks/stream> cat nid001213-1948341-Cuda.memspace_usage
# Space Cuda
# Time(s) Size(MB) HighWater(MB) HighWater-Process(MB)
0.018076 7629.4 7629.4 142.7
0.047229 15258.8 15258.8 143.5
0.071899 22888.2 22888.2 144.0
12.252089 15258.8 22888.2 23032.6
12.313410 7629.4 22888.2 23032.6
12.331277 0.0 22888.2 23032.6
# Data transferred between Kokkos Memory Spaces ---
# DstSpace SrcSpace Data-Transferred(MB)
Cuda Cuda 0.0
# DstSpace SrcSpace Data-Transferred(MB)
Cuda Host 22888.2
# DstSpace SrcSpace Data-Transferred(MB)
Host Cuda 22888.2
# DstSpace SrcSpace Data-Transferred(MB)
Host Host 0.0
------@nid----:~/kks-clean/benchmarks/stream> cat nid001213-1948341-
nid001213-1948341-Cuda.memspace_usage nid001213-1948341-Host.memspace_usage
vkale3@nid001213:~/kks-clean/benchmarks/stream> cat nid001213-1948341-Host.memspace_usage
# Space Host
# Time(s) Size(MB) HighWater(MB) HighWater-Process(MB)
0.077138 7629.4 7629.4 144.0
0.768111 15258.8 15258.8 7773.6
1.455819 22888.2 22888.2 15403.0
12.218183 15258.8 22888.2 23032.6
12.229001 7629.4 22888.2 23032.6
12.240329 0.0 22888.2 23032.6
# Data transferred between Kokkos Memory Spaces ---
# DstSpace SrcSpace Data-Transferred(MB)
Cuda Cuda 0.0
# DstSpace SrcSpace Data-Transferred(MB)
Cuda Host 22888.2
# DstSpace SrcSpace Data-Transferred(MB)
Host Cuda 22888.2
# DstSpace SrcSpace Data-Transferred(MB)
Host Host 0.0
I have updated with the requested changes. I did a quick check on my mac laptop and it works on my laptop fine with the change to have a separate file, and for reference, here is the output:
vklap % more /kks/benchmarks/stream/s1ca-13343-Host.memspace_usage
# Space Host
# Time(s) Size(MB) HighWater(MB) HighWater-Process(MB)
0.000067 762.9 762.9 1856.0
0.084233 1525.9 1525.9 783152.0
0.170814 2288.8 2288.8 1564448.0
3.166797 1525.9 2288.8 2345824.0
3.180026 762.9 2288.8 2345824.0
3.194930 0.0 2288.8 2345824.0
vklap % more /kks/benchmarks/stream/s1ca-13343-Host.memspace_transfers
# Dst Mem Space Src Mem Space Total-Data-Transferred(MB)
Host Host 4577.6
(As can be seen, there are now two files rather than one.)
Right now, there seems to be blocker on the CI - I don't know why the CI checks for OpenMP, CUDA and HIP (non-simple builds) are failing now. The problem is arising from Kokkos_Profiling.hpp and is coming from int_for_synchronization_reason(), as can be seen from the logs. I don't think I changed anything in this PR that would impact that? This is a CI check from when it was working previously: https://github.com/kokkos/kokkos-tools/actions/runs/10948763633/job/30400520719. The PR is built on top of the latest develop.
Just put it into a separate file otherwise this is good.
This is fixed.
I don't know why the CI checks for OpenMP, CUDA and HIP (non-simple builds) are failing now
Have you rebased on top of develop? That might fix it.
I don't know why the CI checks for OpenMP, CUDA and HIP (non-simple builds) are failing now
Have you rebased on top of
develop? That might fix it.
Thanks for checking this. As mentioned in the last sentence of my previous comment (https://github.com/kokkos/kokkos-tools/pull/272#issuecomment-2519032604), I did rebase on top of develop of Kokkos Tools.
Actually, I think this has to do with Kokkos Tools Issue #275, where there is ultimately an incompatibility with of Kokkos Tools with the current Kokkos version (https://github.com/kokkos/kokkos-tools/issues/275#issuecomment-2422634745). This issue came up in early October, which was after the previous successful run I linked. I will see if there are quick fixes from that, but given that #275 impacts other PRs, I will advocate for that github issue to be a priority to fix for Kokkos Tools.