mallocMC
"Synthetic" Benchmark for PIConGPU
@psychocoderHPC @slizzered we should make an additional benchmark setup that is close to the usage of PIConGPU particle allocations.
e.g., "allocate and free N chunks of few KB of (particle) data per second" (from T threads)
We will need such a benchmark because, with hardware such as Knights Landing and POWER8/9, we could even be `new`-bound on the host side, and we need to know at which level of concurrency this kicks in.
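A minimal host-side sketch of such a setup, only to make the idea concrete: T threads each repeatedly allocate and free a batch of chunks of a few KB with plain `new[]`/`delete[]`, and the run reports alloc/free pairs per second. The names `BenchResult` and `runHostAllocBenchmark` and all parameters are hypothetical and not part of mallocMC or PIConGPU; a real benchmark would swap in the allocator under test and realistic particle frame sizes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch (not part of mallocMC).
struct BenchResult {
    std::size_t totalAllocations; // alloc/free pairs completed by all threads
    double seconds;               // wall-clock time of the whole run

    double allocsPerSecond() const {
        return seconds > 0.0 ? static_cast<double>(totalAllocations) / seconds : 0.0;
    }
};

// Hypothetical driver: each of `numThreads` threads runs `rounds` rounds.
// In every round it allocates `chunksPerRound` chunks of `chunkBytes` bytes,
// touches each chunk, and frees them all again.
BenchResult runHostAllocBenchmark(unsigned numThreads,
                                  std::size_t rounds,
                                  std::size_t chunksPerRound,
                                  std::size_t chunkBytes)
{
    assert(numThreads > 0 && chunkBytes > 0);

    std::vector<std::size_t> done(numThreads, 0);

    auto worker = [&](unsigned tid) {
        std::vector<char*> chunks(chunksPerRound, nullptr);
        std::size_t count = 0;
        for (std::size_t r = 0; r < rounds; ++r) {
            for (auto& p : chunks) {
                p = new char[chunkBytes];
                // Volatile writes so the allocation cannot be optimized away.
                volatile char* touch = p;
                touch[0] = static_cast<char>(r);
                touch[chunkBytes - 1] = static_cast<char>(tid);
            }
            for (auto& p : chunks) {
                delete[] p;
                p = nullptr;
                ++count;
            }
        }
        done[tid] = count; // written once per thread, read after join()
    };

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    pool.reserve(numThreads);
    for (unsigned t = 0; t < numThreads; ++t) {
        pool.emplace_back(worker, t);
    }
    for (auto& th : pool) {
        th.join();
    }
    const auto stop = std::chrono::steady_clock::now();

    std::size_t total = 0;
    for (std::size_t c : done) {
        total += c;
    }
    return BenchResult{total, std::chrono::duration<double>(stop - start).count()};
}
```

Sweeping `numThreads` (e.g. 1, 2, 4, ... up to the hardware thread count) with something like `runHostAllocBenchmark(T, 1000, 64, 4096)` and plotting `allocsPerSecond()` over T would show where host-side allocation stops scaling.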
Related to #96 and #130
@bussmann @juckel this might be an interesting task for the next many-core lecture (HOPS+CO)
This is planned as a GPU student final project for this year. Currently preparing a plan, like:
- benchmark code into /benchmarks
- measuring mallocMC alloc + free performance
- allocate chunks and perform random + stream access (for upcoming page migration test)
- new allocation policy to get unified memory (cudaMallocManaged) (for testing page migration)
- also possible to test oversubscribing as of Pascal (probably requires another OOM policy)
- ... and access counters as of Volta
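To illustrate the "random + stream access" item, here is a rough host-only sketch of the two access patterns over one allocated chunk. It assumes a chunk is just an array of 32-bit words; the function names are hypothetical, and the actual benchmark would presumably run the same patterns from device kernels over mallocMC or managed memory rather than on the host.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Stream access: walk the chunk front to back (unit stride).
std::uint64_t streamAccess(const std::vector<std::uint32_t>& chunk)
{
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < chunk.size(); ++i) {
        sum += chunk[i];
    }
    return sum;
}

// Build a reproducible random permutation of the indices 0..n-1.
std::vector<std::size_t> makeRandomOrder(std::size_t n, std::uint32_t seed)
{
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Random access: visit every element exactly once, in permuted order.
std::uint64_t randomAccess(const std::vector<std::uint32_t>& chunk,
                           const std::vector<std::size_t>& order)
{
    assert(order.size() == chunk.size());
    std::uint64_t sum = 0;
    for (std::size_t idx : order) {
        sum += chunk[idx];
    }
    return sum;
}
```

Both functions touch the same elements and return the same checksum, so any timing difference between them comes from the access order alone, which is what a page migration test would want to isolate.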
What do you think?
Note that unified memory does not work with IPC. This is currently only for CUDA, but benchmarks will be necessary for hip-clang too.