"Synthetic" Benchmark for PIConGPU

Open ax3l opened this issue 9 years ago • 1 comments

@psychocoderHPC @slizzered we should make an additional benchmark setup that is close to the usage of PIConGPU particle allocations.

e.g., "allocate and free N chunks of few KB of (particle) data per second" (from T threads)

We will need such a benchmark because on hardware such as Knights Landing and POWER8/9 we could even become `new`-bound (limited by allocation throughput) on the host side, and we need to know at which level of concurrency this kicks in.
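A minimal host-side sketch of what such a measurement could look like, assuming plain `new[]`/`delete[]` and `std::thread` rather than mallocMC itself. The names `AllocBenchResult` and `runHostAllocBench` and all parameters are hypothetical and only illustrate the "N chunks of a few KB from T threads" idea:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch; not part of mallocMC.
struct AllocBenchResult {
    std::size_t operations; // total number of alloc + free pairs
    double seconds;         // wall-clock time over all threads
    double pairsPerSecond() const { return seconds > 0.0 ? operations / seconds : 0.0; }
};

// Each of `threads` workers runs `rounds` rounds; per round it allocates
// `batch` chunks of `chunkBytes` bytes, touches each one, then frees them.
AllocBenchResult runHostAllocBench(unsigned threads, std::size_t rounds,
                                   std::size_t batch, std::size_t chunkBytes)
{
    assert(threads > 0 && chunkBytes > 0);
    std::vector<std::thread> workers;
    workers.reserve(threads);

    auto const start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([=] {
            std::vector<char*> chunks(batch, nullptr);
            for (std::size_t r = 0; r < rounds; ++r) {
                for (auto& c : chunks) {
                    c = new char[chunkBytes];
                    // Volatile write so the allocation is observably used.
                    *static_cast<volatile char*>(c) = 1;
                }
                for (auto& c : chunks) {
                    delete[] c;
                    c = nullptr;
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    std::chrono::duration<double> const elapsed =
        std::chrono::steady_clock::now() - start;

    return {static_cast<std::size_t>(threads) * rounds * batch, elapsed.count()};
}
```

Sweeping `threads` from 1 up to the hardware thread count and plotting `pairsPerSecond()` should show where host-side allocation stops scaling. Note that the timing includes thread start-up, so `rounds` should be large enough to amortize it.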

Related to #96 and #130

@bussmann @juckel this might be an interesting task for the next many-core lecture (HOPS+CO)

ax3l, Oct 05 '16 14:10

This is planned as a GPU student final project for this year. I am currently preparing a plan along these lines:

  • put benchmark code into /benchmarks
    • measure mallocMC alloc + free performance
    • allocate chunks and perform random + streaming access (for the upcoming page migration test)
  • add a new allocation policy that provides unified memory (cudaMallocManaged) (for testing page migration)
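To make the "random + stream access" item concrete, here is a rough host-only sketch. It is an assumption-laden stand-in: it uses `std::vector` chunks instead of mallocMC or `cudaMallocManaged` memory, so it shows only the access-pattern harness, not page migration. All names (`makeChunks`, `streamOrder`, `randomOrder`, `touchChunks`) are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

using Chunk = std::vector<std::uint8_t>;

struct AccessResult {
    std::uint64_t checksum; // sum of all bytes read, keeps the reads alive
    double seconds;         // wall-clock time of the traversal
};

// Allocate `count` chunks of `chunkBytes` bytes, each byte set to 1.
std::vector<Chunk> makeChunks(std::size_t count, std::size_t chunkBytes)
{
    return std::vector<Chunk>(count, Chunk(chunkBytes, 1));
}

// Streaming access: visit chunks in allocation order 0, 1, 2, ...
std::vector<std::size_t> streamOrder(std::size_t count)
{
    std::vector<std::size_t> order(count);
    std::iota(order.begin(), order.end(), std::size_t{0});
    return order;
}

// Random access: visit the same chunks in a seeded, shuffled order.
std::vector<std::size_t> randomOrder(std::size_t count, unsigned seed)
{
    auto order = streamOrder(count);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Read every byte of every chunk in the given order and time it.
AccessResult touchChunks(std::vector<Chunk> const& chunks,
                         std::vector<std::size_t> const& order)
{
    auto const start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::size_t idx : order) {
        assert(idx < chunks.size());
        for (std::uint8_t byte : chunks[idx]) sum += byte;
    }
    std::chrono::duration<double> const elapsed =
        std::chrono::steady_clock::now() - start;
    return {sum, elapsed.count()};
}
```

Both traversals read the same bytes, so equal checksums confirm that only the order differs. In the actual project the chunks would come from the allocator under test and the traversal would run in a kernel.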

What do you think?

Note that unified memory does not work with IPC. The plan currently targets CUDA only, but benchmarks will be needed for hip-clang too.

tdd11235813, Nov 25 '19 14:11