wlruys

Results 22 comments of wlruys

I've added examples that reproduce this with and without VECs in https://github.com/ut-parla/Parla.py/tree/master/benchmarks/gpu_threading, along with the MPI and C++ OpenMP comparisons. As a log, I'm also copying the performance numbers here...

From what I'm seeing on this, they follow the CUDA API quite closely, which afaik doesn't have a way of setting a default (aside from the usual two defaults: per-thread...