wlruys comments


Results: 22 comments by wlruys

Slowdown of GPU Data Transfers in Python Threads

I've added the examples to reproduce this, with and without VECs, in https://github.com/ut-parla/Parla.py/tree/master/benchmarks/gpu_threading, along with the MPI and C++ OpenMP comparisons. For the record, I'm also copying the performance numbers here...

Should Parla tasks/contexts also set up stream handles for Numba?

From what I'm seeing, they follow the CUDA API quite closely, which, as far as I know, doesn't have a way of setting a default stream (aside from the usual two defaults: per-thread...
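Since there's no way to set a default, one option is for the runtime to hand each worker thread its own stream explicitly. Below is a minimal sketch of that idea, assuming one stream per Python thread is what we want. `PerThreadStream` is a hypothetical helper, not an existing Parla or Numba API, and `factory` stands in for whatever creates the handle (e.g. `numba.cuda.stream` on a GPU machine).

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThreadStream(Generic[T]):
    """Hypothetical helper: lazily creates one stream handle per Python thread.

    `factory` is any zero-argument callable that returns a new handle,
    e.g. numba.cuda.stream when a GPU is available.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        # threading.local gives every thread its own attribute namespace.
        self._local = threading.local()

    def get(self) -> T:
        # Create the handle the first time this thread asks for it,
        # then reuse the same handle for every later call on this thread.
        stream = getattr(self._local, "stream", None)
        if stream is None:
            stream = self._factory()
            self._local.stream = stream
        return stream
```

With Numba, the handle would then be passed explicitly everywhere instead of relying on the default stream, e.g. `cuda.to_device(a, stream=streams.get())` and `kernel[blocks, threads, streams.get()](d_a)`, followed by `streams.get().synchronize()`.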