Dmitry Serykh
Results: 3 issues of Dmitry Serykh
This [program](https://gist.github.com/mortvest/39e5a93026264b220479de06a34228dc) (batched radix sort) crashes for 640 < n < 1001 on GPU04, GPU03 and GPU02. This happens with the OpenCL backend only.
Can we use multithreading to amortize the data transfer overheads (host -> device and device -> host) by overlapping them with kernel execution when running with multiple chunks?
enhancement
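To make the overlap question above concrete, here is a minimal sketch of a three-stage pipeline with one thread per stage, so that while one chunk is being computed the next can be uploaded and the previous one downloaded. It is not taken from the project: `run_pipeline`, `upload`, `compute`, `download` and `depth` are hypothetical names, and the stage functions are plain Python stand-ins for real host -> device copies, kernel launches and device -> host copies. It also assumes the real blocking calls would release the interpreter lock while they wait, and it does no error handling.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the chunk stream


def _stage(work, inbox, outbox):
    """Apply `work` to each item from `inbox` and forward the result to `outbox`."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(chunks, upload, compute, download, depth=2):
    """Run upload -> compute -> download on every chunk, one thread per stage.

    `depth` bounds how many chunks may wait between two stages, which caps
    the number of buffers in flight (hypothetical parameter).
    """
    q_in, q_dev, q_res, q_out = (queue.Queue(maxsize=depth) for _ in range(4))

    def feed():
        # Feeding from its own thread lets the caller drain results
        # concurrently, so the bounded queues cannot deadlock.
        for chunk in chunks:
            q_in.put(chunk)
        q_in.put(_DONE)

    threads = [
        threading.Thread(target=feed),
        threading.Thread(target=_stage, args=(upload, q_in, q_dev)),
        threading.Thread(target=_stage, args=(compute, q_dev, q_res)),
        threading.Thread(target=_stage, args=(download, q_res, q_out)),
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        results.append(item)  # FIFO queues keep the chunk order

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Stand-ins: copying a list plays the role of a transfer, sorting the kernel.
    print(run_pipeline([[3, 1, 2], [9, 8]], list, sorted, tuple))
```

Whether this actually hides the transfer cost depends on the device and driver being able to copy and compute at the same time; the sketch only shows the scheduling structure.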