Dmitry Serykh

3 issues by Dmitry Serykh

This [program](https://gist.github.com/mortvest/39e5a93026264b220479de06a34228dc) (batched radix sort) crashes for 640 < n < 1001 on GPU04, GPU03 and GPU02. Only the OpenCL backend is affected.

Can we amortize the data transfer overhead (host -> device and device -> host) by overlapping it with kernel execution when running with multiple chunks, using multithreading?

enhancement
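As a rough illustration of the overlap the issue above asks about, here is a minimal sketch. It splits the work into chunks and runs three stages (upload, compute, download) in separate threads connected by bounded queues. While one chunk's kernel runs, the next chunk can be uploaded and the previous one downloaded. Everything here is hypothetical. `upload`, `compute` and `download` are placeholders that only sleep and sort in host memory. They stand in for real host -> device copies, the radix-sort kernel, and device -> host copies. Whether real transfers overlap with kernel execution depends on the backend and device. With OpenCL this typically also needs non-blocking enqueues on separate command queues, and a device that supports concurrent copy and execution.

```python
import queue
import threading
import time

# Hypothetical stand-ins for the real stages (assumption: NOT a real GPU API).
def upload(chunk):
    time.sleep(0.01)  # simulated host -> device transfer
    return list(chunk)

def compute(dev_chunk):
    time.sleep(0.01)  # simulated kernel execution (here: a host-side sort)
    return sorted(dev_chunk)

def download(dev_result):
    time.sleep(0.01)  # simulated device -> host transfer
    return list(dev_result)

_DONE = object()  # sentinel marking the end of the chunk stream

def _stage(fn, inbox, outbox):
    """Apply fn to each (index, payload) from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        idx, payload = item
        outbox.put((idx, fn(payload)))

def pipelined(chunks, depth=2):
    """Run upload -> compute -> download with one thread per stage, so that
    different chunks can occupy different stages at the same time."""
    # Bounded queues limit how many chunks are in flight between stages;
    # the output queue is unbounded so the final stage never blocks while
    # this thread is still feeding input.
    q_in, q_up, q_comp = (queue.Queue(maxsize=depth) for _ in range(3))
    q_out = queue.Queue()
    threads = [
        threading.Thread(target=_stage, args=(upload, q_in, q_up)),
        threading.Thread(target=_stage, args=(compute, q_up, q_comp)),
        threading.Thread(target=_stage, args=(download, q_comp, q_out)),
    ]
    for t in threads:
        t.start()

    for idx, chunk in enumerate(chunks):
        q_in.put((idx, chunk))
    q_in.put(_DONE)

    # Collect results by index so the output order matches the input order.
    results = [None] * len(chunks)
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        idx, value = item
        results[idx] = value
    for t in threads:
        t.join()
    return results
```

For example, `pipelined([[3, 1, 2], [9, 8], [5]])` returns `[[1, 2, 3], [8, 9], [5]]`. The `depth` parameter roughly bounds how many chunks wait between stages. In a real setting, that would cap how many chunk buffers are resident on the device at once.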

GPU02 can be used for testing

enhancement