Kernels
debug multi-GPU SYCL nstream
There may be a bug in multi-GPU SYCL nstream as reported in https://github.com/illuhad/hipSYCL/pull/399
@illuhad
Yes, details here:
The value of ls, as defined here:
https://github.com/ParRes/Kernels/blob/eff76d7a6bb216e93f5cdf837cedc490b73553e7/Cxx11/nstream-multigpu-dpcpp.cc#L162
is used as the start offset of the local buffers, e.g. https://github.com/ParRes/Kernels/blob/eff76d7a6bb216e93f5cdf837cedc490b73553e7/Cxx11/nstream-multigpu-dpcpp.cc#L184
This does not look correct to me, because I believe the correct start offset for each GPU is the sum of the local lengths of all preceding GPUs.