
Optimize data transfer between OpenCL devices.

Open marty1885 opened this issue 5 years ago • 2 comments

Currently, copying data between two OpenCL backends is done by:

  1. Allocating a temporary host buffer
  2. Copying data from GPU1 to the buffer
  3. Copying data from the buffer to GPU2
  4. Releasing the buffer

This is slow. There are more optimized routes, but the mechanism to trigger them is yet to be determined.
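The current path can be sketched roughly as follows. This is a non-authoritative sketch, not Etaler's actual implementation; `q1`/`q2`, `src`/`dst`, and `bytes` are hypothetical names for the two backends' command queues, the source/destination buffers, and the transfer size.

```c
#include <CL/cl.h>
#include <stdlib.h>

/* Staged copy through a temporary host buffer -- the current (slow) path. */
void staged_copy(cl_command_queue q1, cl_command_queue q2,
                 cl_mem src, cl_mem dst, size_t bytes)
{
    void *tmp = malloc(bytes);                       /* 1. temporary buffer */
    clEnqueueReadBuffer(q1, src, CL_TRUE, 0, bytes,  /* 2. GPU1 -> host     */
                        tmp, 0, NULL, NULL);
    clEnqueueWriteBuffer(q2, dst, CL_TRUE, 0, bytes, /* 3. host -> GPU2     */
                         tmp, 0, NULL, NULL);
    free(tmp);                                       /* 4. release buffer   */
}
```

Note the data crosses PCIe twice and also passes through unpinned host memory, which forces the driver to do an extra internal staging copy.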

Sol 1: Using clEnqueueMapBuffer

  1. Map GPU1's buffer into host memory with clEnqueueMapBuffer (a pinned DMA transfer)
  2. Copy data from the mapped pointer to GPU2
  3. Unmap the buffer
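A sketch of Sol 1, under the same hypothetical names as above. Mapping gives the driver a pinned host pointer, so the GPU1→host transfer avoids the extra internal staging copy:

```c
#include <CL/cl.h>

void mapped_copy(cl_command_queue q1, cl_command_queue q2,
                 cl_mem src, cl_mem dst, size_t bytes)
{
    cl_int err;
    /* 1. Map GPU1's buffer into host address space (pinned DMA transfer) */
    void *ptr = clEnqueueMapBuffer(q1, src, CL_TRUE, CL_MAP_READ,
                                   0, bytes, 0, NULL, NULL, &err);
    /* 2. Write the mapped region directly to GPU2 */
    clEnqueueWriteBuffer(q2, dst, CL_TRUE, 0, bytes, ptr, 0, NULL, NULL);
    /* 3. Unmap */
    clEnqueueUnmapMemObject(q1, src, ptr, 0, NULL, NULL);
}
```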

Sol 2: Using OpenCL 2.0's Shared Virtual Memory (SVM). Host memory is never touched, so this should be very fast.

  1. Allocate Tensors as SVM buffers
  2. Ask GPU2 to copy data from GPU1

Either of these should make multi-GPU setups faster.
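Sol 2 could look something like the sketch below. It assumes both devices live in the same cl_context `ctx` and support coarse-grained SVM; all names are hypothetical.

```c
#include <CL/cl.h>

void svm_copy(cl_context ctx, cl_command_queue q2,
              void *svm_src, void *svm_dst, size_t bytes)
{
    /* 1. The tensors would already have been allocated as SVM buffers, e.g.:
     *    void *p = clSVMAlloc(ctx, CL_MEM_READ_WRITE, bytes, 0);         */
    /* 2. Ask GPU2's queue to copy directly between the SVM allocations;
     *    the host never touches the data.                                */
    clEnqueueSVMMemcpy(q2, CL_TRUE, svm_dst, svm_src, bytes, 0, NULL, NULL);
}
```

The catch is that SVM pointers are only meaningful within a single context, so both backends would have to share one cl_context.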

marty1885 avatar May 20 '19 02:05 marty1885

Sol 3:

  1. Make a copy of the tensor on GPU1 (clCreateBuffer && clEnqueueCopyBuffer)
  2. Migrate the buffer (clEnqueueMigrateMemObjects) from GPU1 to GPU2

But it is still not optimal that we need to make an extra copy of the buffer on GPU1.
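A sketch of Sol 3, again assuming both devices share one context and using hypothetical names. An event chains the on-device copy on GPU1's queue to the migration on GPU2's queue:

```c
#include <CL/cl.h>

void migrate_copy(cl_context ctx, cl_command_queue q1, cl_command_queue q2,
                  cl_mem src, size_t bytes)
{
    cl_int err;
    cl_event copied;
    /* 1. Make a device-side copy of the tensor on GPU1 */
    cl_mem copy = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
    clEnqueueCopyBuffer(q1, src, copy, 0, 0, bytes, 0, NULL, &copied);
    /* 2. Migrate the copy to GPU2: enqueuing on q2 with flags = 0 moves the
     *    buffer to the device associated with that queue                  */
    clEnqueueMigrateMemObjects(q2, 1, &copy, 0, 1, &copied, NULL);
    clReleaseEvent(copied);
}
```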

marty1885 avatar Jun 02 '19 14:06 marty1885

Apparently Nvidia does have some OpenCL 2.0 support. https://streamhpc.com/blog/2017-02-22/nvidia-enables-opencl-2-0-beta-support/

Seems I can start building OpenCL 2.0 code before grabbing myself a Navi card.

marty1885 avatar Jul 11 '19 17:07 marty1885