
Function to synchronize an entire device.

Open kris-rowe opened this issue 2 years ago • 1 comments

The existing implementation of device::finish() only synchronizes the current stream (e.g., calling cuStreamSynchronize), making both the function name and documentation somewhat misleading.

Some downstream OCCA applications require a mechanism to wait for all enqueued operations on a device to finish, similar to cudaDeviceSynchronize.

The programming models of the other backends (i.e., OpenCL, SYCL) don't have a similar API for device synchronization. However, modeDevice_t already retains a vector of the streams that have been allocated, so each stream can be synchronized in turn and this should not be an issue.

Two potential options to move forward with this are:

  1. Change the implementation of device::finish() to match its name and documentation, then add a function to the stream class for synchronizing only a particular stream (and possibly a shortcut to sync the current stream).
  2. Keep the current implementation of device::finish(), but update its documentation and add another function device::finishAll() which synchronizes all streams on a device.

kris-rowe, Jun 24 '22 20:06

After discussing this at the OCCA TAF meeting we will go with the second option, adding a new function finishAll() to the occa::device class.
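
A rough sketch of the difference between the two calls under the second option. This is not OCCA code: `MockStream`, `MockDevice`, `createStream`, `setStream`, `stream`, and `pendingTotal` are hypothetical stand-ins invented for illustration, and the only assumption taken from the discussion above is that the device keeps a vector of every stream it has allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a backend stream; not OCCA's actual stream type.
struct MockStream {
  int pending = 0;                     // enqueued operations not yet finished
  void enqueue() { ++pending; }
  void synchronize() { pending = 0; }  // stands in for a per-stream sync call
};

// Hypothetical stand-in for a device that records every stream it allocates.
class MockDevice {
 public:
  MockDevice() : streams_(1), current_(0) {}  // start with one default stream

  // Allocate a new stream and return its index.
  std::size_t createStream() {
    streams_.emplace_back();
    return streams_.size() - 1;
  }

  // Select which stream subsequent finish() calls act on.
  void setStream(std::size_t i) {
    assert(i < streams_.size());
    current_ = i;
  }

  MockStream& stream(std::size_t i) { return streams_.at(i); }

  // Existing behaviour described in this issue: only the current stream.
  void finish() { streams_[current_].synchronize(); }

  // Proposed addition: wait on every stream allocated on this device.
  void finishAll() {
    for (MockStream& s : streams_) s.synchronize();
  }

  // Total outstanding operations across all streams (for illustration only).
  int pendingTotal() const {
    int total = 0;
    for (const MockStream& s : streams_) total += s.pending;
    return total;
  }

 private:
  std::vector<MockStream> streams_;
  std::size_t current_;
};
```

With work enqueued on two streams, `finish()` would leave the non-current stream's operations outstanding, while `finishAll()` would drain both. A real backend with a native device-wide call (such as cudaDeviceSynchronize) could presumably use that instead of the loop.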

kris-rowe, Jun 29 '22 18:06