Function to synchronize an entire device.
The existing implementation of `device::finish()` only synchronizes the current stream (e.g., by calling `cuStreamSynchronize`), making both the function name and documentation somewhat misleading.
Some downstream OCCA applications require a mechanism to wait for all enqueued operations on a device to finish, similar to `cudaDeviceSynchronize`.
The programming models of the other backends (i.e., OpenCL, SYCL) don't have a similar API for device synchronization; however, `modeDevice_t` already retains a vector of the streams that have been allocated, so this should not be an issue: each tracked stream can be synchronized in turn.
Two potential options to move forward with this are:
- Change the implementation of `device::finish()` to match its name and documentation, then add a function to the `stream` class for synchronizing only a particular stream (and possibly a shortcut to sync the current stream).
- Keep the current implementation of `device::finish()`, but update its documentation and add another function, `device::finishAll()`, which synchronizes all streams on a device.
After discussing this at the OCCA TAF meeting, we will go with the second option, adding a new function `finishAll()` to the `occa::device` class.
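
To make the intended semantics concrete, here is a minimal, self-contained sketch of the second option. It is not OCCA code: `MockStream`, `MockDevice`, and all of their members are hypothetical stand-ins, and the "synchronization" is just a counter reset. The only parts taken from the discussion above are the idea that the device keeps a vector of its allocated streams, that `finish()` waits on the current stream only, and that `finishAll()` waits on every stream.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a backend stream; not the OCCA stream class.
struct MockStream {
  int pending = 0;  // number of enqueued, unfinished operations

  void enqueue() { ++pending; }

  // Stands in for a per-stream wait such as cuStreamSynchronize.
  void synchronize() { pending = 0; }
};

// Hypothetical stand-in for a device that tracks every stream it allocates.
class MockDevice {
 public:
  // Start with one default stream, which is also the current stream.
  MockDevice() : streams(1), current(0) {}

  // Allocate a new stream, record it, and return its index.
  std::size_t createStream() {
    streams.emplace_back();
    return streams.size() - 1;
  }

  void setStream(std::size_t index) {
    assert(index < streams.size());
    current = index;
  }

  MockStream& stream(std::size_t index) { return streams.at(index); }

  // Existing behavior: wait on the current stream only.
  void finish() { streams.at(current).synchronize(); }

  // Proposed behavior: wait on every stream allocated on this device.
  void finishAll() {
    for (MockStream& s : streams) {
      s.synchronize();
    }
  }

  // Total unfinished operations across all streams.
  int totalPending() const {
    int total = 0;
    for (const MockStream& s : streams) {
      total += s.pending;
    }
    return total;
  }

 private:
  std::vector<MockStream> streams;
  std::size_t current;
};
```

In this sketch, work enqueued on a non-current stream is still pending after `finish()` and is only drained by `finishAll()`. A real backend with a native device-wide call (such as `cudaDeviceSynchronize`) could presumably use that call instead of the loop.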