chipStar
chipStar copied to clipboard
OpenCL: Support devices with cl_ext_buffer_device_address
Use cl_ext_buffer_device_address extension (experimental at the time of this commit) for managing HIP device allocations on devices where neither USM nor SVM is available. The latest main of POCL and, hopefully soon, Mesa's rusticl supports this extension. Briefly, the extension enables the OpenCL backend to pin cl_mem buffers to device memory and obtain their fixed addresses.
The extension has a down side that the addresses of the device and
host allocations may alias, thus, we can't automatically determine
copy direction in hipMemcpy*(, hipMemcpyDefault)
calls. For this
reason the unified [virtual] addressing feature is set off and,
consequently, hipMemcpyDefault is unsupported and allocations from
hipHostMalloc() are not implicitly mapped and portable. Also,
hipHostMalloc(..., hipHostMallocMapped)
calls are not supported yet
(unimplemented).
Other changes:
-
Use hipMemoryTypeDevice type for the shadow buffers of the global-scope
__device__
variables. -
Remove redundant hipDeviceProp_t structure copy in chipStar::Device::getAttr().
-
Define missing unifiedAddressing device property.
-
Define missing hipDeviceAttributeUnifiedAddressing attribute.
-
On devices with
unifiedAddressing == 1
hipHostMallocMapped and hipHostMallocPortable flags are set on when calling hipHostMalloc(). -
Map/unmap only device accessible host allocations (ones with hipHostMallocMapped).
-
On devices with
unifiedAddressing == 0
hipHostMalloc() called with default flags allocates plain host memory instead of device (accessible) memory. -
Add hipMemcpyKind parameter to chipstar::Queue::hipMemcpy*() methods which is needed by the OpenCL backend for calling the right driver copy API function under
unifiedAddressing == 0
. -
Add CHIP_OCL_USE_ALLOC_STRATEGY environment variable for instructing OpenCL the backend to use either USM, SVM or the cl_ext_buffer_device_address.
-
Added a test for cl_ext_buffer_device_address and also check excepted API behaviors for
unifiedAddressing == 0
. The test is run if the device supports the extension. -
Refactor SPIR-V processing. Filter out chipStar metadata expressed as global-scope variables.
-
CHIP_DUMP_SPIRV=1 dumps SPIR-V on failing SPIR-V processing step.