chipStar icon indicating copy to clipboard operation
chipStar copied to clipboard

OpenCL: Support devices with cl_ext_buffer_device_address

Open linehill opened this issue 10 months ago • 2 comments

Use cl_ext_buffer_device_address extension (experimental at the time of this commit) for managing HIP device allocations on devices where neither USM nor SVM is available. The latest main of POCL and, hopefully soon, Mesa's rusticl supports this extension. Briefly, the extension enables the OpenCL backend to pin cl_mem buffers to device memory and obtain their fixed addresses.

The extension has a down side that the addresses of the device and host allocations may alias, thus, we can't automatically determine copy direction in hipMemcpy*(, hipMemcpyDefault) calls. For this reason the unified [virtual] addressing feature is set off and, consequently, hipMemcpyDefault is unsupported and allocations from hipHostMalloc() are not implicitly mapped and portable. Also, hipHostMalloc(..., hipHostMallocMapped) calls are not supported yet (unimplemented).

Other changes:

  • Use hipMemoryTypeDevice type for the shadow buffers of the global-scope __device__ variables.

  • Remove redundant hipDeviceProp_t structure copy in chipStar::Device::getAttr().

  • Define missing unifiedAddressing device property.

  • Define missing hipDeviceAttributeUnifiedAddressing attribute.

  • On devices with unifiedAddressing == 1 hipHostMallocMapped and hipHostMallocPortable flags are set on when calling hipHostMalloc().

  • Map/unmap only device accessible host allocations (ones with hipHostMallocMapped).

  • On devices with unifiedAddressing == 0 hipHostMalloc() called with default flags allocates plain host memory instead of device (accessible) memory.

  • Add hipMemcpyKind parameter to chipstar::Queue::hipMemcpy*() methods which is needed by the OpenCL backend for calling the right driver copy API function under unifiedAddressing == 0.

  • Add CHIP_OCL_USE_ALLOC_STRATEGY environment variable for instructing OpenCL the backend to use either USM, SVM or the cl_ext_buffer_device_address.

  • Added a test for cl_ext_buffer_device_address and also check excepted API behaviors for unifiedAddressing == 0. The test is run if the device supports the extension.

  • Refactor SPIR-V processing. Filter out chipStar metadata expressed as global-scope variables.

  • CHIP_DUMP_SPIRV=1 dumps SPIR-V on failing SPIR-V processing step.

linehill avatar Apr 15 '24 12:04 linehill