clspv
clspv copied to clipboard
Support for physical addressing
Motivation
- Full support for OpenCL C requires physical addressing. While clspv has done a great job of enabling many OpenCL C programs without the use of physical addressing, there will always be programs that cannot be compiled without physical addressing.
- clspv's pointer bitcast rewriting passes are rather complex and tend to require changes when large new code bases are compiled.
- Some OpenCL CTS tests would be easy to pass with physical addressing (e.g. support for
NULLkernel arguments). - Physical addressing opens the door to supporting OpenCL SVM or USM which in addition to being desirable features in their own right are also very useful to provide a bridge to enable other programming languages to target Vulkan.
Proposed design outline
Here's a rough outline of what I prototyped:
- Use the SPV_KHR_physical_storage_buffer extension to enable the use of the PhysicalStorageBufferEXT storage class where StorageBuffer is currently used.
- Disable pointer transformation passes (pointer bitcast replacement and simplification passes).
- Introduce new runtime interfaces in the clspv non-semantic instruction set to allow passing pointers obtained by the runtime using
VK_KHR_buffer_device_addressvia either push constants or uniform buffers. The interface uses a structure of 64-bit integers (or 2-element vectors of 32-bit integers), one for each pointer argument, that is part of the global push constant structure or stored in a dedicated uniform buffer. Pointer arguments to kernel functions are rewritten to load the required integers from this structure and convert them to pointers. A new pass is introduced to declare the interface structure and perform the necessary rewriting. - Add a new command line option to enable the use of physical addressing where possible (disabled by default)
- Add a new command line option to select the 64-bit addressing model (disabled by default, implied by or required with physical addressing)
Proposed staging
- [ ] Support 64-bit addressing model (hard requirement of the PhysicalStorageBuffer64EXT addressing model)
- [x] Add libclc target (https://reviews.llvm.org/D116668)
- [x] Build libclc as part of clspv's build system
- [x] Add support for 64-bit addressing model https://github.com/google/clspv/pull/936
- [x] Update clspv non-semantic instruction set with new runtime interface instructions
- [x] Specification (https://github.com/KhronosGroup/SPIRV-Registry/pull/153)
- [x] Headers (https://github.com/KhronosGroup/SPIRV-Headers/pull/284)
- [x] Use in clspv and other dependent projects (e.g. SPIR-V Tools)
- [x] Introduce utilities to manage global push constants https://github.com/google/clspv/pull/882
- [x] Add support for physical addressing (using 64-bit integers) https://github.com/google/clspv/pull/954
- [ ] Add support for physical addressing (using 2-element vectors of 32-bit integers) https://github.com/google/clspv/issues/956
Seems reasonable overall. A few points to consider:
- You can only disable pointer transforms for global (and sometimes constant) address space.
- More detail on the reflection design would be appreciated. The push constant case seems clearer for reflection than the UBO case.
- Nulls are tricky. You need to codegen as a conversion from integer 0 (or (0,0)) and then comparison.
- I assume this is all based on a new option.
- To be clear, this feature will wholesale replace uses of StorageBuffer with PhysicalStorageBuffer?
Thanks for the feedback!
- Yes, my prototype has them disabled entirely but this only worked because I was looking at a very narrow set of workloads.
- Vulkan only guarantees 128 bytes worth of push constants. At 8 byte per pointer, that's 4 kernels with 4 pointer arguments each, not even taking into account the push constant space used by PODs or other global push constants. I think it's safe to assume that cases where we can't entirely rely on push constants to pass pointers will be quite common. I'm proposing we use a UBO when we run out of push constant space (either as a complement to using push constants or instead of when there isn't enough space).
- Thanks for the tip! Looking at my prototype I'm doing all comparisons after conversion to integers as pointer comparison are not allowed with operands pointing into the PhysicalStorageBufferEXT storage class.
- Yup, so "obvious" I didn't say it :). I'll add a mention in the proposal.
- Unless we find it's not possible, yes, that's currently the intent.
FYI Nvidia originally had a limit on CUDA parameters of 256 bytes. This limit was increased to 4k in cuda 2.x by moving to constant memory AKA uniform memory/buffers.
I mention this to show there's industry precedent for the exact solution proposed for the push constant size limitations.