Transition to physical addressing
Physical addressing ought to become the default when available but we should reach parity wrt. logical addressing on the CTS before we transition. Once we land https://github.com/kpet/clvk/pull/453, we can start working on the following:
- [ ] Auto-detect and enable physical addressing when available
- [ ] Fix all CTS tests that have regressed (to replace with a list of tests once the PR lands)
Here are the regressions I got running on an Nvidia device:
DATA ERROR
- [x] api/set_kernel_arg_struct_array
- [ ] basic/async_copy_local_to_global
- [ ] basic/async_copy_global_to_local
- [ ] basic/async_strided_copy_global_to_local
- [ ] basic/async_strided_copy_local_to_global
- [ ] basic/vload_local
- [ ] basic/vstore_local
- [x] basic/work_item_functions
- [x] compiler/compile_and_link_status_options_log
- [x] compiler/execute_after_serialize_reload_library
- [x] compiler/execute_after_simple_library_with_link
- [x] compiler/execute_after_two_file_link
- [x] compiler/multi_file_libraries
- [x] compiler/multiple_files
- [x] compiler/multiple_files_multiple_libraries
- [x] compiler/multiple_libraries
- [x] compiler/program_binary_type
- [x] compiler/execute_after_included_header_link
- [x] images/kernel_image_methods/1D-CL_SIGNED_INT32
- [x] images/kernel_image_methods/1D-CL_SIGNED_INT8
- [x] images/kernel_image_methods/1D-CL_SNORM_INT16
- [x] images/kernel_image_methods/1D-CL_SNORM_INT8
- [x] images/kernel_image_methods/1D-CL_UNORM_INT16
- [x] images/kernel_image_methods/1D-CL_UNORM_INT8
- [x] images/kernel_image_methods/1D-CL_UNORM_SHORT_565
- [x] images/kernel_image_methods/1D-CL_UNSIGNED_INT16
- [x] images/kernel_image_methods/1D-CL_UNSIGNED_INT32
- [x] images/kernel_image_methods/1D-CL_UNSIGNED_INT8
- [x] images/kernel_image_methods/1Darray-CL_FLOAT
- [x] images/kernel_image_methods/1Darray-CL_HALF_FLOAT
- [x] images/kernel_image_methods/1Darray-CL_SIGNED_INT16
- [x] images/kernel_image_methods/1Darray-CL_SIGNED_INT32
- [x] images/kernel_image_methods/1Darray-CL_SIGNED_INT8
- [x] images/kernel_image_methods/1Darray-CL_SNORM_INT16
- [x] images/kernel_image_methods/1Darray-CL_SNORM_INT8
- [x] images/kernel_image_methods/1Darray-CL_UNORM_SHORT_565
- [x] images/kernel_image_methods/2D-CL_FLOAT
- [x] images/kernel_image_methods/2D-CL_HALF_FLOAT
- [x] images/kernel_image_methods/2D-CL_SIGNED_INT16
- [x] images/kernel_image_methods/2D-CL_SIGNED_INT32
- [x] images/kernel_image_methods/2D-CL_SIGNED_INT8
- [x] images/kernel_image_methods/2D-CL_SNORM_INT16
- [x] images/kernel_image_methods/2D-CL_SNORM_INT8
- [x] images/kernel_image_methods/2D-CL_UNORM_INT16
- [x] images/kernel_image_methods/2D-CL_UNSIGNED_INT16
- [x] images/kernel_image_methods/2Darray-CL_FLOAT
- [x] images/kernel_image_methods/2Darray-CL_SIGNED_INT16
- [x] images/kernel_image_methods/2Darray-CL_SIGNED_INT32
- [x] images/kernel_image_methods/2Darray-CL_SIGNED_INT8
- [x] images/kernel_image_methods/2Darray-CL_SNORM_INT8
- [x] images/kernel_image_methods/2Darray-CL_UNORM_INT8
- [x] images/kernel_image_methods/2Darray-CL_UNORM_SHORT_565
- [x] images/kernel_image_methods/2Darray-CL_UNSIGNED_INT16
- [x] images/kernel_image_methods/2Darray-CL_UNSIGNED_INT8
- [x] images/kernel_image_methods/3D-CL_FLOAT
- [x] images/kernel_image_methods/3D-CL_SIGNED_INT16
- [x] images/kernel_image_methods/3D-CL_SIGNED_INT32
- [x] images/kernel_image_methods/3D-CL_SIGNED_INT8
- [x] images/kernel_image_methods/3D-CL_SNORM_INT8
- [x] images/kernel_image_methods/3D-CL_UNORM_INT16
- [x] images/kernel_image_methods/3D-CL_UNORM_INT8
- [x] images/kernel_image_methods/3D-CL_UNSIGNED_INT16
- [x] images/kernel_image_methods/3D-CL_UNSIGNED_INT32
- [x] images/kernel_image_methods/3D-CL_UNSIGNED_INT8
- [x] images/kernel_image_methods/1D-CL_FLOAT
- [x] images/kernel_image_methods/1D-CL_HALF_FLOAT
- [x] images/kernel_image_methods/1D-CL_SIGNED_INT16
- [x] images/kernel_image_methods/1Darray-CL_UNORM_INT16
- [x] images/kernel_image_methods/1Darray-CL_UNORM_INT8
- [x] images/kernel_image_methods/1Darray-CL_UNSIGNED_INT16
- [x] images/kernel_image_methods/1Darray-CL_UNSIGNED_INT32
- [x] images/kernel_image_methods/1Darray-CL_UNSIGNED_INT8
- [x] images/kernel_image_methods/2D-CL_UNORM_INT8
- [x] images/kernel_image_methods/2D-CL_UNORM_SHORT_565
- [x] images/kernel_image_methods/2D-CL_UNSIGNED_INT32
- [x] images/kernel_image_methods/2D-CL_UNSIGNED_INT8
- [x] images/kernel_image_methods/2Darray-CL_HALF_FLOAT
- [x] images/kernel_image_methods/2Darray-CL_SNORM_INT16
- [x] images/kernel_image_methods/2Darray-CL_UNORM_INT16
- [x] images/kernel_image_methods/2Darray-CL_UNSIGNED_INT32
- [x] images/kernel_image_methods/3D-CL_HALF_FLOAT
- [x] images/kernel_image_methods/3D-CL_SNORM_INT16
- [x] images/kernel_image_methods/3D-CL_UNORM_SHORT_565
- [x] vectors/vec_align_packed_struct_arr
- [x] profiling/read_array_struct
CRASH
- [ ] atomics/atomic_add
- [ ] atomics/atomic_and
- [ ] atomics/atomic_cmpxchg
- [ ] atomics/atomic_dec
- [ ] atomics/atomic_max
- [ ] atomics/atomic_min
- [ ] atomics/atomic_or
- [ ] atomics/atomic_sub
- [ ] atomics/atomic_xchg
- [ ] atomics/atomic_inc
- [ ] atomics/atomic_xor
- [ ] c11_atomics/atomic_compare_exchange_strong
- [ ] c11_atomics/atomic_exchange
- [ ] c11_atomics/atomic_fetch_add
- [ ] c11_atomics/atomic_fetch_and
- [ ] c11_atomics/atomic_fetch_min
- [ ] c11_atomics/atomic_fetch_orand
- [ ] c11_atomics/atomic_fetch_sub
- [ ] c11_atomics/atomic_fetch_xor
- [ ] c11_atomics/atomic_fetch_xor2
- [ ] c11_atomics/atomic_compare_exchange_weak
- [ ] c11_atomics/atomic_fence
- [ ] c11_atomics/atomic_fetch_max
- [ ] c11_atomics/atomic_fetch_or
- [x] half/vstore_half_rtn-wimpy
- [x] half/vstore_half_rtp-wimpy
- [x] half/vstore_half_rte-wimpy
- [x] half/vstore_half_rtz-wimpy
- [x] half/vstorea_half_rte-wimpy
- [ ] non_uniform_work_group/non_uniform_1d_atomics
- [x] non_uniform_work_group/non_uniform_1d_barriers
- [ ] non_uniform_work_group/non_uniform_2d_atomics
- [x] non_uniform_work_group/non_uniform_2d_barriers
- [ ] non_uniform_work_group/non_uniform_3d_atomics
- [x] non_uniform_work_group/non_uniform_3d_barriers
- [ ] non_uniform_work_group/non_uniform_3d_basic
- [ ] non_uniform_work_group/non_uniform_other_basic
- [ ] non_uniform_work_group/non_uniform_1d_basic
- [ ] non_uniform_work_group/non_uniform_2d_basic
- [ ] non_uniform_work_group/non_uniform_other_atomics
FAILED
- [x] buffers/buffer_read_struct
- [x] buffers/buffer_map_read_struct
Thanks for posting this! Looks about the same as what I'm seeing at my end.
I am working on those right now.
I have started with buffer/buffer_read_struct which is in fact as those in DATA ERROR.
It feels like a simple one to start with, but I have trouble understanding what is wrong.
Here is the kernel and the compiled version: https://godbolt.org/z/1Ejas96Yb
The output is:
dst[tid].a = 3.40282346638528860e+38dst[tid].b = 0
I have tried to inverse the two stores: dst[tid].a and dst[tid].b. I ended up with:
dst[tid].a = ((1<<16)+1)dst[tid].b = 0.0
It is like only the second storing is taken into account, and the last index of the OpPtrAccessChain used to compute the storing address is not taken into account.
I have also tried to force clspv to use ulong for the index of the two OpPtrAccessChain used to compute the storing address, but it had the same behavior.
@alan-baker , @kpet , any ideas?
The shader looks ok to me. If you make the stores have thread dependent values (e.g. store tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.
Agree the shader looks correct after a quick scan. As tempting as it is to blame drivers, I am seeing failures across two Vulkan stacks and HW from 3 vendors (NVIDIA, Mesa/AMD, Mesa/Intel). clvk is definitely not allocating the memory properly, the code in main does not have some of the flags used in my original prototype (VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT usage flag on the buffer and corresponding VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT memory allocation flag). After fixing this the validation layers are happy but it does not affect what I'm reading back from memory. On an Intel A750, I'm seeing all zeros. On AMD HW, I'm seeing one of the two values (the float with the test unmodified, the int if I comment out the line storing the float) but 3 bytes off where it should be. I've tried to strengthen memory barriers after kernels, to no avail. This is going to be a fun one ...
The shader looks ok to me. If you make the stores have thread dependent values (e.g. store
tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.
Yes I have tried that, the writing occurs in the right thread.
I've got the test passing, we're not laying out the structure but all structures in the PhysicalStorageBuffer storage class must be explicitly laid out. I've hacked clspv to test but a proper solution will require more work. There might be cases where the same struct type is used to declare objects in storages classes where they must be explicitly laid out and in others where they must not but the layout decorations are applied to the type so we need multiple types. I vaguely remember that there's logic to deal with this in clspv already. Happy to hand over to @alan-baker or @rjodinchr for a full clspv solution.
Here's the clspv hack I used for reference:
diff --git a/lib/SPIRVProducerPass.cpp b/lib/SPIRVProducerPass.cpp
index d39555e5..f1eb95bc 100644
--- a/lib/SPIRVProducerPass.cpp
+++ b/lib/SPIRVProducerPass.cpp
@@ -1918,9 +1918,9 @@ SPIRVID SPIRVProducerPassImpl::getSPIRVType(Type *Ty, bool needs_layout) {
StructType *canonical = cast<StructType>(CanonicalType(STy));
bool use_layout =
(Option::SpvVersion() < SPIRVVersion::SPIRV_1_4) || needs_layout;
- if (TypesNeedingLayout.idFor(STy) &&
+ if (/*TypesNeedingLayout.idFor(STy) &&
(canonical == STy || !TypesNeedingLayout.idFor(canonical)) &&
- use_layout) {
+ use_layout*/true) {
for (unsigned MemberIdx = 0; MemberIdx < STy->getNumElements();
MemberIdx++) {
// Ops[0] = Structure Type ID
I'll take a look, I would have thought this was handled already.
I am working on the DATA ERROR issue.
I have made some progress, but I need more time to have it fixed.