clvk icon indicating copy to clipboard operation
clvk copied to clipboard

Transition to physical addressing

Open kpet opened this issue 3 years ago • 11 comments

Physical addressing ought to become the default when available but we should reach parity wrt. logical addressing on the CTS before we transition. Once we land https://github.com/kpet/clvk/pull/453, we can start working on the following:

  • [ ] Auto-detect and enable physical addressing when available
  • [ ] Fix all CTS tests that have regressed (to replace with a list of tests once the PR lands)

kpet avatar Dec 04 '22 20:12 kpet

Here are the regressions I got running on an Nvidia device:

DATA ERROR

  • [x] api/set_kernel_arg_struct_array
  • [ ] basic/async_copy_local_to_global
  • [ ] basic/async_copy_global_to_local
  • [ ] basic/async_strided_copy_global_to_local
  • [ ] basic/async_strided_copy_local_to_global
  • [ ] basic/vload_local
  • [ ] basic/vstore_local
  • [x] basic/work_item_functions
  • [x] compiler/compile_and_link_status_options_log
  • [x] compiler/execute_after_serialize_reload_library
  • [x] compiler/execute_after_simple_library_with_link
  • [x] compiler/execute_after_two_file_link
  • [x] compiler/multi_file_libraries
  • [x] compiler/multiple_files
  • [x] compiler/multiple_files_multiple_libraries
  • [x] compiler/multiple_libraries
  • [x] compiler/program_binary_type
  • [x] compiler/execute_after_included_header_link
  • [x] images/kernel_image_methods/1D-CL_SIGNED_INT32
  • [x] images/kernel_image_methods/1D-CL_SIGNED_INT8
  • [x] images/kernel_image_methods/1D-CL_SNORM_INT16
  • [x] images/kernel_image_methods/1D-CL_SNORM_INT8
  • [x] images/kernel_image_methods/1D-CL_UNORM_INT16
  • [x] images/kernel_image_methods/1D-CL_UNORM_INT8
  • [x] images/kernel_image_methods/1D-CL_UNORM_SHORT_565
  • [x] images/kernel_image_methods/1D-CL_UNSIGNED_INT16
  • [x] images/kernel_image_methods/1D-CL_UNSIGNED_INT32
  • [x] images/kernel_image_methods/1D-CL_UNSIGNED_INT8
  • [x] images/kernel_image_methods/1Darray-CL_FLOAT
  • [x] images/kernel_image_methods/1Darray-CL_HALF_FLOAT
  • [x] images/kernel_image_methods/1Darray-CL_SIGNED_INT16
  • [x] images/kernel_image_methods/1Darray-CL_SIGNED_INT32
  • [x] images/kernel_image_methods/1Darray-CL_SIGNED_INT8
  • [x] images/kernel_image_methods/1Darray-CL_SNORM_INT16
  • [x] images/kernel_image_methods/1Darray-CL_SNORM_INT8
  • [x] images/kernel_image_methods/1Darray-CL_UNORM_SHORT_565
  • [x] images/kernel_image_methods/2D-CL_FLOAT
  • [x] images/kernel_image_methods/2D-CL_HALF_FLOAT
  • [x] images/kernel_image_methods/2D-CL_SIGNED_INT16
  • [x] images/kernel_image_methods/2D-CL_SIGNED_INT32
  • [x] images/kernel_image_methods/2D-CL_SIGNED_INT8
  • [x] images/kernel_image_methods/2D-CL_SNORM_INT16
  • [x] images/kernel_image_methods/2D-CL_SNORM_INT8
  • [x] images/kernel_image_methods/2D-CL_UNORM_INT16
  • [x] images/kernel_image_methods/2D-CL_UNSIGNED_INT16
  • [x] images/kernel_image_methods/2Darray-CL_FLOAT
  • [x] images/kernel_image_methods/2Darray-CL_SIGNED_INT16
  • [x] images/kernel_image_methods/2Darray-CL_SIGNED_INT32
  • [x] images/kernel_image_methods/2Darray-CL_SIGNED_INT8
  • [x] images/kernel_image_methods/2Darray-CL_SNORM_INT8
  • [x] images/kernel_image_methods/2Darray-CL_UNORM_INT8
  • [x] images/kernel_image_methods/2Darray-CL_UNORM_SHORT_565
  • [x] images/kernel_image_methods/2Darray-CL_UNSIGNED_INT16
  • [x] images/kernel_image_methods/2Darray-CL_UNSIGNED_INT8
  • [x] images/kernel_image_methods/3D-CL_FLOAT
  • [x] images/kernel_image_methods/3D-CL_SIGNED_INT16
  • [x] images/kernel_image_methods/3D-CL_SIGNED_INT32
  • [x] images/kernel_image_methods/3D-CL_SIGNED_INT8
  • [x] images/kernel_image_methods/3D-CL_SNORM_INT8
  • [x] images/kernel_image_methods/3D-CL_UNORM_INT16
  • [x] images/kernel_image_methods/3D-CL_UNORM_INT8
  • [x] images/kernel_image_methods/3D-CL_UNSIGNED_INT16
  • [x] images/kernel_image_methods/3D-CL_UNSIGNED_INT32
  • [x] images/kernel_image_methods/3D-CL_UNSIGNED_INT8
  • [x] images/kernel_image_methods/1D-CL_FLOAT
  • [x] images/kernel_image_methods/1D-CL_HALF_FLOAT
  • [x] images/kernel_image_methods/1D-CL_SIGNED_INT16
  • [x] images/kernel_image_methods/1Darray-CL_UNORM_INT16
  • [x] images/kernel_image_methods/1Darray-CL_UNORM_INT8
  • [x] images/kernel_image_methods/1Darray-CL_UNSIGNED_INT16
  • [x] images/kernel_image_methods/1Darray-CL_UNSIGNED_INT32
  • [x] images/kernel_image_methods/1Darray-CL_UNSIGNED_INT8
  • [x] images/kernel_image_methods/2D-CL_UNORM_INT8
  • [x] images/kernel_image_methods/2D-CL_UNORM_SHORT_565
  • [x] images/kernel_image_methods/2D-CL_UNSIGNED_INT32
  • [x] images/kernel_image_methods/2D-CL_UNSIGNED_INT8
  • [x] images/kernel_image_methods/2Darray-CL_HALF_FLOAT
  • [x] images/kernel_image_methods/2Darray-CL_SNORM_INT16
  • [x] images/kernel_image_methods/2Darray-CL_UNORM_INT16
  • [x] images/kernel_image_methods/2Darray-CL_UNSIGNED_INT32
  • [x] images/kernel_image_methods/3D-CL_HALF_FLOAT
  • [x] images/kernel_image_methods/3D-CL_SNORM_INT16
  • [x] images/kernel_image_methods/3D-CL_UNORM_SHORT_565
  • [x] vectors/vec_align_packed_struct_arr
  • [x] profiling/read_array_struct

CRASH

  • [ ] atomics/atomic_add
  • [ ] atomics/atomic_and
  • [ ] atomics/atomic_cmpxchg
  • [ ] atomics/atomic_dec
  • [ ] atomics/atomic_max
  • [ ] atomics/atomic_min
  • [ ] atomics/atomic_or
  • [ ] atomics/atomic_sub
  • [ ] atomics/atomic_xchg
  • [ ] atomics/atomic_inc
  • [ ] atomics/atomic_xor
  • [ ] c11_atomics/atomic_compare_exchange_strong
  • [ ] c11_atomics/atomic_exchange
  • [ ] c11_atomics/atomic_fetch_add
  • [ ] c11_atomics/atomic_fetch_and
  • [ ] c11_atomics/atomic_fetch_min
  • [ ] c11_atomics/atomic_fetch_orand
  • [ ] c11_atomics/atomic_fetch_sub
  • [ ] c11_atomics/atomic_fetch_xor
  • [ ] c11_atomics/atomic_fetch_xor2
  • [ ] c11_atomics/atomic_compare_exchange_weak
  • [ ] c11_atomics/atomic_fence
  • [ ] c11_atomics/atomic_fetch_max
  • [ ] c11_atomics/atomic_fetch_or
  • [x] half/vstore_half_rtn-wimpy
  • [x] half/vstore_half_rtp-wimpy
  • [x] half/vstore_half_rte-wimpy
  • [x] half/vstore_half_rtz-wimpy
  • [x] half/vstorea_half_rte-wimpy
  • [ ] non_uniform_work_group/non_uniform_1d_atomics
  • [x] non_uniform_work_group/non_uniform_1d_barriers
  • [ ] non_uniform_work_group/non_uniform_2d_atomics
  • [x] non_uniform_work_group/non_uniform_2d_barriers
  • [ ] non_uniform_work_group/non_uniform_3d_atomics
  • [x] non_uniform_work_group/non_uniform_3d_barriers
  • [ ] non_uniform_work_group/non_uniform_3d_basic
  • [ ] non_uniform_work_group/non_uniform_other_basic
  • [ ] non_uniform_work_group/non_uniform_1d_basic
  • [ ] non_uniform_work_group/non_uniform_2d_basic
  • [ ] non_uniform_work_group/non_uniform_other_atomics

FAILED

  • [x] buffers/buffer_read_struct
  • [x] buffers/buffer_map_read_struct

rjodinchr avatar May 30 '23 08:05 rjodinchr

Thanks for posting this! Looks about the same as what I'm seeing at my end.

kpet avatar May 30 '23 09:05 kpet

I am working on those right now.

rjodinchr avatar May 30 '23 10:05 rjodinchr

I have started with buffer/buffer_read_struct which is in fact as those in DATA ERROR. It feels like a simple one to start with, but I have trouble understanding what is wrong.

Here is the kernel and the compiled version: https://godbolt.org/z/1Ejas96Yb

The output is:

  • dst[tid].a = 3.40282346638528860e+38
  • dst[tid].b = 0

I have tried to inverse the two stores: dst[tid].a and dst[tid].b. I ended up with:

  • dst[tid].a = ((1<<16)+1)
  • dst[tid].b = 0.0

It is like only the second storing is taken into account, and the last index of the OpPtrAccessChain used to compute the storing address is not taken into account.

I have also tried to force clspv to use ulong for the index of the two OpPtrAccessChain used to compute the storing address, but it had the same behavior.

@alan-baker , @kpet , any ideas?

rjodinchr avatar May 30 '23 13:05 rjodinchr

The shader looks ok to me. If you make the stores have thread dependent values (e.g. store tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.

alan-baker avatar May 30 '23 19:05 alan-baker

Agree the shader looks correct after a quick scan. As tempting as it is to blame drivers, I am seeing failures across two Vulkan stacks and HW from 3 vendors (NVIDIA, Mesa/AMD, Mesa/Intel). clvk is definitely not allocating the memory properly, the code in main does not have some of the flags used in my original prototype (VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT usage flag on the buffer and corresponding VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT memory allocation flag). After fixing this the validation layers are happy but it does not affect what I'm reading back from memory. On an Intel A750, I'm seeing all zeros. On AMD HW, I'm seeing one of the two values (the float with the test unmodified, the int if I comment out the line storing the float) but 3 bytes off where it should be. I've tried to strengthen memory barriers after kernels, to no avail. This is going to be a fun one ...

kpet avatar May 30 '23 20:05 kpet

The shader looks ok to me. If you make the stores have thread dependent values (e.g. store tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.

Yes I have tried that, the writing occurs in the right thread.

rjodinchr avatar May 31 '23 08:05 rjodinchr

I've got the test passing, we're not laying out the structure but all structures in the PhysicalStorageBuffer storage class must be explicitly laid out. I've hacked clspv to test but a proper solution will require more work. There might be cases where the same struct type is used to declare objects in storages classes where they must be explicitly laid out and in others where they must not but the layout decorations are applied to the type so we need multiple types. I vaguely remember that there's logic to deal with this in clspv already. Happy to hand over to @alan-baker or @rjodinchr for a full clspv solution.

kpet avatar May 31 '23 19:05 kpet

Here's the clspv hack I used for reference:

diff --git a/lib/SPIRVProducerPass.cpp b/lib/SPIRVProducerPass.cpp
index d39555e5..f1eb95bc 100644
--- a/lib/SPIRVProducerPass.cpp
+++ b/lib/SPIRVProducerPass.cpp
@@ -1918,9 +1918,9 @@ SPIRVID SPIRVProducerPassImpl::getSPIRVType(Type *Ty, bool needs_layout) {
     StructType *canonical = cast<StructType>(CanonicalType(STy));
     bool use_layout =
         (Option::SpvVersion() < SPIRVVersion::SPIRV_1_4) || needs_layout;
-    if (TypesNeedingLayout.idFor(STy) &&
+    if (/*TypesNeedingLayout.idFor(STy) &&
         (canonical == STy || !TypesNeedingLayout.idFor(canonical)) &&
-        use_layout) {
+        use_layout*/true) {
       for (unsigned MemberIdx = 0; MemberIdx < STy->getNumElements();
            MemberIdx++) {
         // Ops[0] = Structure Type ID

kpet avatar May 31 '23 19:05 kpet

I'll take a look, I would have thought this was handled already.

alan-baker avatar May 31 '23 19:05 alan-baker

I am working on the DATA ERROR issue. I have made some progress, but I need more time to have it fixed.

rjodinchr avatar Jun 30 '23 16:06 rjodinchr