DirectXShaderCompiler icon indicating copy to clipboard operation
DirectXShaderCompiler copied to clipboard

[SPIR-V] Problem loading struct with vk::RawBufferLoad.

Open kevyuu opened this issue 2 years ago • 7 comments

struct GPUObjDesc
{
	float3 test;
	float3 test2;
	float3 test3;
	soulsl::address vertex_address;         // Address of the Vertex buffer
	soulsl::address index_address;          // Address of the index buffer
	soulsl::address material_address;       // Address of the material buffer
	soulsl::address material_index_address;  // Address of the triangle material index buffer
};

GPUObjDesc gpu_obj_desc = vk::RawBufferLoad<GPUObjDesc>(push_constant.gpu_obj_buffer_address);
float3 test2 = vk::RawBufferLoad<float3>(push_constant.gpu_obj_buffer_address + sizeof(float3));
float3 test3 = vk::RawBufferLoad<float3>(push_constant.gpu_obj_buffer_address + 2 * sizeof(float3));

For some reason gpu_obj_desc.test2 and gpu_obj_desc.test3 return the value of gpu_obj_desc.test. But test2 and test3 return the correct value. So it behaves correctly if I load the float3 by offsetting manually. I compile the hlsl with -fvk-use-dx-layout and -fspv-target-env=vulkan1.3. I have bindless mechanism as well that use ByteAddressBuffer.Load<GPUObjDesc>, that mechanism work perfectly fine when I load this same struct.

kevyuu avatar Jan 10 '23 10:01 kevyuu

Hi, Thanks for the report! In order for us to reproduce the issue, we'd like a bit more information from your end:

  • dxc version (or commit hash is custom build)
  • a shader that compiles, we can directly try with, which reproduces the issue
  • full command-line (Example of a good issue summary: https://github.com/microsoft/DirectXShaderCompiler/issues/4911)

Tried to guess a shader using those things (minus soulsl fields since those are not defined here), but OpAccessChain looks fine.

Keenuts avatar Jan 16 '23 09:01 Keenuts

Actually I have look at the disassembly spir-v. And I wonder why it translates to SPIR-V differently from ByteAddressBuffer.Load<T>. In ByteAddressBuffer.Load<T>, dxc will load the member one by one using OpAccessChain (one OpAccessChain per uint) but vk::RawBufferLoad<T> will only use one OpLoad

 #define WORK_GROUP_SIZE_X 16
#define WORK_GROUP_SIZE_Y 16

[[vk::binding(0, 0)]] ByteAddressBuffer global_buffer_arr[];
[[vk::binding(0, 3)]] RWTexture2D<float4> global_rw_texture_2d_float4_arr[]; 

RWTexture2D<float4> get_rw_texture_2d_float4(uint descriptor_id) {
    return global_rw_texture_2d_float4_arr[descriptor_id];
}

struct GPUScene
{
    float3 sky_color;
    float3 cube_color;
    float4x4 projection_inverse;
    float4x4 view_inverse;
};

struct PushConstant
{
    uint64_t scene_gpu_address;
    uint scene_descriptor_id;
    uint output_texture_descriptor_id;
};

[[vk::push_constant]]
PushConstant push_constant;

[numthreads(WORK_GROUP_SIZE_X, WORK_GROUP_SIZE_Y, 1)]
void cs_main(uint3 dispatch_thread_id : SV_DispatchThreadID)
{
    GPUScene scene2 = global_buffer_arr[push_constant.scene_descriptor_id].Load<GPUScene>(0);
    GPUScene scene = vk::RawBufferLoad<GPUScene>(push_constant.scene_gpu_address, 4);
    float3 output_color = scene2.cube_color + scene.cube_color;
    RWTexture2D<float4> output_texture = get_rw_texture_2d_float4(push_constant.output_texture_descriptor_id);
    output_texture[uint2(0, 0)] = float4(output_color, 1.0f);
}

Resulting SPIR-V

; SPIR-V
; Version: 1.6
; Generator: Google spiregg; 0
; Bound: 240
; Schema: 0
               OpCapability Shader
               OpCapability RuntimeDescriptorArray
               OpCapability Int64
               OpCapability PhysicalStorageBufferAddresses
               OpExtension "SPV_EXT_descriptor_indexing"
               OpExtension "SPV_KHR_physical_storage_buffer"
               OpMemoryModel PhysicalStorageBuffer64 GLSL450
               OpEntryPoint GLCompute %cs_main "cs_main" %gl_GlobalInvocationID %global_buffer_arr %global_rw_texture_2d_float4_arr %push_constant
               OpExecutionMode %cs_main LocalSize 16 16 1
               OpSource HLSL 650
               OpName %type_ByteAddressBuffer "type.ByteAddressBuffer"
               OpName %global_buffer_arr "global_buffer_arr"
               OpName %type_2d_image "type.2d.image"
               OpName %global_rw_texture_2d_float4_arr "global_rw_texture_2d_float4_arr"
               OpName %type_PushConstant_PushConstant "type.PushConstant.PushConstant"
               OpMemberName %type_PushConstant_PushConstant 0 "scene_gpu_address"
               OpMemberName %type_PushConstant_PushConstant 1 "scene_descriptor_id"
               OpMemberName %type_PushConstant_PushConstant 2 "output_texture_descriptor_id"
               OpName %push_constant "push_constant"
               OpName %cs_main "cs_main"
               OpName %param_var_dispatch_thread_id "param.var.dispatch_thread_id"
               OpName %GPUScene "GPUScene"
               OpMemberName %GPUScene 0 "sky_color"
               OpMemberName %GPUScene 1 "cube_color"
               OpMemberName %GPUScene 2 "projection_inverse"
               OpMemberName %GPUScene 3 "view_inverse"
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %global_buffer_arr DescriptorSet 0
               OpDecorate %global_buffer_arr Binding 0
               OpDecorate %global_rw_texture_2d_float4_arr DescriptorSet 3
               OpDecorate %global_rw_texture_2d_float4_arr Binding 0
               OpDecorate %_runtimearr_uint ArrayStride 4
               OpMemberDecorate %type_ByteAddressBuffer 0 Offset 0
               OpMemberDecorate %type_ByteAddressBuffer 0 NonWritable
               OpDecorate %type_ByteAddressBuffer Block
               OpMemberDecorate %type_PushConstant_PushConstant 0 Offset 0
               OpMemberDecorate %type_PushConstant_PushConstant 1 Offset 8
               OpMemberDecorate %type_PushConstant_PushConstant 2 Offset 12
               OpDecorate %type_PushConstant_PushConstant Block
        %int = OpTypeInt 32 1
      %int_1 = OpConstant %int 1
       %uint = OpTypeInt 32 0
     %uint_0 = OpConstant %uint 0
     %uint_2 = OpConstant %uint 2
     %uint_1 = OpConstant %uint 1
     %uint_3 = OpConstant %uint 3
     %uint_6 = OpConstant %uint 6
    %uint_22 = OpConstant %uint 22
    %uint_38 = OpConstant %uint 38
      %int_0 = OpConstant %int 0
      %int_2 = OpConstant %int 2
      %float = OpTypeFloat 32
    %float_1 = OpConstant %float 1
     %v2uint = OpTypeVector %uint 2
         %27 = OpConstantComposite %v2uint %uint_0 %uint_0
%_runtimearr_uint = OpTypeRuntimeArray %uint
%type_ByteAddressBuffer = OpTypeStruct %_runtimearr_uint
%_runtimearr_type_ByteAddressBuffer = OpTypeRuntimeArray %type_ByteAddressBuffer
%_ptr_StorageBuffer__runtimearr_type_ByteAddressBuffer = OpTypePointer StorageBuffer %_runtimearr_type_ByteAddressBuffer
%type_2d_image = OpTypeImage %float 2D 2 0 0 2 Rgba32f
%_runtimearr_type_2d_image = OpTypeRuntimeArray %type_2d_image
%_ptr_UniformConstant__runtimearr_type_2d_image = OpTypePointer UniformConstant %_runtimearr_type_2d_image
      %ulong = OpTypeInt 64 0
%type_PushConstant_PushConstant = OpTypeStruct %ulong %uint %uint
%_ptr_PushConstant_type_PushConstant_PushConstant = OpTypePointer PushConstant %type_PushConstant_PushConstant
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
       %void = OpTypeVoid
         %37 = OpTypeFunction %void
%_ptr_Function_v3uint = OpTypePointer Function %v3uint
         %39 = OpTypeFunction %void %_ptr_Function_v3uint
    %v3float = OpTypeVector %float 3
    %v4float = OpTypeVector %float 4
%mat4v4float = OpTypeMatrix %v4float 4
   %GPUScene = OpTypeStruct %v3float %v3float %mat4v4float %mat4v4float
%_ptr_Function_GPUScene = OpTypePointer Function %GPUScene
%_ptr_Function_v3float = OpTypePointer Function %v3float
%_ptr_Function_type_2d_image = OpTypePointer Function %type_2d_image
%_ptr_Function_uint = OpTypePointer Function %uint
%_ptr_PushConstant_uint = OpTypePointer PushConstant %uint
%_ptr_StorageBuffer_type_ByteAddressBuffer = OpTypePointer StorageBuffer %type_ByteAddressBuffer
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint
%_ptr_PushConstant_ulong = OpTypePointer PushConstant %ulong
%_ptr_PhysicalStorageBuffer_GPUScene = OpTypePointer PhysicalStorageBuffer %GPUScene
         %52 = OpTypeFunction %type_2d_image %_ptr_Function_uint
%_ptr_UniformConstant_type_2d_image = OpTypePointer UniformConstant %type_2d_image
%global_buffer_arr = OpVariable %_ptr_StorageBuffer__runtimearr_type_ByteAddressBuffer StorageBuffer
%global_rw_texture_2d_float4_arr = OpVariable %_ptr_UniformConstant__runtimearr_type_2d_image UniformConstant
%push_constant = OpVariable %_ptr_PushConstant_type_PushConstant_PushConstant PushConstant
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
         %54 = OpUndef %v3float
         %55 = OpUndef %mat4v4float
     %uint_4 = OpConstant %uint 4
     %uint_5 = OpConstant %uint 5
     %uint_7 = OpConstant %uint 7
     %uint_8 = OpConstant %uint 8
     %uint_9 = OpConstant %uint 9
    %uint_10 = OpConstant %uint 10
    %uint_11 = OpConstant %uint 11
    %uint_12 = OpConstant %uint 12
    %uint_13 = OpConstant %uint 13
    %uint_14 = OpConstant %uint 14
    %uint_15 = OpConstant %uint 15
    %uint_16 = OpConstant %uint 16
    %uint_17 = OpConstant %uint 17
    %uint_18 = OpConstant %uint 18
    %uint_19 = OpConstant %uint 19
    %uint_20 = OpConstant %uint 20
    %uint_21 = OpConstant %uint 21
    %uint_23 = OpConstant %uint 23
    %uint_24 = OpConstant %uint 24
    %uint_25 = OpConstant %uint 25
    %uint_26 = OpConstant %uint 26
    %uint_27 = OpConstant %uint 27
    %uint_28 = OpConstant %uint 28
    %uint_29 = OpConstant %uint 29
    %uint_30 = OpConstant %uint 30
    %uint_31 = OpConstant %uint 31
    %uint_32 = OpConstant %uint 32
    %uint_33 = OpConstant %uint 33
    %uint_34 = OpConstant %uint 34
    %uint_35 = OpConstant %uint 35
    %uint_36 = OpConstant %uint 36
    %uint_37 = OpConstant %uint 37
    %cs_main = OpFunction %void None %37
         %88 = OpLabel
         %89 = OpVariable %_ptr_Function_v3float Function
         %90 = OpVariable %_ptr_Function_v3float Function
         %91 = OpVariable %_ptr_Function_type_2d_image Function
         %92 = OpVariable %_ptr_Function_v3float Function
         %93 = OpVariable %_ptr_Function_type_2d_image Function
         %94 = OpVariable %_ptr_Function_uint Function
%param_var_dispatch_thread_id = OpVariable %_ptr_Function_v3uint Function
         %95 = OpLoad %v3uint %gl_GlobalInvocationID
         %96 = OpAccessChain %_ptr_PushConstant_uint %push_constant %int_1
         %97 = OpLoad %uint %96
         %98 = OpAccessChain %_ptr_StorageBuffer_type_ByteAddressBuffer %global_buffer_arr %97
         %99 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_0
        %100 = OpLoad %uint %99
        %101 = OpBitcast %float %100
        %102 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_1
        %103 = OpLoad %uint %102
        %104 = OpBitcast %float %103
        %105 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_2
        %106 = OpLoad %uint %105
        %107 = OpBitcast %float %106
        %108 = OpCompositeConstruct %v3float %101 %104 %107
        %109 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_3
        %110 = OpLoad %uint %109
        %111 = OpBitcast %float %110
        %112 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_4
        %113 = OpLoad %uint %112
        %114 = OpBitcast %float %113
        %115 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_5
        %116 = OpLoad %uint %115
        %117 = OpBitcast %float %116
        %118 = OpCompositeConstruct %v3float %111 %114 %117
        %119 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_6
        %120 = OpLoad %uint %119
        %121 = OpBitcast %float %120
        %122 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_7
        %123 = OpLoad %uint %122
        %124 = OpBitcast %float %123
        %125 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_8
        %126 = OpLoad %uint %125
        %127 = OpBitcast %float %126
        %128 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_9
        %129 = OpLoad %uint %128
        %130 = OpBitcast %float %129
        %131 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_10
        %132 = OpLoad %uint %131
        %133 = OpBitcast %float %132
        %134 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_11
        %135 = OpLoad %uint %134
        %136 = OpBitcast %float %135
        %137 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_12
        %138 = OpLoad %uint %137
        %139 = OpBitcast %float %138
        %140 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_13
        %141 = OpLoad %uint %140
        %142 = OpBitcast %float %141
        %143 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_14
        %144 = OpLoad %uint %143
        %145 = OpBitcast %float %144
        %146 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_15
        %147 = OpLoad %uint %146
        %148 = OpBitcast %float %147
        %149 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_16
        %150 = OpLoad %uint %149
        %151 = OpBitcast %float %150
        %152 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_17
        %153 = OpLoad %uint %152
        %154 = OpBitcast %float %153
        %155 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_18
        %156 = OpLoad %uint %155
        %157 = OpBitcast %float %156
        %158 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_19
        %159 = OpLoad %uint %158
        %160 = OpBitcast %float %159
        %161 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_20
        %162 = OpLoad %uint %161
        %163 = OpBitcast %float %162
        %164 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_21
        %165 = OpLoad %uint %164
        %166 = OpBitcast %float %165
        %167 = OpCompositeConstruct %v4float %121 %133 %145 %157
        %168 = OpCompositeConstruct %v4float %124 %136 %148 %160
        %169 = OpCompositeConstruct %v4float %127 %139 %151 %163
        %170 = OpCompositeConstruct %v4float %130 %142 %154 %166
        %171 = OpCompositeConstruct %mat4v4float %167 %168 %169 %170
        %172 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_22
        %173 = OpLoad %uint %172
        %174 = OpBitcast %float %173
        %175 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_23
        %176 = OpLoad %uint %175
        %177 = OpBitcast %float %176
        %178 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_24
        %179 = OpLoad %uint %178
        %180 = OpBitcast %float %179
        %181 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_25
        %182 = OpLoad %uint %181
        %183 = OpBitcast %float %182
        %184 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_26
        %185 = OpLoad %uint %184
        %186 = OpBitcast %float %185
        %187 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_27
        %188 = OpLoad %uint %187
        %189 = OpBitcast %float %188
        %190 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_28
        %191 = OpLoad %uint %190
        %192 = OpBitcast %float %191
        %193 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_29
        %194 = OpLoad %uint %193
        %195 = OpBitcast %float %194
        %196 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_30
        %197 = OpLoad %uint %196
        %198 = OpBitcast %float %197
        %199 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_31
        %200 = OpLoad %uint %199
        %201 = OpBitcast %float %200
        %202 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_32
        %203 = OpLoad %uint %202
        %204 = OpBitcast %float %203
        %205 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_33
        %206 = OpLoad %uint %205
        %207 = OpBitcast %float %206
        %208 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_34
        %209 = OpLoad %uint %208
        %210 = OpBitcast %float %209
        %211 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_35
        %212 = OpLoad %uint %211
        %213 = OpBitcast %float %212
        %214 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_36
        %215 = OpLoad %uint %214
        %216 = OpBitcast %float %215
        %217 = OpAccessChain %_ptr_StorageBuffer_uint %global_buffer_arr %97 %uint_0 %uint_37
        %218 = OpLoad %uint %217
        %219 = OpBitcast %float %218
        %220 = OpCompositeConstruct %v4float %174 %186 %198 %210
        %221 = OpCompositeConstruct %v4float %177 %189 %201 %213
        %222 = OpCompositeConstruct %v4float %180 %192 %204 %216
        %223 = OpCompositeConstruct %v4float %183 %195 %207 %219
        %224 = OpCompositeConstruct %mat4v4float %220 %221 %222 %223
        %225 = OpCompositeConstruct %GPUScene %108 %118 %171 %224
               OpStore %90 %118
        %226 = OpAccessChain %_ptr_PushConstant_ulong %push_constant %int_0
        %227 = OpLoad %ulong %226
        %228 = OpBitcast %_ptr_PhysicalStorageBuffer_GPUScene %227
        %229 = OpLoad %GPUScene %228 Aligned 4
        %230 = OpCompositeExtract %v3float %229 1
               OpStore %89 %230
        %231 = OpFAdd %v3float %118 %230
               OpStore %92 %231
        %232 = OpAccessChain %_ptr_PushConstant_uint %push_constant %int_2
        %233 = OpLoad %uint %232
               OpStore %94 %233
        %234 = OpAccessChain %_ptr_UniformConstant_type_2d_image %global_rw_texture_2d_float4_arr %233
        %235 = OpLoad %type_2d_image %234
               OpStore %91 %235
               OpStore %93 %235
        %236 = OpCompositeExtract %float %231 0
        %237 = OpCompositeExtract %float %231 1
        %238 = OpCompositeExtract %float %231 2
        %239 = OpCompositeConstruct %v4float %236 %237 %238 %float_1
               OpImageWrite %235 %27 %239 None
               OpReturn
               OpFunctionEnd

Version: dxcompiler.dll: 1.7 - 1.7.2212.12 (8c9d92be7); dxil.dll: 1.7(101.7.2212.14) Command : ./dxc.exe -HV 2021 -spirv -fvk-use-dx-layout -fspv-target-env="vulkan1.3" -fspv-extension=SPV_EXT_descriptor_indexing -fspv-extension=SPV_KHR_physical_storage_buffer -E cs_main -T cs_6_5 .\test.hlsl -Fo .\test.spv -Fc .\test.ds %229 is scene and It does correctly extract the first element for %230(scene.cube_color) Will this method use the layout we indicate to compiler, for example if I request to use dxc_layout. Loading the member one by one by ourself will compile to similar code that use ByteAddressBuffer.Load, so I guess that's why it works.

Is there any reason, why it doesn't load the memory the same way ByteAddressBuffer.Load does?

kevyuu avatar Jan 16 '23 13:01 kevyuu

Buffer Reference is in my opinion one of the top feature of GLSL/Vulkan, and this vk::RawBufferLoad feature is really great. Unfortunately there is this bug with structures.

As an example, let's take this structure

struct TestBuffer
{
  vec3 a;
  vec3 b;
};

And fill TestBuffer[2] with the values as {{{1,2,3},{4,56}},{{7,8,9},{10,11,12}}}

In the shader, we can retrieve the values as such

  TestBuffer test = vk::RawBufferLoad < TestBuffer > (pushConst.bufAddress);
  printf("a: %.1f %.1f %.1f\n", test.a.x, test.a.y, test.a.z);
  printf("b: %.1f %.1f %.1f\n", test.b.x, test.b.y, test.b.z);

The result show the correct result for the first member, but the following one contains the same data as a.

 a: 1.0 2.0 3.0
 b: 1.0 2.0 3.0

Applying an offset will give again the right value for the first member, and copy to all other members.

TestBuffer test = vk::RawBufferLoad < TestBuffer > (pushConst.bufAddress + sizeof(TestBuffer));
...
 a: 7.0 8.0 9.0
 b: 7.0 8.0 9.0

Note: If the struct contains more members, they all have a copy of what was loaded for the first member. And if a following member is an integer, it is similar as doing asint() on the value.

The workaround, is to extract each member separately.

  float3 a = vk::RawBufferLoad < float3 > (pushConst.bufAddress + 0);
  float3 b = vk::RawBufferLoad < float3 > (pushConst.bufAddress + sizeof(float3));
...
 a: 1.0 2.0 3.0
 b: 4.0 5.0 6.0

But this is not always simple, especially when the structure have members that can be #ifdef out.

Extra, a very nice addition to vk::RawBufferLoad would be to support the [] operand, of some other ways to load the n element of the buffer.

mklefrancois avatar Jul 18 '23 14:07 mklefrancois

Its now broken in new and fun ways

https://godbolt.org/z/onYcs95bc

fatal error: generated SPIR-V is invalid: Structure id 7 decorated as Block for variable in PhysicalStorageBuffer storage class must follow relaxed storage buffer layout rules: member 0 is missing an Offset decoration
  %LineStyle = OpTypeStruct %float %_arr_uint_uint_1

if doesn't matter if I annotate my struct members with [[vk::offset(x)]] or [[vk::ext_decorate(35,x)]], I still get this

Here is a sample that shows the problem: https://godbolt.org/z/cbPnTaG45

Note that the "sizeof(float3)" is 12. This means that the offset for the individual loads are at 12 and 24. But the offsets in %GPUObjDesc_0 are 0, 16, and 32. So the values come from different memory locations.

From the description of the vk::RawBufferLoad, the memory layout of the struct is not specificed.

This is really bad because it makes it impossible to know the memory layout so you can fill in the data on the CPU side.

The only action I think we need to take is to determine which layout is being used, and update the documentation. EDIT: I am concerned that changing the layout at this point will break existing code. Since it is not well documented, I cannot say the current behaviour is incorrect.

It seems to change depending on the layout option used:

default: 16 DX: 16, GL: 16 scalar: 12

s-perron avatar Sep 29 '23 19:09 s-perron

This example shows that the byte address buffer uses different layouts than the rawbuffer loads: https://godbolt.org/z/nE4ns5WPe.

default: 16, DX: 12, GL: 16 scalar: 12

s-perron avatar Sep 29 '23 19:09 s-perron

Its now broken in new and fun ways

https://godbolt.org/z/onYcs95bc

fatal error: generated SPIR-V is invalid: Structure id 7 decorated as Block for variable in PhysicalStorageBuffer storage class must follow relaxed storage buffer layout rules: member 0 is missing an Offset decoration
  %LineStyle = OpTypeStruct %float %_arr_uint_uint_1

if doesn't matter if I annotate my struct members with [[vk::offset(x)]] or [[vk::ext_decorate(35,x)]], I still get this

This seems to have been fixed, but comes back if I launder the address through a bitcast (possible with prop 0011) now https://godbolt.org/z/TnEM3Ka5d

P.S. my Suggestion is to only allow RawBuffer and friends with -fvk-use-scalar-layout

As with other RawBuffer related issues, we will not be updating that feature. We will leave the layout as it for now.

However, we do need to carefully consider the layout for vk::BufferPointer. The current spec is too vague, and we will have to make it more detailed. The conversation in this issue should inform that conversation.

s-perron avatar Aug 23 '24 14:08 s-perron

I opened https://github.com/microsoft/hlsl-specs/issues/306 to follow up on the vk::BufferPointer layout rules.

s-perron avatar Aug 23 '24 14:08 s-perron

It also seems to have gotten fixed between 1.8.2405 and 2407