
[SPIR-V] Float16 capability getting emitted even if no Half arithmetic present

Open devshgraphicsprogramming opened this issue 1 month ago • 3 comments

Description

If I only use 16-bit storage and immediately OpFConvert to 32-bit floats, I'd expect the Float16 SPIR-V capability (Vulkan's shaderFloat16) not to be emitted.

To quote the spec:

The capabilities StorageBuffer16BitAccess, UniformAndStorageBuffer16BitAccess, StoragePushConstant16, and StorageInputOutput16 do not generally add 16-bit operations. Rather, they add only the following specific abilities:

An OpTypePointer pointing to a 16-bit scalar, a 16-bit vector, or a composite containing a 16-bit member can be used as the result type of OpVariable, or OpAccessChain, or OpInBoundsAccessChain.

OpLoad can load 16-bit scalars, 16-bit vectors, and 16-bit matrices.

OpStore can store 16-bit scalars, 16-bit vectors, and 16-bit matrices.

OpCopyObject can be used for 16-bit scalars or composites containing 16-bit members.

16-bit scalars or 16-bit vectors can be used as operands to a width-only conversion instruction to another allowed type (OpFConvert, OpSConvert, or OpUConvert), and can be produced as results of a width-only conversion instruction from another allowed type.

A structure containing a 16-bit member can be an operand to OpArrayLength.
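One way to read the quoted rule is as a whitelist: an instruction that touches a 16-bit value needs only a 16-bit storage capability if its opcode is on the list. Below is a minimal Python sketch of that reading. The set `STORAGE_ONLY_OPCODES` and the helper `storage_capability_suffices` are hypothetical names made up for illustration, not part of DXC or SPIRV-Tools, and type declarations are not modeled.

```python
# Opcodes the quoted spec text permits on 16-bit types when only a
# 16-bit storage capability is declared. Hypothetical names, for
# illustration only; this is not an API of any SPIR-V tool.
STORAGE_ONLY_OPCODES = frozenset({
    "OpVariable", "OpAccessChain", "OpInBoundsAccessChain",  # pointers to 16-bit data
    "OpLoad", "OpStore", "OpCopyObject",                     # moving 16-bit data
    "OpFConvert", "OpSConvert", "OpUConvert",                # width-only conversions
    "OpArrayLength",                                         # struct with a 16-bit member
})


def storage_capability_suffices(opcodes_touching_16bit):
    """Return True if every opcode that touches 16-bit data is on the list."""
    return all(op in STORAGE_ONLY_OPCODES for op in opcodes_touching_16bit)


# Access, load, then widen: all three opcodes are on the list.
print(storage_capability_suffices(["OpAccessChain", "OpLoad", "OpFConvert"]))  # True
# Half arithmetic such as OpFAdd falls outside the list.
print(storage_capability_suffices(["OpLoad", "OpFAdd", "OpFConvert"]))  # False
```

Under this reading, a shader that only loads halves and widens them stays inside the list, which is the expectation stated in the description.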

For context, I'm also not sure that StoragePushConstant16 should be emitted, but I'm raising that with the SPIR-V folks for now: https://github.com/KhronosGroup/SPIRV-Tools/issues/6435

Steps to Reproduce

struct PushConstants
{
    float16_t2 pairOfHalves;
};

[[vk::push_constant]] PushConstants pc;


[numthreads(1,1,1)]
[shader("compute")]
void main()
{
    float32_t2 promoted = pc.pairOfHalves;
    vk::RawBufferStore(0xdeadbeefull, promoted);
}

https://godbolt.org/z/8x8vEYndW

Actual Behavior

produces

               OpCapability Shader
               OpCapability Int64
               OpCapability StoragePushConstant16
               OpCapability Float16
               OpCapability PhysicalStorageBufferAddresses
               OpMemoryModel PhysicalStorageBuffer64 GLSL450
               OpEntryPoint GLCompute %main "main" %pc
               OpExecutionMode %main LocalSize 1 1 1
               OpMemberDecorate %type_PushConstant_PushConstants 0 Offset 0
               OpDecorate %type_PushConstant_PushConstants Block
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
      %ulong = OpTypeInt 64 0
%ulong_3735928559 = OpConstant %ulong 3735928559
       %half = OpTypeFloat 16
     %v2half = OpTypeVector %half 2
%type_PushConstant_PushConstants = OpTypeStruct %v2half
%_ptr_PushConstant_type_PushConstant_PushConstants = OpTypePointer PushConstant %type_PushConstant_PushConstants
       %void = OpTypeVoid
         %12 = OpTypeFunction %void
      %float = OpTypeFloat 32
    %v2float = OpTypeVector %float 2
%_ptr_PushConstant_v2half = OpTypePointer PushConstant %v2half
%_ptr_PhysicalStorageBuffer_v2float = OpTypePointer PhysicalStorageBuffer %v2float
         %pc = OpVariable %_ptr_PushConstant_type_PushConstant_PushConstants PushConstant
       %main = OpFunction %void None %12
         %17 = OpLabel
         %18 = OpAccessChain %_ptr_PushConstant_v2half %pc %int_0
         %19 = OpLoad %v2half %18
         %20 = OpFConvert %v2float %19
         %21 = OpBitcast %_ptr_PhysicalStorageBuffer_v2float %ulong_3735928559
               OpStore %21 %20 Aligned 4
               OpReturn
               OpFunctionEnd

Environment

  • DXC version
  • Host Operating System

The funny thing is that the validator lets it pass with either Float16 or StoragePushConstant16: https://godbolt.org/z/eTGn3eeaG

The worrying part is that the validator does not check whether StoragePushConstant16 should be present.

This is a trim_capability issue. For Float16, the pass checks whether any OpTypeFloat with a width of 16 exists; for the storage capability, it checks whether such a type is used. It does not cross-check whether one capability is enough or both are required.
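The difference between the check described above and a usage-based cross-check can be sketched in Python. This is only a toy model built from the comment's description, not the actual SPIRV-Tools pass: the function names, the regex-based parser, and the opcode whitelist (taken from the spec text quoted in the issue) are all assumptions made for illustration.

```python
import re

# Opcodes the quoted spec text allows on 16-bit data with storage capabilities only.
ALLOWED = frozenset({
    "OpVariable", "OpAccessChain", "OpInBoundsAccessChain", "OpLoad", "OpStore",
    "OpCopyObject", "OpFConvert", "OpSConvert", "OpUConvert", "OpArrayLength",
})

# Matches "%id = OpCode operands..." as well as "OpCode operands...".
LINE_RE = re.compile(r"^\s*(?:(%\S+)\s*=\s*)?(Op\w+)\s*(.*)$")


def parse(asm):
    """Return a list of (result_id, opcode, operands) tuples."""
    matches = (LINE_RE.match(line) for line in asm.splitlines())
    return [(m.group(1), m.group(2), m.group(3).split()) for m in matches if m]


def float16_by_presence(asm):
    """Heuristic as described in the comment: any OpTypeFloat 16 keeps Float16."""
    return any(op == "OpTypeFloat" and args[0] == "16" for _, op, args in parse(asm))


def float16_by_usage(asm):
    """Hypothetical cross-check: keep Float16 only if a half value is produced
    or consumed by an opcode outside the storage-only whitelist."""
    halves, half_values = set(), set()
    for rid, op, args in parse(asm):
        if op.startswith("OpType"):
            # Track half scalars and the vectors/matrices built from them.
            if (op == "OpTypeFloat" and args[0] == "16") or (
                op in ("OpTypeVector", "OpTypeMatrix") and args[0] in halves
            ):
                halves.add(rid)
            continue
        produces = rid is not None and bool(args) and args[0] in halves
        consumes = any(a in half_values for a in args)
        if produces:
            half_values.add(rid)
        if (produces or consumes) and op not in ALLOWED:
            return True
    return False


# Trimmed-down excerpt of the disassembly from the issue.
SAMPLE = """
       %half = OpTypeFloat 16
     %v2half = OpTypeVector %half 2
      %float = OpTypeFloat 32
    %v2float = OpTypeVector %float 2
         %19 = OpLoad %v2half %18
         %20 = OpFConvert %v2float %19
"""

print(float16_by_presence(SAMPLE), float16_by_usage(SAMPLE))  # True False
```

On the excerpt, the presence check keeps Float16 while the usage check would drop it. Appending a line such as `%21 = OpFAdd %v2half %19 %19` makes both return True.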

Do you have issues with devices supporting StoragePushConstant16 but not Float16?

Keenuts, Dec 02 '25 16:12

It's quite complicated; none of the AMD devices support 16-bit push constants.

However, in the related discussion (and an internal one between Spencer Fricke of LunarG and @alan-baker), nobody is even sure whether 16-bit access is required for a member at 32-bit alignment that is actually 32 bits wide (a 2-component vector of halves) and gets accessed as a whole. https://github.com/KhronosGroup/SPIRV-Tools/issues/6435#issuecomment-3582040970

My logic was that accessing the whole 32 bits at a 32-bit offset wouldn't require the 16-bit storage capability, and that not doing any arithmetic wouldn't require the Float16 capability (the spec says you can convert between halves and full floats, and load and store them, without doing arithmetic, in which case you don't need Float16).

Apparently KHR_untyped_pointers will clean that up.