ncnn SPIR-V fastmath mode

Using the absval op as an example:

Before

; SPIR-V
; Version: 1.3
; Generator: Khronos Glslang Reference Front End; 11
; Bound: 54
; Schema: 0
               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID
               OpExecutionMode %main LocalSize 32 1 1
               OpSource GLSL 450
               OpSourceExtension "GL_EXT_shader_8bit_storage"
               OpSourceExtension "GL_EXT_shader_explicit_arithmetic_types_int64"
               OpName %main "main"
               OpName %gi "gi"
               OpName %gl_GlobalInvocationID "gl_GlobalInvocationID"
               OpName %n "n"
               OpName %parameter "parameter"
               OpMemberName %parameter 0 "n"
               OpName %p "p"
               OpName %v "v"
               OpName %bottom_top_blob "bottom_top_blob"
               OpMemberName %bottom_top_blob 0 "bottom_top_blob_data"
               OpName %_ ""
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %n SpecId 0
               OpDecorate %parameter Block
               OpMemberDecorate %parameter 0 Offset 0
               OpDecorate %_runtimearr_v4float ArrayStride 16
               OpDecorate %bottom_top_blob Block
               OpMemberDecorate %bottom_top_blob 0 Offset 0
               OpDecorate %_ Binding 0
               OpDecorate %_ DescriptorSet 0
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %uint = OpTypeInt 32 0
%_ptr_Function_uint = OpTypePointer Function %uint
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
     %uint_0 = OpConstant %uint 0
%_ptr_Input_uint = OpTypePointer Input %uint
          %n = OpSpecConstant %uint 0
       %bool = OpTypeBool
         %19 = OpSpecConstantOp %bool IEqual %n %uint_0
  %parameter = OpTypeStruct %uint
%_ptr_PushConstant_parameter = OpTypePointer PushConstant %parameter
          %p = OpVariable %_ptr_PushConstant_parameter PushConstant
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
%_ptr_PushConstant_uint = OpTypePointer PushConstant %uint
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_runtimearr_v4float = OpTypeRuntimeArray %v4float
%bottom_top_blob = OpTypeStruct %_runtimearr_v4float
%_ptr_StorageBuffer_bottom_top_blob = OpTypePointer StorageBuffer %bottom_top_blob
          %_ = OpVariable %_ptr_StorageBuffer_bottom_top_blob StorageBuffer
%_ptr_StorageBuffer_v4float = OpTypePointer StorageBuffer %v4float
       %main = OpFunction %void None %3
          %5 = OpLabel
         %gi = OpVariable %_ptr_Function_uint Function
         %20 = OpVariable %_ptr_Function_uint Function
          %v = OpVariable %_ptr_Function_v4float Function
         %14 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %uint_0
         %15 = OpLoad %uint %14
               OpStore %gi %15
         %16 = OpLoad %uint %gi
               OpSelectionMerge %22 None
               OpBranchConditional %19 %21 %31
         %21 = OpLabel
         %29 = OpAccessChain %_ptr_PushConstant_uint %p %int_0
         %30 = OpLoad %uint %29
               OpStore %20 %30
               OpBranch %22
         %31 = OpLabel
               OpStore %20 %n
               OpBranch %22
         %22 = OpLabel
         %32 = OpLoad %uint %20
         %33 = OpUGreaterThanEqual %bool %16 %32
               OpSelectionMerge %35 None
               OpBranchConditional %33 %34 %35
         %34 = OpLabel
               OpReturn
         %35 = OpLabel
         %45 = OpLoad %uint %gi
         %47 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %45
         %48 = OpLoad %v4float %47
               OpStore %v %48
         %49 = OpLoad %v4float %v
         %50 = OpExtInst %v4float %1 FAbs %49
               OpStore %v %50
         %51 = OpLoad %uint %gi
         %52 = OpLoad %v4float %v
         %53 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %51
               OpStore %53 %52
               OpReturn
               OpFunctionEnd

After

; SPIR-V
; Version: 1.3
; Generator: Khronos Glslang Reference Front End; 11
; Bound: 55
; Schema: 0
               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpCapability FloatControls2
               OpExtension "SPV_KHR_float_controls2"
               OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID
               OpExecutionMode %main FPFastMathDefault %float %uint_458752
               OpExecutionMode %main LocalSize 32 1 1
               OpSource GLSL 450
               OpSourceExtension "GL_EXT_shader_8bit_storage"
               OpSourceExtension "GL_EXT_shader_explicit_arithmetic_types_int64"
               OpName %main "main"
               OpName %gi "gi"
               OpName %gl_GlobalInvocationID "gl_GlobalInvocationID"
               OpName %n "n"
               OpName %parameter "parameter"
               OpMemberName %parameter 0 "n"
               OpName %p "p"
               OpName %v "v"
               OpName %bottom_top_blob "bottom_top_blob"
               OpMemberName %bottom_top_blob 0 "bottom_top_blob_data"
               OpName %_ ""
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %n SpecId 0
               OpDecorate %parameter Block
               OpMemberDecorate %parameter 0 Offset 0
               OpDecorate %_runtimearr_v4float ArrayStride 16
               OpDecorate %bottom_top_blob Block
               OpMemberDecorate %bottom_top_blob 0 Offset 0
               OpDecorate %_ Binding 0
               OpDecorate %_ DescriptorSet 0
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %uint = OpTypeInt 32 0
%_ptr_Function_uint = OpTypePointer Function %uint
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
     %uint_0 = OpConstant %uint 0
%_ptr_Input_uint = OpTypePointer Input %uint
          %n = OpSpecConstant %uint 0
       %bool = OpTypeBool
         %19 = OpSpecConstantOp %bool IEqual %n %uint_0
  %parameter = OpTypeStruct %uint
%_ptr_PushConstant_parameter = OpTypePointer PushConstant %parameter
          %p = OpVariable %_ptr_PushConstant_parameter PushConstant
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
%_ptr_PushConstant_uint = OpTypePointer PushConstant %uint
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_runtimearr_v4float = OpTypeRuntimeArray %v4float
%bottom_top_blob = OpTypeStruct %_runtimearr_v4float
%_ptr_StorageBuffer_bottom_top_blob = OpTypePointer StorageBuffer %bottom_top_blob
          %_ = OpVariable %_ptr_StorageBuffer_bottom_top_blob StorageBuffer
%_ptr_StorageBuffer_v4float = OpTypePointer StorageBuffer %v4float
%uint_458752 = OpConstant %uint 458752
       %main = OpFunction %void None %3
          %5 = OpLabel
         %gi = OpVariable %_ptr_Function_uint Function
         %20 = OpVariable %_ptr_Function_uint Function
          %v = OpVariable %_ptr_Function_v4float Function
         %14 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %uint_0
         %15 = OpLoad %uint %14
               OpStore %gi %15
         %16 = OpLoad %uint %gi
               OpSelectionMerge %22 None
               OpBranchConditional %19 %21 %31
         %21 = OpLabel
         %29 = OpAccessChain %_ptr_PushConstant_uint %p %int_0
         %30 = OpLoad %uint %29
               OpStore %20 %30
               OpBranch %22
         %31 = OpLabel
               OpStore %20 %n
               OpBranch %22
         %22 = OpLabel
         %32 = OpLoad %uint %20
         %33 = OpUGreaterThanEqual %bool %16 %32
               OpSelectionMerge %35 None
               OpBranchConditional %33 %34 %35
         %34 = OpLabel
               OpReturn
         %35 = OpLabel
         %45 = OpLoad %uint %gi
         %47 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %45
         %48 = OpLoad %v4float %47
               OpStore %v %48
         %49 = OpLoad %v4float %v
         %50 = OpExtInst %v4float %1 FAbs %49
               OpStore %v %50
         %51 = OpLoad %uint %gi
         %52 = OpLoad %v4float %v
         %53 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %51
               OpStore %53 %52
               OpReturn
               OpFunctionEnd
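
The only differences between the two listings are the `FloatControls2` capability, the `SPV_KHR_float_controls2` extension, and the `FPFastMathDefault` execution mode for `%float` with the constant `458752`. As a reading aid, here is a minimal sketch that decodes that constant as an FP Fast Math Mode bitmask. The bit values are taken from the SPIR-V specification as I understand it; the helper name `decode_fast_math_mask` is hypothetical and not part of ncnn.

```python
# FP Fast Math Mode bits (assumed from the SPIR-V spec with
# SPV_KHR_float_controls2); this table is illustrative, not ncnn code.
FP_FAST_MATH_BITS = {
    0x1: "NotNaN",
    0x2: "NotInf",
    0x4: "NSZ",
    0x8: "AllowRecip",
    0x10: "Fast",
    0x10000: "AllowContract",
    0x20000: "AllowReassoc",
    0x40000: "AllowTransform",
}


def decode_fast_math_mask(mask):
    """Return the names of the known fast-math bits set in mask."""
    names = [name for bit, name in FP_FAST_MATH_BITS.items() if mask & bit]
    # Report any bits this table does not know about instead of dropping them.
    known = sum(bit for bit in FP_FAST_MATH_BITS if mask & bit)
    if mask != known:
        names.append(hex(mask & ~known))
    return names


# 458752 == 0x70000, the value of %uint_458752 in the "After" listing.
print(hex(458752), decode_fast_math_mask(458752))
```

Under these assumptions the mask sets `AllowContract`, `AllowReassoc` and `AllowTransform`, and leaves `NotNaN`, `NotInf` and `NSZ` clear, so the default for 32-bit floats permits contraction, reassociation and algebraic transforms while still preserving NaN, infinity and signed-zero behavior.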

Aug 01 '25 06:08 futz12

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: futz12
:x: nihui

Aug 01 '25 06:08 tencent-adm

Codecov Report

:x: Patch coverage is 81.90476% with 19 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 95.59%. Comparing base (a514cf5) to head (157ff17). :warning: Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pipelinecache.cpp | 33.33% | 10 Missing :warning: |
| src/gpu.cpp | 91.95% | 7 Missing :warning: |
| src/pipeline.cpp | 0.00% | 2 Missing :warning: |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6223      +/-   ##
==========================================
- Coverage   95.89%   95.59%   -0.30%     
==========================================
  Files         837      838       +1     
  Lines      264994   265097     +103     
==========================================
- Hits       254105   253424     -681     
- Misses      10889    11673     +784

Aug 01 '25 07:08 codecov-commenter

The binary size change of libncnn.so (bytes)

| architecture | base size | pr size | difference |
|---|---|---|---|
| x86_64 | 15124728 | 15133360 | +8632 :warning: |
| armhf | 6155744 | 6160304 | +4560 :warning: |
| aarch64 | 9453192 | 9453856 | +664 :warning: |

Aug 01 '25 07:08 github-actions[bot]

Thank you for your work. Please write up your implementation notes and experience, including the difficulties you ran into and how you solved them, as an article and post it in the Discussions section, where it will serve as a knowledge summary: https://github.com/Tencent/ncnn/discussions

Aug 21 '25 07:08 nihui