spir-v fastmath mode

Open futz12 opened this issue 7 months ago • 4 comments

Using the absval op as an example, here is the shader disassembly before and after this change.

Before

; SPIR-V
; Version: 1.3
; Generator: Khronos Glslang Reference Front End; 11
; Bound: 54
; Schema: 0
               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID
               OpExecutionMode %main LocalSize 32 1 1
               OpSource GLSL 450
               OpSourceExtension "GL_EXT_shader_8bit_storage"
               OpSourceExtension "GL_EXT_shader_explicit_arithmetic_types_int64"
               OpName %main "main"
               OpName %gi "gi"
               OpName %gl_GlobalInvocationID "gl_GlobalInvocationID"
               OpName %n "n"
               OpName %parameter "parameter"
               OpMemberName %parameter 0 "n"
               OpName %p "p"
               OpName %v "v"
               OpName %bottom_top_blob "bottom_top_blob"
               OpMemberName %bottom_top_blob 0 "bottom_top_blob_data"
               OpName %_ ""
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %n SpecId 0
               OpDecorate %parameter Block
               OpMemberDecorate %parameter 0 Offset 0
               OpDecorate %_runtimearr_v4float ArrayStride 16
               OpDecorate %bottom_top_blob Block
               OpMemberDecorate %bottom_top_blob 0 Offset 0
               OpDecorate %_ Binding 0
               OpDecorate %_ DescriptorSet 0
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %uint = OpTypeInt 32 0
%_ptr_Function_uint = OpTypePointer Function %uint
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
     %uint_0 = OpConstant %uint 0
%_ptr_Input_uint = OpTypePointer Input %uint
          %n = OpSpecConstant %uint 0
       %bool = OpTypeBool
         %19 = OpSpecConstantOp %bool IEqual %n %uint_0
  %parameter = OpTypeStruct %uint
%_ptr_PushConstant_parameter = OpTypePointer PushConstant %parameter
          %p = OpVariable %_ptr_PushConstant_parameter PushConstant
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
%_ptr_PushConstant_uint = OpTypePointer PushConstant %uint
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_runtimearr_v4float = OpTypeRuntimeArray %v4float
%bottom_top_blob = OpTypeStruct %_runtimearr_v4float
%_ptr_StorageBuffer_bottom_top_blob = OpTypePointer StorageBuffer %bottom_top_blob
          %_ = OpVariable %_ptr_StorageBuffer_bottom_top_blob StorageBuffer
%_ptr_StorageBuffer_v4float = OpTypePointer StorageBuffer %v4float
       %main = OpFunction %void None %3
          %5 = OpLabel
         %gi = OpVariable %_ptr_Function_uint Function
         %20 = OpVariable %_ptr_Function_uint Function
          %v = OpVariable %_ptr_Function_v4float Function
         %14 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %uint_0
         %15 = OpLoad %uint %14
               OpStore %gi %15
         %16 = OpLoad %uint %gi
               OpSelectionMerge %22 None
               OpBranchConditional %19 %21 %31
         %21 = OpLabel
         %29 = OpAccessChain %_ptr_PushConstant_uint %p %int_0
         %30 = OpLoad %uint %29
               OpStore %20 %30
               OpBranch %22
         %31 = OpLabel
               OpStore %20 %n
               OpBranch %22
         %22 = OpLabel
         %32 = OpLoad %uint %20
         %33 = OpUGreaterThanEqual %bool %16 %32
               OpSelectionMerge %35 None
               OpBranchConditional %33 %34 %35
         %34 = OpLabel
               OpReturn
         %35 = OpLabel
         %45 = OpLoad %uint %gi
         %47 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %45
         %48 = OpLoad %v4float %47
               OpStore %v %48
         %49 = OpLoad %v4float %v
         %50 = OpExtInst %v4float %1 FAbs %49
               OpStore %v %50
         %51 = OpLoad %uint %gi
         %52 = OpLoad %v4float %v
         %53 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %51
               OpStore %53 %52
               OpReturn
               OpFunctionEnd
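
The comment lines at the top of the listing (`Version`, `Generator`, `Bound`, `Schema`) mirror the five 32-bit words that open a binary SPIR-V module. As a rough illustration, and not code from this PR, the sketch below decodes such a header in Python. The helper name `parse_spirv_header` is hypothetical, and the sample header is rebuilt by hand from the values shown above rather than read from a real ncnn shader.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module


def parse_spirv_header(blob):
    """Decode the five little-endian 32-bit words that start a SPIR-V module."""
    magic, version, generator, bound, schema = struct.unpack_from("<5I", blob, 0)
    if magic != SPIRV_MAGIC:
        raise ValueError("not a little-endian SPIR-V module")
    return {
        # version word layout: 0 | major | minor | 0 (one byte each)
        "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
        # generator word: tool id in the high 16 bits, tool version in the low 16
        "generator_tool": generator >> 16,
        "generator_version": generator & 0xFFFF,
        # bound: every result id in the module is smaller than this value
        "bound": bound,
        "schema": schema,
    }


# Hypothetical header rebuilt from the listing: version 1.3,
# generator "Khronos Glslang Reference Front End" (tool id 8), version 11, bound 54.
header = struct.pack("<5I", SPIRV_MAGIC, 0x00010300, (8 << 16) | 11, 54, 0)
print(parse_spirv_header(header))
```

The `bound` field is the one to watch when a module is patched: any newly introduced result id has to stay below it.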

After

; SPIR-V
; Version: 1.3
; Generator: Khronos Glslang Reference Front End; 11
; Bound: 55
; Schema: 0
               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpCapability FloatControls2
               OpExtension "SPV_KHR_float_controls2"
               OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID
               OpExecutionMode %main FPFastMathDefault %float %uint_458752
               OpExecutionMode %main LocalSize 32 1 1
               OpSource GLSL 450
               OpSourceExtension "GL_EXT_shader_8bit_storage"
               OpSourceExtension "GL_EXT_shader_explicit_arithmetic_types_int64"
               OpName %main "main"
               OpName %gi "gi"
               OpName %gl_GlobalInvocationID "gl_GlobalInvocationID"
               OpName %n "n"
               OpName %parameter "parameter"
               OpMemberName %parameter 0 "n"
               OpName %p "p"
               OpName %v "v"
               OpName %bottom_top_blob "bottom_top_blob"
               OpMemberName %bottom_top_blob 0 "bottom_top_blob_data"
               OpName %_ ""
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %n SpecId 0
               OpDecorate %parameter Block
               OpMemberDecorate %parameter 0 Offset 0
               OpDecorate %_runtimearr_v4float ArrayStride 16
               OpDecorate %bottom_top_blob Block
               OpMemberDecorate %bottom_top_blob 0 Offset 0
               OpDecorate %_ Binding 0
               OpDecorate %_ DescriptorSet 0
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %uint = OpTypeInt 32 0
%_ptr_Function_uint = OpTypePointer Function %uint
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
     %uint_0 = OpConstant %uint 0
%_ptr_Input_uint = OpTypePointer Input %uint
          %n = OpSpecConstant %uint 0
       %bool = OpTypeBool
         %19 = OpSpecConstantOp %bool IEqual %n %uint_0
  %parameter = OpTypeStruct %uint
%_ptr_PushConstant_parameter = OpTypePointer PushConstant %parameter
          %p = OpVariable %_ptr_PushConstant_parameter PushConstant
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
%_ptr_PushConstant_uint = OpTypePointer PushConstant %uint
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_runtimearr_v4float = OpTypeRuntimeArray %v4float
%bottom_top_blob = OpTypeStruct %_runtimearr_v4float
%_ptr_StorageBuffer_bottom_top_blob = OpTypePointer StorageBuffer %bottom_top_blob
          %_ = OpVariable %_ptr_StorageBuffer_bottom_top_blob StorageBuffer
%_ptr_StorageBuffer_v4float = OpTypePointer StorageBuffer %v4float
%uint_458752 = OpConstant %uint 458752
       %main = OpFunction %void None %3
          %5 = OpLabel
         %gi = OpVariable %_ptr_Function_uint Function
         %20 = OpVariable %_ptr_Function_uint Function
          %v = OpVariable %_ptr_Function_v4float Function
         %14 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %uint_0
         %15 = OpLoad %uint %14
               OpStore %gi %15
         %16 = OpLoad %uint %gi
               OpSelectionMerge %22 None
               OpBranchConditional %19 %21 %31
         %21 = OpLabel
         %29 = OpAccessChain %_ptr_PushConstant_uint %p %int_0
         %30 = OpLoad %uint %29
               OpStore %20 %30
               OpBranch %22
         %31 = OpLabel
               OpStore %20 %n
               OpBranch %22
         %22 = OpLabel
         %32 = OpLoad %uint %20
         %33 = OpUGreaterThanEqual %bool %16 %32
               OpSelectionMerge %35 None
               OpBranchConditional %33 %34 %35
         %34 = OpLabel
               OpReturn
         %35 = OpLabel
         %45 = OpLoad %uint %gi
         %47 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %45
         %48 = OpLoad %v4float %47
               OpStore %v %48
         %49 = OpLoad %v4float %v
         %50 = OpExtInst %v4float %1 FAbs %49
               OpStore %v %50
         %51 = OpLoad %uint %gi
         %52 = OpLoad %v4float %v
         %53 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %51
               OpStore %53 %52
               OpReturn
               OpFunctionEnd
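
The only functional difference in the "After" listing is the `FPFastMathDefault` execution mode for `%float`, whose operand `%uint_458752` is an FP Fast Math Mode bitmask. The sketch below is an illustration in Python rather than code from this PR; it decodes that constant using the flag values from the SPIR-V specification and `SPV_KHR_float_controls2`. The helper name `decode_fp_fast_math_mode` is hypothetical.

```python
# Bit values of the SPIR-V "FP Fast Math Mode" mask: the core flags plus the
# three flags in the upper half-word used by SPV_KHR_float_controls2.
FP_FAST_MATH_FLAGS = {
    0x00001: "NotNaN",
    0x00002: "NotInf",
    0x00004: "NSZ",
    0x00008: "AllowRecip",
    0x00010: "Fast",
    0x10000: "AllowContract",
    0x20000: "AllowReassoc",
    0x40000: "AllowTransform",
}


def decode_fp_fast_math_mode(mask):
    """Return the names of all flags set in mask, lowest bit first."""
    return [name for bit, name in sorted(FP_FAST_MATH_FLAGS.items()) if mask & bit]


# 458752 == 0x70000, the constant passed to FPFastMathDefault above
print(hex(458752), decode_fp_fast_math_mode(458752))
```

Read this way, the mask enables contraction, reassociation and algebraic transformation of 32-bit float operations, while leaving the NaN, infinity and signed-zero assumptions switched off.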

futz12 avatar Aug 01 '25 06:08 futz12

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: futz12
:x: nihui

tencent-adm avatar Aug 01 '25 06:08 tencent-adm

Codecov Report

:x: Patch coverage is 81.90476% with 19 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 95.59%. Comparing base (a514cf5) to head (157ff17).
:warning: Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pipelinecache.cpp | 33.33% | 10 Missing :warning: |
| src/gpu.cpp | 91.95% | 7 Missing :warning: |
| src/pipeline.cpp | 0.00% | 2 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6223      +/-   ##
==========================================
- Coverage   95.89%   95.59%   -0.30%     
==========================================
  Files         837      838       +1     
  Lines      264994   265097     +103     
==========================================
- Hits       254105   253424     -681     
- Misses      10889    11673     +784     

codecov-commenter avatar Aug 01 '25 07:08 codecov-commenter

The binary size change of libncnn.so (bytes)

| architecture | base size | pr size | difference |
|---|---|---|---|
| x86_64 | 15124728 | 15133360 | +8632 :warning: |
| armhf | 6155744 | 6160304 | +4560 :warning: |
| aarch64 | 9453192 | 9453856 | +664 :warning: |

github-actions[bot] avatar Aug 01 '25 07:08 github-actions[bot]

Thank you for your work. Please write up your notes and takeaways from the implementation, including the difficulties you ran into and how you solved them, as an article and post it in the Discussions section, where it will serve as a knowledge summary: https://github.com/Tencent/ncnn/discussions

nihui avatar Aug 21 '25 07:08 nihui