clvk Support for cl_mem_device_address

still WIP, sample works but test fails though more tests fail even on main

╭─pvelesko@cupcake ~/clvk/test-dev-buffer/build ‹deviceAddr●› 
╰─$ ./device_ptr_test
Platform 0 has 4 device(s):
        0: Intel(R) Graphics (RPL-S)
        1: AMD Radeon VII (RADV VEGA20)
        2: Intel(R) Arc(tm) A770 Graphics (DG2)
        3: llvmpipe (LLVM 15.0.7, 256 bits)
Select platform index: 0
Select device index: 0

Running kernel test with device address extension...
Device supports cl_ext_buffer_device_address extension
Successfully obtained device pointer: 0xfffffffefff80000
Computation successful
All tests completed successfully

Nov 20 '24 15:11 pvelesko

@kpet thank you for the review, I'll work on addressing your comments. Meanwhile, this is what I get when I try to execute an example:

warning: overriding the module target triple with spir-unknown-unknown
warning: Linking two modules of different data layouts: 'Unknown buffer' is 'e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-G1' whereas 'clvk-1ArlBH/source.bc' is 'e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-G1'

clspv: /space/pvelesko/clvk/external/clspv/third_party/llvm/llvm/lib/IR/Value.cpp:507: void llvm::Value::doRAUW(llvm::Value*, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
Aborted (core dumped)

Error processing hip-spirv-58abf1.spv: Failed to build program

I'm not an LLVM expert but just wanted to check with you first before I investigate further.

Dec 04 '24 22:12 pvelesko

Can you share the kernel source and the command line? Everything will be in clvk's log, that you would get running with:

CLVK_LOG=4
CLVK_LOG_DEST=file:log.txt

then upload log.txt here

Dec 04 '24 22:12 rjodinchr

The kernel source is a HIP matrix multiplication @rjodinchr or do you mean the SPIR-V?

Dec 04 '24 22:12 pvelesko

The error message you have posted is about clspv not being able to generate the SPIR-V shader from the CL kernel. Please provide the log from clvk. It will contain everything we need to reproduce and fix the issue. Thanks

Dec 04 '24 22:12 rjodinchr

@rjodinchr log.txt

Dec 04 '24 22:12 pvelesko

Oh, I understand what you meant by:

or do you mean the SPIR-V?

This application is using OpenCL SPIR-V source. This was not clear to me as SPIR-V has 2 variant. The OpenCL one and the Vulkan one. While the goal of clvk is to compile whatever CL source it gets to Vulkan SPIR-V, it can take OpenCL SPIR-V as the input.

But the log does not contain the OpenCL SPIR-V inputs. Could you share the sources then (I see two calls to clCraeteProgramWithIL, so both will be needed).

Dec 05 '24 05:12 rjodinchr

Hmm only one got dumped for me:

; SPIR-V
; Version: 1.1
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 145
; Schema: 0
               OpCapability Addresses
               OpCapability Linkage
               OpCapability Kernel
               OpCapability Int64
               OpCapability Int8
          %1 = OpExtInstImport "OpenCL.std"
          %2 = OpExtInstImport "OpenCL.DebugInfo.100"
               OpMemoryModel Physical64 OpenCL
               OpEntryPoint Kernel %80 "_Z12gpuMatrixMulPKfS0_Pfjjj" %__spirv_BuiltInWorkgroupSize %__spirv_BuiltInWorkgroupId %__spirv_BuiltInLocalInvocationId
         %89 = OpString "/space/pvelesko/chipStar/main/samples/0_MatrixMultiply/MatrixMultiply.cpp"
         %93 = OpString "gpuMatrixMul"
         %94 = OpString ""
         %97 = OpString "__hip_get_block_dim_x"
         %98 = OpString "/space/pvelesko/chipStar/main/include/hip/spirv_hip.hh"
        %101 = OpString "__get_x"
        %103 = OpString "__hip_get_block_idx_x"
        %106 = OpString "__hip_get_thread_idx_x"
        %109 = OpString "__hip_get_block_dim_y"
        %111 = OpString "__get_y"
        %113 = OpString "__hip_get_block_idx_y"
        %116 = OpString "__hip_get_thread_idx_y"
               OpSource OpenCL_C 200000
               OpName %__spirv_BuiltInWorkgroupSize "__spirv_BuiltInWorkgroupSize"
               OpName %__spirv_BuiltInWorkgroupId "__spirv_BuiltInWorkgroupId"
               OpName %__spirv_BuiltInLocalInvocationId "__spirv_BuiltInLocalInvocationId"
               OpName %A_coerce "A.coerce"
               OpName %B_coerce "B.coerce"
               OpName %C_coerce "C.coerce"
               OpName %M "M"
               OpName %N "N"
               OpName %K "K"
               OpName %entry "entry"
               OpName %for_body_lr_ph "for.body.lr.ph"
               OpName %for_body "for.body"
               OpName %for_cond_cleanup_loopexit "for.cond.cleanup.loopexit"
               OpName %for_cond_cleanup "for.cond.cleanup"
               OpName %call_i36 "call.i36"
               OpName %conv_i "conv.i"
               OpName %call_i "call.i"
               OpName %conv_i37 "conv.i37"
               OpName %mul "mul"
               OpName %call_i38 "call.i38"
               OpName %conv_i39 "conv.i39"
               OpName %add "add"
               OpName %call_i40 "call.i40"
               OpName %conv_i41 "conv.i41"
               OpName %call_i42 "call.i42"
               OpName %conv_i43 "conv.i43"
               OpName %mul8 "mul8"
               OpName %call_i44 "call.i44"
               OpName %conv_i45 "conv.i45"
               OpName %add10 "add10"
               OpName %cmp46_not "cmp46.not"
               OpName %mul13 "mul13"
               OpName %add18 "add18"
               OpName %acc_0_lcssa "acc.0.lcssa"
               OpName %mul19 "mul19"
               OpName %add20 "add20"
               OpName %idxprom21 "idxprom21"
               OpName %arrayidx22 "arrayidx22"
               OpName %inc "inc"
               OpName %k_048 "k.048"
               OpName %acc_047 "acc.047"
               OpName %mul11 "mul11"
               OpName %add12 "add12"
               OpName %idxprom "idxprom"
               OpName %arrayidx "arrayidx"
               OpName %add14 "add14"
               OpName %idxprom15 "idxprom15"
               OpName %arrayidx16 "arrayidx16"
               OpName %mul17 "mul17"
               OpName %cmp "cmp"
               OpName %A_coerce_0 "A.coerce"
               OpName %B_coerce_0 "B.coerce"
               OpName %C_coerce_0 "C.coerce"
               OpName %M_0 "M"
               OpName %N_0 "N"
               OpName %K_0 "K"
               OpModuleProcessed "Debug info producer: clang version 18.1.5 (https://github.com/CHIP-SPV/llvm-project.git 5c39d7d1aa6e54a9c8df41002d419c398ec8830c)"
               OpDecorate %__spirv_BuiltInWorkgroupSize LinkageAttributes "__spirv_BuiltInWorkgroupSize" Import
               OpDecorate %__spirv_BuiltInWorkgroupSize Constant
               OpDecorate %__spirv_BuiltInWorkgroupSize BuiltIn WorkgroupSize
               OpDecorate %__spirv_BuiltInWorkgroupId LinkageAttributes "__spirv_BuiltInWorkgroupId" Import
               OpDecorate %__spirv_BuiltInWorkgroupId Constant
               OpDecorate %__spirv_BuiltInWorkgroupId BuiltIn WorkgroupId
               OpDecorate %__spirv_BuiltInLocalInvocationId LinkageAttributes "__spirv_BuiltInLocalInvocationId" Import
               OpDecorate %__spirv_BuiltInLocalInvocationId Constant
               OpDecorate %__spirv_BuiltInLocalInvocationId BuiltIn LocalInvocationId
               OpDecorate %A_coerce FuncParamAttr NoAlias
               OpDecorate %A_coerce FuncParamAttr NoCapture
               OpDecorate %A_coerce FuncParamAttr NoWrite
               OpDecorate %B_coerce FuncParamAttr NoAlias
               OpDecorate %B_coerce FuncParamAttr NoCapture
               OpDecorate %B_coerce FuncParamAttr NoWrite
               OpDecorate %C_coerce FuncParamAttr NoAlias
               OpDecorate %C_coerce FuncParamAttr NoCapture
               OpDecorate %A_coerce_0 FuncParamAttr NoAlias
               OpDecorate %A_coerce_0 FuncParamAttr NoCapture
               OpDecorate %A_coerce_0 FuncParamAttr NoWrite
               OpDecorate %B_coerce_0 FuncParamAttr NoAlias
               OpDecorate %B_coerce_0 FuncParamAttr NoCapture
               OpDecorate %B_coerce_0 FuncParamAttr NoWrite
               OpDecorate %C_coerce_0 FuncParamAttr NoAlias
               OpDecorate %C_coerce_0 FuncParamAttr NoCapture
      %uchar = OpTypeInt 8 0
      %ulong = OpTypeInt 64 0
       %uint = OpTypeInt 32 0
    %uchar_1 = OpConstant %uchar 1
     %uint_0 = OpConstant %uint 0
     %uint_1 = OpConstant %uint 1
%_ptr_CrossWorkgroup_uchar = OpTypePointer CrossWorkgroup %uchar
    %v3ulong = OpTypeVector %ulong 3
%_ptr_Input_v3ulong = OpTypePointer Input %v3ulong
       %void = OpTypeVoid
      %float = OpTypeFloat 32
%_ptr_CrossWorkgroup_float = OpTypePointer CrossWorkgroup %float
         %17 = OpTypeFunction %void %_ptr_CrossWorkgroup_float %_ptr_CrossWorkgroup_float %_ptr_CrossWorkgroup_float %uint %uint %uint
       %bool = OpTypeBool
%__spirv_BuiltInWorkgroupSize = OpVariable %_ptr_Input_v3ulong Input
%__spirv_BuiltInWorkgroupId = OpVariable %_ptr_Input_v3ulong Input
%__spirv_BuiltInLocalInvocationId = OpVariable %_ptr_Input_v3ulong Input
    %float_0 = OpConstant %float 0
         %90 = OpExtInst %void %2 DebugSource %89
         %91 = OpExtInst %void %2 DebugCompilationUnit 65536 4 %90 CPP_for_OpenCL
         %92 = OpExtInst %void %2 DebugTypeFunction None %void
         %95 = OpExtInst %void %2 DebugInfoNone
         %96 = OpExtInst %void %2 DebugFunction %93 %92 %90 83 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 87 %18 %95
         %99 = OpExtInst %void %2 DebugSource %98
        %100 = OpExtInst %void %2 DebugFunction %97 %92 %99 82 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 82 %95 %95
        %102 = OpExtInst %void %2 DebugFunction %101 %92 %99 108 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 108 %95 %95
        %104 = OpExtInst %void %2 DebugFunction %103 %92 %99 77 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 77 %95 %95
        %105 = OpExtInst %void %2 DebugFunction %101 %92 %99 102 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 102 %95 %95
        %107 = OpExtInst %void %2 DebugFunction %106 %92 %99 72 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 72 %95 %95
        %108 = OpExtInst %void %2 DebugFunction %101 %92 %99 96 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 96 %95 %95
        %110 = OpExtInst %void %2 DebugFunction %109 %92 %99 83 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 83 %95 %95
        %112 = OpExtInst %void %2 DebugFunction %111 %92 %99 109 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 109 %95 %95
        %114 = OpExtInst %void %2 DebugFunction %113 %92 %99 78 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 78 %95 %95
        %115 = OpExtInst %void %2 DebugFunction %111 %92 %99 103 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 103 %95 %95
        %117 = OpExtInst %void %2 DebugFunction %116 %92 %99 73 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 73 %95 %95
        %118 = OpExtInst %void %2 DebugFunction %111 %92 %99 97 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 97 %95 %95
        %119 = OpExtInst %void %2 DebugInlinedAt 90 %96
        %120 = OpExtInst %void %2 DebugInlinedAt 108 %102 %119
        %122 = OpExtInst %void %2 DebugInlinedAt 90 %96
        %123 = OpExtInst %void %2 DebugInlinedAt 102 %105 %122
        %126 = OpExtInst %void %2 DebugInlinedAt 90 %96
        %127 = OpExtInst %void %2 DebugInlinedAt 96 %108 %126
        %130 = OpExtInst %void %2 DebugInlinedAt 92 %96
        %131 = OpExtInst %void %2 DebugInlinedAt 109 %112 %130
        %133 = OpExtInst %void %2 DebugInlinedAt 92 %96
        %134 = OpExtInst %void %2 DebugInlinedAt 103 %115 %133
        %137 = OpExtInst %void %2 DebugInlinedAt 92 %96
        %138 = OpExtInst %void %2 DebugInlinedAt 97 %118 %137
         %18 = OpFunction %void None %17
   %A_coerce = OpFunctionParameter %_ptr_CrossWorkgroup_float
   %B_coerce = OpFunctionParameter %_ptr_CrossWorkgroup_float
   %C_coerce = OpFunctionParameter %_ptr_CrossWorkgroup_float
          %M = OpFunctionParameter %uint
          %N = OpFunctionParameter %uint
          %K = OpFunctionParameter %uint
      %entry = OpLabel
        %121 = OpExtInst %void %2 DebugScope %100 %120
               OpLine %98 82 50
         %30 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupSize Aligned 32
   %call_i36 = OpCompositeExtract %ulong %30 0
     %conv_i = OpUConvert %uint %call_i36
        %124 = OpExtInst %void %2 DebugScope %104 %123
               OpLine %98 77 50
         %33 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupId Aligned 32
     %call_i = OpCompositeExtract %ulong %33 0
   %conv_i37 = OpUConvert %uint %call_i
        %125 = OpExtInst %void %2 DebugScope %96
               OpLine %89 90 21
        %mul = OpIMul %uint %conv_i37 %conv_i
        %128 = OpExtInst %void %2 DebugScope %107 %127
               OpLine %98 72 51
         %37 = OpLoad %v3ulong %__spirv_BuiltInLocalInvocationId Aligned 32
   %call_i38 = OpCompositeExtract %ulong %37 0
   %conv_i39 = OpUConvert %uint %call_i38
        %129 = OpExtInst %void %2 DebugScope %96
               OpLine %89 90 37
        %add = OpIAdd %uint %mul %conv_i39
        %132 = OpExtInst %void %2 DebugScope %110 %131
               OpLine %98 83 50
         %41 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupSize Aligned 32
   %call_i40 = OpCompositeExtract %ulong %41 1
   %conv_i41 = OpUConvert %uint %call_i40
        %135 = OpExtInst %void %2 DebugScope %114 %134
               OpLine %98 78 50
         %44 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupId Aligned 32
   %call_i42 = OpCompositeExtract %ulong %44 1
   %conv_i43 = OpUConvert %uint %call_i42
        %136 = OpExtInst %void %2 DebugScope %96
               OpLine %89 92 21
       %mul8 = OpIMul %uint %conv_i43 %conv_i41
        %139 = OpExtInst %void %2 DebugScope %117 %138
               OpLine %98 73 51
         %48 = OpLoad %v3ulong %__spirv_BuiltInLocalInvocationId Aligned 32
   %call_i44 = OpCompositeExtract %ulong %48 1
   %conv_i45 = OpUConvert %uint %call_i44
        %140 = OpExtInst %void %2 DebugScope %96
               OpLine %89 92 37
      %add10 = OpIAdd %uint %mul8 %conv_i45
               OpLine %89 96 22
  %cmp46_not = OpIEqual %bool %K %uint_0
               OpLine %89 96 3
               OpBranchConditional %cmp46_not %for_cond_cleanup %for_body_lr_ph
%for_body_lr_ph = OpLabel
      %mul13 = OpIMul %uint %add10 %K
        %141 = OpExtInst %void %2 DebugScope %96
               OpLine %89 96 3
               OpBranch %for_body
   %for_body = OpLabel
      %k_048 = OpPhi %uint %uint_0 %for_body_lr_ph %inc %for_body
    %acc_047 = OpPhi %float %float_0 %for_body_lr_ph %add18 %for_body
        %144 = OpExtInst %void %2 DebugScope %96
               OpLine %89 100 16
      %mul11 = OpIMul %uint %k_048 %M
               OpLine %89 100 20
      %add12 = OpIAdd %uint %mul11 %add
               OpLine %89 100 12
    %idxprom = OpUConvert %ulong %add12
   %arrayidx = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %A_coerce %idxprom
         %70 = OpLoad %float %arrayidx Aligned 4
               OpLine %89 100 51
      %add14 = OpIAdd %uint %k_048 %mul13
               OpLine %89 100 35
  %idxprom15 = OpUConvert %ulong %add14
 %arrayidx16 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %B_coerce %idxprom15
         %74 = OpLoad %float %arrayidx16 Aligned 4
               OpLine %89 100 33
      %mul17 = OpFMul %float %70 %74
               OpLine %89 100 9
      %add18 = OpFAdd %float %acc_047 %mul17
               OpLine %89 96 28
        %inc = OpIAdd %uint %k_048 %uint_1
               OpLine %89 96 22
        %cmp = OpULessThan %bool %inc %K
               OpLine %89 96 3
               OpBranchConditional %cmp %for_body %for_cond_cleanup_loopexit
%for_cond_cleanup_loopexit = OpLabel
        %142 = OpExtInst %void %2 DebugScope %96
               OpLine %89 105 15
               OpBranch %for_cond_cleanup
%for_cond_cleanup = OpLabel
        %143 = OpExtInst %void %2 DebugScope %96
               OpLine %89 0 0
%acc_0_lcssa = OpPhi %float %float_0 %entry %add18 %for_cond_cleanup_loopexit
               OpLine %89 105 15
      %mul19 = OpIMul %uint %add10 %M
               OpLine %89 105 19
      %add20 = OpIAdd %uint %mul19 %add
               OpLine %89 105 3
  %idxprom21 = OpUConvert %ulong %add20
 %arrayidx22 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %C_coerce %idxprom21
               OpLine %89 105 32
               OpStore %arrayidx22 %acc_0_lcssa Aligned 4
               OpLine %89 106 1
               OpReturn
               OpFunctionEnd
         %80 = OpFunction %void None %17
 %A_coerce_0 = OpFunctionParameter %_ptr_CrossWorkgroup_float
 %B_coerce_0 = OpFunctionParameter %_ptr_CrossWorkgroup_float
 %C_coerce_0 = OpFunctionParameter %_ptr_CrossWorkgroup_float
        %M_0 = OpFunctionParameter %uint
        %N_0 = OpFunctionParameter %uint
        %K_0 = OpFunctionParameter %uint
         %87 = OpLabel
         %88 = OpFunctionCall %void %18 %A_coerce_0 %B_coerce_0 %C_coerce_0 %M_0 %N_0 %K_0
               OpReturn
               OpFunctionEnd

Dec 05 '24 09:12 pvelesko

I'll try to reproduce with that one then. Thank you

Dec 05 '24 09:12 rjodinchr

I'm not able to reproduce. I will need even more assets I think. Could you run with:

CLVK_LOG=4
CLVK_LOG_DEST=file:log.txt
CLVK_KEEP_TEMPORARIES=1

That should produce a log.txt file as well as a bunch of clvk-XXXXX folders. Could you make a archive with all of them and upload that here please?

Dec 05 '24 12:12 rjodinchr

╭─pvelesko@cupcake ~/chipStar/clvk-enable/build ‹clvk-enable› 
╰─$ ./tools/opencl-spirv-compiler/opencl-spirv-compiler ./MatrixMul.spv                                                                                       130 ↵
Build failed. Log:
warning: overriding the module target triple with spir-unknown-unknown
warning: Linking two modules of different data layouts: 'Unknown buffer' is 'e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-G1' whereas 'clvk-9YdtEh/source.bc' is 'e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-G1'

clspv: /space/pvelesko/clvk/external/clspv/third_party/llvm/llvm/lib/IR/Value.cpp:507: void llvm::Value::doRAUW(llvm::Value*, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
Aborted (core dumped)

Error processing ./MatrixMul.spv: Failed to build program

clvkRepro.zip

Dec 05 '24 13:12 pvelesko

So the issue is that clvk does not realize that the SPIR-V source is using OpMemoryModel Physical64 OpenCL, thus it does not give the proper arguments to clspv.

We can force clvk anyway by running it with CLVK_SPIR_ARCH=spir64, but then I got the following error:

error: 91: [VUID-StandaloneSpirv-OpVariable-04651] OpVariable, <id> '4[%4]', has a disallowed initializer & storage class combination.
From Vulkan spec:
Variable declarations that include initializers must have one of the following storage classes: Output, Private, Function or Workgroup
  %4 = OpVariable %_ptr_PhysicalStorageBuffer_uchar PhysicalStorageBuffer %uchar_1

This is because we have the following line in the OpenCL SPIR-V kernel:

%__chip_module_has_no_IGBAs = OpVariable %_ptr_CrossWorkgroup_uchar CrossWorkgroup %uchar_1

Which gets translated into the following LLVM IR (by llvm-spirv):

@__chip_module_has_no_IGBAs = addrspace(1) constant i8 1

Note that it is on addrspace(1) which is the global addrspace, not the constant one (2). I need to check, but I think clspv does not support such global variables.

Removing every trace of %__chip_module_has_no_IGBAs from MatrixMul.spv.txt, assembling back to MatrixMul.spv and then running the program succeeded locally:

$ CLVK_SPIRV_ARCH=spir64 ./build/opencl-spirv-compiler ./MatrixMul.spv
Successfully compiled ./MatrixMul.spv to MatrixMul_device.bin

MatrixMul_device.zip

Dec 05 '24 14:12 rjodinchr

__chip_module_has_no_IGBAs is used for optimization and we can generate SPIR-V without it but generally speaking we do need it in global space since we r/w to it. Is the main issue here that it has an initializer or that that it's in global space?

Dec 05 '24 14:12 pvelesko

Disabled it, and I was able to run a HIP example on Vulkan!

╭─pvelesko@cupcake ~/chipStar/clvk-enable/build ‹clvk-enable●› 
╰─$ CLVK_SPIRV_ARCH=spir64 CHIP_LOGLEVEL=info ./samples/0_MatrixMultiply/MatrixMultiply
CHIP info [TID 418282] [1733410211.090988490] : CHIP_PLATFORM=0
CHIP info [TID 418282] [1733410211.091204631] : CHIP_DEVICE_TYPE=gpu
CHIP info [TID 418282] [1733410211.091225923] : CHIP_DEVICE=0
CHIP info [TID 418282] [1733410211.091261683] : CHIP_BE=opencl
CHIP info [TID 418282] [1733410211.091282034] : CHIP_DUMP_SPIRV=off
CHIP info [TID 418282] [1733410211.091313796] : CHIP_JIT_FLAGS_OVERRIDE=
CHIP info [TID 418282] [1733410211.091328992] : CHIP_L0_COLLECT_EVENTS_TIMEOUT=0
CHIP info [TID 418282] [1733410211.091367973] : CHIP_L0_EVENT_TIMEOUT=0
CHIP info [TID 418282] [1733410211.091383225] : CHIP_SKIP_UNINIT=off
CHIP info [TID 418282] [1733410211.091399049] : CHIP_LAZY_JIT=on
CHIP info [TID 418282] [1733410211.091413432] : CHIP_OCL_DISABLE_QUEUE_PROFILING=off
CHIP info [TID 418282] [1733410211.091429032] : CHIP_OCL_USE_ALLOC_STRATEGY=off
CHIP info [TID 418282] [1733410211.091454056] : CHIP_MODULE_CACHE_DIR=/space/pvelesko/.cache/chipStar
CHIP info [TID 418282] [1733410211.237711581] : OpenCL Devices of type gpu with SPIR-V_1 support:
Intel(R) Graphics (RPL-S)  is supported.

Device name Intel(R) Graphics (RPL-S)
CHIP info [TID 418282] [1733410211.483975250] : clProgramBuild took 0.00377717 seconds
CHIP info [TID 418282] [1733410211.484163837] : Loaded from cache, kernel compilation took 0.00411241 seconds
CHIP info [TID 418282] [1733410211.484184029] : Module compilation took 4133 microseconds
Running 1 iterations 
hipLaunchKernel 0 time taken: 122.767
hipLaunchKernel BEST TIME: 122.767
GPU real time taken(ms): 126.033
matrixMultiplyCPUReference time taken(ms): 4175.84
Verification PASSED!

Dec 05 '24 14:12 pvelesko

╭─pvelesko@cupcake ~/chipStar/clvk-enable/build ‹clvk-enable●› 
╰─$ ctest -R cuda --output-on-failure                                                                                                                                                                                                         8 ↵
Test project /space/pvelesko/chipStar/clvk-enable/build
      Start 1285: cucc-include-cuda-runtime-twice
 1/28 Test #1285: cucc-include-cuda-runtime-twice ...   Passed    0.19 sec
      Start 1386: cuda-asyncAPI
 2/28 Test #1386: cuda-asyncAPI .....................   Passed    0.73 sec
      Start 1387: cuda-lambda
 3/28 Test #1387: cuda-lambda .......................Subprocess aborted***Exception:   0.50 sec
CHIP error [TID 603321] [1733411241.287751572] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 603321] [1733411241.287990128] : Caught Error: hipErrorNotInitialized
cuda-lambda: /space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/0_Simple/lambda/lambda.cu:23: int main(): Assertion `OutH == 1' failed.

      Start 1388: cuda-matrixMul
 4/28 Test #1388: cuda-matrixMul ....................   Passed    0.69 sec
      Start 1389: cuda-template
 5/28 Test #1389: cuda-template .....................   Passed    0.57 sec
      Start 1390: cuda-vectorAdd
 6/28 Test #1390: cuda-vectorAdd ....................   Passed    0.57 sec
      Start 1391: cuda-clock
 7/28 Test #1391: cuda-clock ........................   Passed    0.91 sec
      Start 1392: cuda-cppIntegration
 8/28 Test #1392: cuda-cppIntegration ...............   Passed    0.57 sec
      Start 1393: cuda-simplePrintf
 9/28 Test #1393: cuda-simplePrintf .................   Passed    0.92 sec
      Start 1394: cuda-simpleAtomicIntrinsics
10/28 Test #1394: cuda-simpleAtomicIntrinsics .......***Failed    0.97 sec
CHIP error [TID 603918] [1733411246.073723476] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 603918] [1733411246.073966304] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/0_Simple/simpleAtomicIntrinsics/simpleAtomicIntrinsics.cu(126) : getLastCudaError() CUDA error : Kernel execution failed : (3) hipErrorNotInitialized.
simpleAtomicIntrinsics starting...
MapSMtoCores for SM 2.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0

> GPU device has 1 Multi-Processors, SM 2.0 compute capabilities


      Start 1395: cuda-simpleTemplates
11/28 Test #1395: cuda-simpleTemplates ..............   Passed    0.57 sec
      Start 1396: cuda-simpleCallback
12/28 Test #1396: cuda-simpleCallback ...............   Passed    0.58 sec
      Start 1397: cuda-bandwidthTest
13/28 Test #1397: cuda-bandwidthTest ................   Passed    1.86 sec
      Start 1398: cuda-deviceQuery
14/28 Test #1398: cuda-deviceQuery ..................   Passed    0.58 sec
      Start 1399: cuda-convolutionSeparable
15/28 Test #1399: cuda-convolutionSeparable .........***Failed    3.77 sec
CHIP error [TID 604365] [1733411251.938588085] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 604365] [1733411251.938794831] : Caught Error: hipErrorNotInitialized
CHIP error [TID 604365] [1733411253.437853597] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 604365] [1733411253.437948768] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/2_Graphics/convolutionSeparable/convolutionSeparable.cu(120) : getLastCudaError() CUDA error : convolutionRowsKernel() execution failed
 : (3) hipErrorNotInitialized.
[/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-convolutionSeparable] - Starting...
MapSMtoCores for SM 2.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0

Image Width x Height = 3072 x 3072

Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...


      Start 1400: cuda-dwtHaar1D
16/28 Test #1400: cuda-dwtHaar1D ....................   Passed    0.58 sec
      Start 1401: cuda-histogram
17/28 Test #1401: cuda-histogram ....................   Passed    2.28 sec
      Start 1402: cuda-binomialoptions
18/28 Test #1402: cuda-binomialoptions ..............***Failed    1.46 sec
CHIP error [TID 604670] [1733411257.766337544] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 604670] [1733411257.766585260] : Caught Error: hipErrorNotInitialized
CUDA error at /space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/4_Finance/binomialOptions/binomialOptions_kernel.cu:144 code=3(hipErrorNotInitialized) "cudaMemcpyToSymbol(d_OptionData, h_OptionData, optN * sizeof(__TOptionData))" 
[/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-binomialoptions] - Starting...
MapSMtoCores for SM 2.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0

Generating input data...
Running GPU binomial tree...

      Start 1403: cuda-blackscholes
19/28 Test #1403: cuda-blackscholes .................   Passed    1.15 sec
      Start 1404: cuda-qrng
20/28 Test #1404: cuda-qrng .........................***Failed    1.31 sec
CHIP error [TID 604858] [1733411260.220601360] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 604858] [1733411260.220850035] : Caught Error: hipErrorNotInitialized
CUDA error at /space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/4_Finance/quasirandomGenerator/quasirandomGenerator_kernel.cu:67 code=3(hipErrorNotInitialized) "cudaMemcpyToSymbol( c_Table, tableCPU, QRNG_DIMENSIONS * QRNG_RESOLUTION * sizeof(unsigned int) )" 
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-qrng Starting...

MapSMtoCores for SM 2.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0

Allocating GPU memory...
Allocating CPU memory...
Initializing QRNG tables...


      Start 1405: cuda-mergesort
21/28 Test #1405: cuda-mergesort ....................***Failed    2.32 sec
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-mergesort Starting...

MapSMtoCores for SM 2.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0

Allocating and initializing host arrays...

Allocating and initializing CUDA arrays...

Initializing GPU merge sort...
Running GPU merge sort...
error: 207: [VUID-StandaloneSpirv-OpVariable-04734] OpVariable, <id> '16[%16]', initializers are limited to OpConstantNull in Workgroup storage class
  %16 = OpVariable %_ptr_Workgroup_uint Workgroup %uint_0

CHIP error [TID 604945] [1733411262.543079372] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 604945] [1733411262.543309720] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/6_Advanced/mergeSort/mergeSort.cu(179) : getLastCudaError() CUDA error : mergeSortShared<1><<<>>> failed
 : (3) hipErrorNotInitialized.

      Start 1406: cuda-scalarprod
22/28 Test #1406: cuda-scalarprod ...................   Passed    0.62 sec
      Start 1407: cuda-scan
23/28 Test #1407: cuda-scan .........................***Failed    1.31 sec
CHIP error [TID 605142] [1733411264.471347575] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 605142] [1733411264.471579465] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/6_Advanced/scan/scan.cu(232) : getLastCudaError() CUDA error : scanExclusiveShared() execution FAILED
 : (3) hipErrorNotInitialized.
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-scan Starting...

MapSMtoCores for SM 2.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0

Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Initializing CUDA-C scan...

*** Running GPU scan for short arrays (100 identical iterations)...

Running scan for 4 elements (1703936 arrays)...

      Start 1408: cuda-sortnet
24/28 Test #1408: cuda-sortnet ......................   Passed    3.68 sec
      Start 1409: cuda-FDTD3d
25/28 Test #1409: cuda-FDTD3d .......................***Failed    9.41 sec
CHIP error [TID 605330] [1733411277.538024085] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 605330] [1733411277.538260836] : Caught Error: hipErrorNotInitialized
CUDA error at /space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/6_Advanced/FDTD3d/FDTD3dGPU.cu:113 code=3(hipErrorNotInitialized) "cudaFuncGetAttributes(&funcAttrib, FiniteDifferencesKernel)" 
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-FDTD3d Starting...

Set-up, based upon target device GMEM size...
 getTargetDeviceGlobalMemSize
 cudaGetDeviceCount
MapSMtoCores for SM 2.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0

 cudaGetDeviceProperties
 generateRandomData

FDTD on 376 x 376 x 376 volume with symmetric filter radius 4 for 5 timesteps...

fdtdReference...
 calloc intermediate
 Host FDTD loop
        t = 0
        t = 1
        t = 2
        t = 3
        t = 4

fdtdReference complete
fdtdGPU...
MapSMtoCores for SM 2.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0


      Start 1410: cuda-sobolqrng
26/28 Test #1410: cuda-sobolqrng ....................   Passed    0.69 sec
      Start 1411: cuda-reduction
27/28 Test #1411: cuda-reduction ....................***Failed   24.47 sec
CHIP error [TID 605528] [1733411291.095725620] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 605528] [1733411291.095991100] : Caught Error: hipErrorNotInitialized
CHIP error [TID 605528] [1733411302.720568772] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 605528] [1733411302.720723338] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/6_Advanced/reduction/reduction.cpp(294) : getLastCudaError() CUDA error : Kernel execution failed : (3) hipErrorNotInitialized.
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-reduction Starting...

MapSMtoCores for SM 2.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0

Using Device 0: Intel(R) Graphics (RPL-S)

Reducing array of type int

16777216 elements
256 threads (max)
64 blocks


      Start 1412: cuda-fastwalsh
28/28 Test #1412: cuda-fastwalsh ....................   Passed    3.61 sec

68% tests passed, 9 tests failed out of 28

Total Test time (real) =  66.93 sec

The following tests FAILED:
        1387 - cuda-lambda (Subprocess aborted)
        1394 - cuda-simpleAtomicIntrinsics (Failed)
        1399 - cuda-convolutionSeparable (Failed)
        1402 - cuda-binomialoptions (Failed)
        1404 - cuda-qrng (Failed)
        1405 - cuda-mergesort (Failed)
        1407 - cuda-scan (Failed)
        1409 - cuda-FDTD3d (Failed)
        1411 - cuda-reduction (Failed)
Errors while running CTest
╭─pvelesko@cupcake ~/chipStar/clvk-enable/build ‹clvk-enable●›

Dec 05 '24 15:12 pvelesko

@pvelesko The extension specification has now been finalised and integrated into the main OpenCL spec and we also have CTS tests. Are you planning to pick up this PR at some point in the future? It's fine if you're not but it would be helpful to know so someone else could pick it up (though I'm not aware of anyone dying to do it at this time :)).

Mar 30 '25 00:03 kpet

I will

Mar 30 '25 05:03 pvelesko

Support for cl_mem_device_address_EXT