Support for cl_mem_device_address_EXT
still WIP, sample works but test fails though more tests fail even on main
╭─pvelesko@cupcake ~/clvk/test-dev-buffer/build ‹deviceAddr●›
╰─$ ./device_ptr_test
Platform 0 has 4 device(s):
0: Intel(R) Graphics (RPL-S)
1: AMD Radeon VII (RADV VEGA20)
2: Intel(R) Arc(tm) A770 Graphics (DG2)
3: llvmpipe (LLVM 15.0.7, 256 bits)
Select platform index: 0
Select device index: 0
Running kernel test with device address extension...
Device supports cl_ext_buffer_device_address extension
Successfully obtained device pointer: 0xfffffffefff80000
Computation successful
All tests completed successfully
@kpet thank you for the review, I'll work on addressing your comments. Meanwhile, this is what I get when I try to execute an example:
warning: overriding the module target triple with spir-unknown-unknown
warning: Linking two modules of different data layouts: 'Unknown buffer' is 'e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-G1' whereas 'clvk-1ArlBH/source.bc' is 'e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-G1'
clspv: /space/pvelesko/clvk/external/clspv/third_party/llvm/llvm/lib/IR/Value.cpp:507: void llvm::Value::doRAUW(llvm::Value*, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
Aborted (core dumped)
Error processing hip-spirv-58abf1.spv: Failed to build program
I'm not an LLVM expert but just wanted to check with you first before I investigate further.
Can you share the kernel source and the command line? Everything will be in clvk's log, that you would get running with:
CLVK_LOG=4
CLVK_LOG_DEST=file:log.txt
then upload log.txt here
The kernel source is a HIP matrix multiplication @rjodinchr or do you mean the SPIR-V?
The error message you have posted is about clspv not being able to generate the SPIR-V shader from the CL kernel.
Please provide the log from clvk. It will contain everything we need to reproduce and fix the issue.
Thanks
@rjodinchr log.txt
Oh, I understand what you meant by:
or do you mean the SPIR-V?
This application is using OpenCL SPIR-V source. This was not clear to me as SPIR-V has 2 variant. The OpenCL one and the Vulkan one. While the goal of clvk is to compile whatever CL source it gets to Vulkan SPIR-V, it can take OpenCL SPIR-V as the input.
But the log does not contain the OpenCL SPIR-V inputs. Could you share the sources then (I see two calls to clCraeteProgramWithIL, so both will be needed).
Hmm only one got dumped for me:
; SPIR-V
; Version: 1.1
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 145
; Schema: 0
OpCapability Addresses
OpCapability Linkage
OpCapability Kernel
OpCapability Int64
OpCapability Int8
%1 = OpExtInstImport "OpenCL.std"
%2 = OpExtInstImport "OpenCL.DebugInfo.100"
OpMemoryModel Physical64 OpenCL
OpEntryPoint Kernel %80 "_Z12gpuMatrixMulPKfS0_Pfjjj" %__spirv_BuiltInWorkgroupSize %__spirv_BuiltInWorkgroupId %__spirv_BuiltInLocalInvocationId
%89 = OpString "/space/pvelesko/chipStar/main/samples/0_MatrixMultiply/MatrixMultiply.cpp"
%93 = OpString "gpuMatrixMul"
%94 = OpString ""
%97 = OpString "__hip_get_block_dim_x"
%98 = OpString "/space/pvelesko/chipStar/main/include/hip/spirv_hip.hh"
%101 = OpString "__get_x"
%103 = OpString "__hip_get_block_idx_x"
%106 = OpString "__hip_get_thread_idx_x"
%109 = OpString "__hip_get_block_dim_y"
%111 = OpString "__get_y"
%113 = OpString "__hip_get_block_idx_y"
%116 = OpString "__hip_get_thread_idx_y"
OpSource OpenCL_C 200000
OpName %__spirv_BuiltInWorkgroupSize "__spirv_BuiltInWorkgroupSize"
OpName %__spirv_BuiltInWorkgroupId "__spirv_BuiltInWorkgroupId"
OpName %__spirv_BuiltInLocalInvocationId "__spirv_BuiltInLocalInvocationId"
OpName %A_coerce "A.coerce"
OpName %B_coerce "B.coerce"
OpName %C_coerce "C.coerce"
OpName %M "M"
OpName %N "N"
OpName %K "K"
OpName %entry "entry"
OpName %for_body_lr_ph "for.body.lr.ph"
OpName %for_body "for.body"
OpName %for_cond_cleanup_loopexit "for.cond.cleanup.loopexit"
OpName %for_cond_cleanup "for.cond.cleanup"
OpName %call_i36 "call.i36"
OpName %conv_i "conv.i"
OpName %call_i "call.i"
OpName %conv_i37 "conv.i37"
OpName %mul "mul"
OpName %call_i38 "call.i38"
OpName %conv_i39 "conv.i39"
OpName %add "add"
OpName %call_i40 "call.i40"
OpName %conv_i41 "conv.i41"
OpName %call_i42 "call.i42"
OpName %conv_i43 "conv.i43"
OpName %mul8 "mul8"
OpName %call_i44 "call.i44"
OpName %conv_i45 "conv.i45"
OpName %add10 "add10"
OpName %cmp46_not "cmp46.not"
OpName %mul13 "mul13"
OpName %add18 "add18"
OpName %acc_0_lcssa "acc.0.lcssa"
OpName %mul19 "mul19"
OpName %add20 "add20"
OpName %idxprom21 "idxprom21"
OpName %arrayidx22 "arrayidx22"
OpName %inc "inc"
OpName %k_048 "k.048"
OpName %acc_047 "acc.047"
OpName %mul11 "mul11"
OpName %add12 "add12"
OpName %idxprom "idxprom"
OpName %arrayidx "arrayidx"
OpName %add14 "add14"
OpName %idxprom15 "idxprom15"
OpName %arrayidx16 "arrayidx16"
OpName %mul17 "mul17"
OpName %cmp "cmp"
OpName %A_coerce_0 "A.coerce"
OpName %B_coerce_0 "B.coerce"
OpName %C_coerce_0 "C.coerce"
OpName %M_0 "M"
OpName %N_0 "N"
OpName %K_0 "K"
OpModuleProcessed "Debug info producer: clang version 18.1.5 (https://github.com/CHIP-SPV/llvm-project.git 5c39d7d1aa6e54a9c8df41002d419c398ec8830c)"
OpDecorate %__spirv_BuiltInWorkgroupSize LinkageAttributes "__spirv_BuiltInWorkgroupSize" Import
OpDecorate %__spirv_BuiltInWorkgroupSize Constant
OpDecorate %__spirv_BuiltInWorkgroupSize BuiltIn WorkgroupSize
OpDecorate %__spirv_BuiltInWorkgroupId LinkageAttributes "__spirv_BuiltInWorkgroupId" Import
OpDecorate %__spirv_BuiltInWorkgroupId Constant
OpDecorate %__spirv_BuiltInWorkgroupId BuiltIn WorkgroupId
OpDecorate %__spirv_BuiltInLocalInvocationId LinkageAttributes "__spirv_BuiltInLocalInvocationId" Import
OpDecorate %__spirv_BuiltInLocalInvocationId Constant
OpDecorate %__spirv_BuiltInLocalInvocationId BuiltIn LocalInvocationId
OpDecorate %A_coerce FuncParamAttr NoAlias
OpDecorate %A_coerce FuncParamAttr NoCapture
OpDecorate %A_coerce FuncParamAttr NoWrite
OpDecorate %B_coerce FuncParamAttr NoAlias
OpDecorate %B_coerce FuncParamAttr NoCapture
OpDecorate %B_coerce FuncParamAttr NoWrite
OpDecorate %C_coerce FuncParamAttr NoAlias
OpDecorate %C_coerce FuncParamAttr NoCapture
OpDecorate %A_coerce_0 FuncParamAttr NoAlias
OpDecorate %A_coerce_0 FuncParamAttr NoCapture
OpDecorate %A_coerce_0 FuncParamAttr NoWrite
OpDecorate %B_coerce_0 FuncParamAttr NoAlias
OpDecorate %B_coerce_0 FuncParamAttr NoCapture
OpDecorate %B_coerce_0 FuncParamAttr NoWrite
OpDecorate %C_coerce_0 FuncParamAttr NoAlias
OpDecorate %C_coerce_0 FuncParamAttr NoCapture
%uchar = OpTypeInt 8 0
%ulong = OpTypeInt 64 0
%uint = OpTypeInt 32 0
%uchar_1 = OpConstant %uchar 1
%uint_0 = OpConstant %uint 0
%uint_1 = OpConstant %uint 1
%_ptr_CrossWorkgroup_uchar = OpTypePointer CrossWorkgroup %uchar
%v3ulong = OpTypeVector %ulong 3
%_ptr_Input_v3ulong = OpTypePointer Input %v3ulong
%void = OpTypeVoid
%float = OpTypeFloat 32
%_ptr_CrossWorkgroup_float = OpTypePointer CrossWorkgroup %float
%17 = OpTypeFunction %void %_ptr_CrossWorkgroup_float %_ptr_CrossWorkgroup_float %_ptr_CrossWorkgroup_float %uint %uint %uint
%bool = OpTypeBool
%__spirv_BuiltInWorkgroupSize = OpVariable %_ptr_Input_v3ulong Input
%__spirv_BuiltInWorkgroupId = OpVariable %_ptr_Input_v3ulong Input
%__spirv_BuiltInLocalInvocationId = OpVariable %_ptr_Input_v3ulong Input
%float_0 = OpConstant %float 0
%90 = OpExtInst %void %2 DebugSource %89
%91 = OpExtInst %void %2 DebugCompilationUnit 65536 4 %90 CPP_for_OpenCL
%92 = OpExtInst %void %2 DebugTypeFunction None %void
%95 = OpExtInst %void %2 DebugInfoNone
%96 = OpExtInst %void %2 DebugFunction %93 %92 %90 83 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 87 %18 %95
%99 = OpExtInst %void %2 DebugSource %98
%100 = OpExtInst %void %2 DebugFunction %97 %92 %99 82 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 82 %95 %95
%102 = OpExtInst %void %2 DebugFunction %101 %92 %99 108 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 108 %95 %95
%104 = OpExtInst %void %2 DebugFunction %103 %92 %99 77 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 77 %95 %95
%105 = OpExtInst %void %2 DebugFunction %101 %92 %99 102 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 102 %95 %95
%107 = OpExtInst %void %2 DebugFunction %106 %92 %99 72 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 72 %95 %95
%108 = OpExtInst %void %2 DebugFunction %101 %92 %99 96 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 96 %95 %95
%110 = OpExtInst %void %2 DebugFunction %109 %92 %99 83 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 83 %95 %95
%112 = OpExtInst %void %2 DebugFunction %111 %92 %99 109 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 109 %95 %95
%114 = OpExtInst %void %2 DebugFunction %113 %92 %99 78 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 78 %95 %95
%115 = OpExtInst %void %2 DebugFunction %111 %92 %99 103 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 103 %95 %95
%117 = OpExtInst %void %2 DebugFunction %116 %92 %99 73 0 %91 %94 FlagIsLocal|FlagIsDefinition|FlagPrototyped|FlagIsOptimized 73 %95 %95
%118 = OpExtInst %void %2 DebugFunction %111 %92 %99 97 0 %91 %94 FlagIsDefinition|FlagPrototyped|FlagIsOptimized 97 %95 %95
%119 = OpExtInst %void %2 DebugInlinedAt 90 %96
%120 = OpExtInst %void %2 DebugInlinedAt 108 %102 %119
%122 = OpExtInst %void %2 DebugInlinedAt 90 %96
%123 = OpExtInst %void %2 DebugInlinedAt 102 %105 %122
%126 = OpExtInst %void %2 DebugInlinedAt 90 %96
%127 = OpExtInst %void %2 DebugInlinedAt 96 %108 %126
%130 = OpExtInst %void %2 DebugInlinedAt 92 %96
%131 = OpExtInst %void %2 DebugInlinedAt 109 %112 %130
%133 = OpExtInst %void %2 DebugInlinedAt 92 %96
%134 = OpExtInst %void %2 DebugInlinedAt 103 %115 %133
%137 = OpExtInst %void %2 DebugInlinedAt 92 %96
%138 = OpExtInst %void %2 DebugInlinedAt 97 %118 %137
%18 = OpFunction %void None %17
%A_coerce = OpFunctionParameter %_ptr_CrossWorkgroup_float
%B_coerce = OpFunctionParameter %_ptr_CrossWorkgroup_float
%C_coerce = OpFunctionParameter %_ptr_CrossWorkgroup_float
%M = OpFunctionParameter %uint
%N = OpFunctionParameter %uint
%K = OpFunctionParameter %uint
%entry = OpLabel
%121 = OpExtInst %void %2 DebugScope %100 %120
OpLine %98 82 50
%30 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupSize Aligned 32
%call_i36 = OpCompositeExtract %ulong %30 0
%conv_i = OpUConvert %uint %call_i36
%124 = OpExtInst %void %2 DebugScope %104 %123
OpLine %98 77 50
%33 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupId Aligned 32
%call_i = OpCompositeExtract %ulong %33 0
%conv_i37 = OpUConvert %uint %call_i
%125 = OpExtInst %void %2 DebugScope %96
OpLine %89 90 21
%mul = OpIMul %uint %conv_i37 %conv_i
%128 = OpExtInst %void %2 DebugScope %107 %127
OpLine %98 72 51
%37 = OpLoad %v3ulong %__spirv_BuiltInLocalInvocationId Aligned 32
%call_i38 = OpCompositeExtract %ulong %37 0
%conv_i39 = OpUConvert %uint %call_i38
%129 = OpExtInst %void %2 DebugScope %96
OpLine %89 90 37
%add = OpIAdd %uint %mul %conv_i39
%132 = OpExtInst %void %2 DebugScope %110 %131
OpLine %98 83 50
%41 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupSize Aligned 32
%call_i40 = OpCompositeExtract %ulong %41 1
%conv_i41 = OpUConvert %uint %call_i40
%135 = OpExtInst %void %2 DebugScope %114 %134
OpLine %98 78 50
%44 = OpLoad %v3ulong %__spirv_BuiltInWorkgroupId Aligned 32
%call_i42 = OpCompositeExtract %ulong %44 1
%conv_i43 = OpUConvert %uint %call_i42
%136 = OpExtInst %void %2 DebugScope %96
OpLine %89 92 21
%mul8 = OpIMul %uint %conv_i43 %conv_i41
%139 = OpExtInst %void %2 DebugScope %117 %138
OpLine %98 73 51
%48 = OpLoad %v3ulong %__spirv_BuiltInLocalInvocationId Aligned 32
%call_i44 = OpCompositeExtract %ulong %48 1
%conv_i45 = OpUConvert %uint %call_i44
%140 = OpExtInst %void %2 DebugScope %96
OpLine %89 92 37
%add10 = OpIAdd %uint %mul8 %conv_i45
OpLine %89 96 22
%cmp46_not = OpIEqual %bool %K %uint_0
OpLine %89 96 3
OpBranchConditional %cmp46_not %for_cond_cleanup %for_body_lr_ph
%for_body_lr_ph = OpLabel
%mul13 = OpIMul %uint %add10 %K
%141 = OpExtInst %void %2 DebugScope %96
OpLine %89 96 3
OpBranch %for_body
%for_body = OpLabel
%k_048 = OpPhi %uint %uint_0 %for_body_lr_ph %inc %for_body
%acc_047 = OpPhi %float %float_0 %for_body_lr_ph %add18 %for_body
%144 = OpExtInst %void %2 DebugScope %96
OpLine %89 100 16
%mul11 = OpIMul %uint %k_048 %M
OpLine %89 100 20
%add12 = OpIAdd %uint %mul11 %add
OpLine %89 100 12
%idxprom = OpUConvert %ulong %add12
%arrayidx = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %A_coerce %idxprom
%70 = OpLoad %float %arrayidx Aligned 4
OpLine %89 100 51
%add14 = OpIAdd %uint %k_048 %mul13
OpLine %89 100 35
%idxprom15 = OpUConvert %ulong %add14
%arrayidx16 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %B_coerce %idxprom15
%74 = OpLoad %float %arrayidx16 Aligned 4
OpLine %89 100 33
%mul17 = OpFMul %float %70 %74
OpLine %89 100 9
%add18 = OpFAdd %float %acc_047 %mul17
OpLine %89 96 28
%inc = OpIAdd %uint %k_048 %uint_1
OpLine %89 96 22
%cmp = OpULessThan %bool %inc %K
OpLine %89 96 3
OpBranchConditional %cmp %for_body %for_cond_cleanup_loopexit
%for_cond_cleanup_loopexit = OpLabel
%142 = OpExtInst %void %2 DebugScope %96
OpLine %89 105 15
OpBranch %for_cond_cleanup
%for_cond_cleanup = OpLabel
%143 = OpExtInst %void %2 DebugScope %96
OpLine %89 0 0
%acc_0_lcssa = OpPhi %float %float_0 %entry %add18 %for_cond_cleanup_loopexit
OpLine %89 105 15
%mul19 = OpIMul %uint %add10 %M
OpLine %89 105 19
%add20 = OpIAdd %uint %mul19 %add
OpLine %89 105 3
%idxprom21 = OpUConvert %ulong %add20
%arrayidx22 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %C_coerce %idxprom21
OpLine %89 105 32
OpStore %arrayidx22 %acc_0_lcssa Aligned 4
OpLine %89 106 1
OpReturn
OpFunctionEnd
%80 = OpFunction %void None %17
%A_coerce_0 = OpFunctionParameter %_ptr_CrossWorkgroup_float
%B_coerce_0 = OpFunctionParameter %_ptr_CrossWorkgroup_float
%C_coerce_0 = OpFunctionParameter %_ptr_CrossWorkgroup_float
%M_0 = OpFunctionParameter %uint
%N_0 = OpFunctionParameter %uint
%K_0 = OpFunctionParameter %uint
%87 = OpLabel
%88 = OpFunctionCall %void %18 %A_coerce_0 %B_coerce_0 %C_coerce_0 %M_0 %N_0 %K_0
OpReturn
OpFunctionEnd
I'll try to reproduce with that one then. Thank you
I'm not able to reproduce. I will need even more assets I think. Could you run with:
CLVK_LOG=4
CLVK_LOG_DEST=file:log.txt
CLVK_KEEP_TEMPORARIES=1
That should produce a log.txt file as well as a bunch of clvk-XXXXX folders.
Could you make a archive with all of them and upload that here please?
╭─pvelesko@cupcake ~/chipStar/clvk-enable/build ‹clvk-enable›
╰─$ ./tools/opencl-spirv-compiler/opencl-spirv-compiler ./MatrixMul.spv 130 ↵
Build failed. Log:
warning: overriding the module target triple with spir-unknown-unknown
warning: Linking two modules of different data layouts: 'Unknown buffer' is 'e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-G1' whereas 'clvk-9YdtEh/source.bc' is 'e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-G1'
clspv: /space/pvelesko/clvk/external/clspv/third_party/llvm/llvm/lib/IR/Value.cpp:507: void llvm::Value::doRAUW(llvm::Value*, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
Aborted (core dumped)
Error processing ./MatrixMul.spv: Failed to build program
So the issue is that clvk does not realize that the SPIR-V source is using OpMemoryModel Physical64 OpenCL, thus it does not give the proper arguments to clspv.
We can force clvk anyway by running it with CLVK_SPIR_ARCH=spir64, but then I got the following error:
error: 91: [VUID-StandaloneSpirv-OpVariable-04651] OpVariable, <id> '4[%4]', has a disallowed initializer & storage class combination.
From Vulkan spec:
Variable declarations that include initializers must have one of the following storage classes: Output, Private, Function or Workgroup
%4 = OpVariable %_ptr_PhysicalStorageBuffer_uchar PhysicalStorageBuffer %uchar_1
This is because we have the following line in the OpenCL SPIR-V kernel:
%__chip_module_has_no_IGBAs = OpVariable %_ptr_CrossWorkgroup_uchar CrossWorkgroup %uchar_1
Which gets translated into the following LLVM IR (by llvm-spirv):
@__chip_module_has_no_IGBAs = addrspace(1) constant i8 1
Note that it is on addrspace(1) which is the global addrspace, not the constant one (2).
I need to check, but I think clspv does not support such global variables.
Removing every trace of %__chip_module_has_no_IGBAs from MatrixMul.spv.txt, assembling back to MatrixMul.spv and then running the program succeeded locally:
$ CLVK_SPIRV_ARCH=spir64 ./build/opencl-spirv-compiler ./MatrixMul.spv
Successfully compiled ./MatrixMul.spv to MatrixMul_device.bin
__chip_module_has_no_IGBAs is used for optimization and we can generate SPIR-V without it but generally speaking we do need it in global space since we r/w to it. Is the main issue here that it has an initializer or that that it's in global space?
Disabled it, and I was able to run a HIP example on Vulkan!
╭─pvelesko@cupcake ~/chipStar/clvk-enable/build ‹clvk-enable●›
╰─$ CLVK_SPIRV_ARCH=spir64 CHIP_LOGLEVEL=info ./samples/0_MatrixMultiply/MatrixMultiply
CHIP info [TID 418282] [1733410211.090988490] : CHIP_PLATFORM=0
CHIP info [TID 418282] [1733410211.091204631] : CHIP_DEVICE_TYPE=gpu
CHIP info [TID 418282] [1733410211.091225923] : CHIP_DEVICE=0
CHIP info [TID 418282] [1733410211.091261683] : CHIP_BE=opencl
CHIP info [TID 418282] [1733410211.091282034] : CHIP_DUMP_SPIRV=off
CHIP info [TID 418282] [1733410211.091313796] : CHIP_JIT_FLAGS_OVERRIDE=
CHIP info [TID 418282] [1733410211.091328992] : CHIP_L0_COLLECT_EVENTS_TIMEOUT=0
CHIP info [TID 418282] [1733410211.091367973] : CHIP_L0_EVENT_TIMEOUT=0
CHIP info [TID 418282] [1733410211.091383225] : CHIP_SKIP_UNINIT=off
CHIP info [TID 418282] [1733410211.091399049] : CHIP_LAZY_JIT=on
CHIP info [TID 418282] [1733410211.091413432] : CHIP_OCL_DISABLE_QUEUE_PROFILING=off
CHIP info [TID 418282] [1733410211.091429032] : CHIP_OCL_USE_ALLOC_STRATEGY=off
CHIP info [TID 418282] [1733410211.091454056] : CHIP_MODULE_CACHE_DIR=/space/pvelesko/.cache/chipStar
CHIP info [TID 418282] [1733410211.237711581] : OpenCL Devices of type gpu with SPIR-V_1 support:
Intel(R) Graphics (RPL-S) is supported.
Device name Intel(R) Graphics (RPL-S)
CHIP info [TID 418282] [1733410211.483975250] : clProgramBuild took 0.00377717 seconds
CHIP info [TID 418282] [1733410211.484163837] : Loaded from cache, kernel compilation took 0.00411241 seconds
CHIP info [TID 418282] [1733410211.484184029] : Module compilation took 4133 microseconds
Running 1 iterations
hipLaunchKernel 0 time taken: 122.767
hipLaunchKernel BEST TIME: 122.767
GPU real time taken(ms): 126.033
matrixMultiplyCPUReference time taken(ms): 4175.84
Verification PASSED!
╭─pvelesko@cupcake ~/chipStar/clvk-enable/build ‹clvk-enable●›
╰─$ ctest -R cuda --output-on-failure 8 ↵
Test project /space/pvelesko/chipStar/clvk-enable/build
Start 1285: cucc-include-cuda-runtime-twice
1/28 Test #1285: cucc-include-cuda-runtime-twice ... Passed 0.19 sec
Start 1386: cuda-asyncAPI
2/28 Test #1386: cuda-asyncAPI ..................... Passed 0.73 sec
Start 1387: cuda-lambda
3/28 Test #1387: cuda-lambda .......................Subprocess aborted***Exception: 0.50 sec
CHIP error [TID 603321] [1733411241.287751572] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 603321] [1733411241.287990128] : Caught Error: hipErrorNotInitialized
cuda-lambda: /space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/0_Simple/lambda/lambda.cu:23: int main(): Assertion `OutH == 1' failed.
Start 1388: cuda-matrixMul
4/28 Test #1388: cuda-matrixMul .................... Passed 0.69 sec
Start 1389: cuda-template
5/28 Test #1389: cuda-template ..................... Passed 0.57 sec
Start 1390: cuda-vectorAdd
6/28 Test #1390: cuda-vectorAdd .................... Passed 0.57 sec
Start 1391: cuda-clock
7/28 Test #1391: cuda-clock ........................ Passed 0.91 sec
Start 1392: cuda-cppIntegration
8/28 Test #1392: cuda-cppIntegration ............... Passed 0.57 sec
Start 1393: cuda-simplePrintf
9/28 Test #1393: cuda-simplePrintf ................. Passed 0.92 sec
Start 1394: cuda-simpleAtomicIntrinsics
10/28 Test #1394: cuda-simpleAtomicIntrinsics .......***Failed 0.97 sec
CHIP error [TID 603918] [1733411246.073723476] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 603918] [1733411246.073966304] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/0_Simple/simpleAtomicIntrinsics/simpleAtomicIntrinsics.cu(126) : getLastCudaError() CUDA error : Kernel execution failed : (3) hipErrorNotInitialized.
simpleAtomicIntrinsics starting...
MapSMtoCores for SM 2.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0
> GPU device has 1 Multi-Processors, SM 2.0 compute capabilities
Start 1395: cuda-simpleTemplates
11/28 Test #1395: cuda-simpleTemplates .............. Passed 0.57 sec
Start 1396: cuda-simpleCallback
12/28 Test #1396: cuda-simpleCallback ............... Passed 0.58 sec
Start 1397: cuda-bandwidthTest
13/28 Test #1397: cuda-bandwidthTest ................ Passed 1.86 sec
Start 1398: cuda-deviceQuery
14/28 Test #1398: cuda-deviceQuery .................. Passed 0.58 sec
Start 1399: cuda-convolutionSeparable
15/28 Test #1399: cuda-convolutionSeparable .........***Failed 3.77 sec
CHIP error [TID 604365] [1733411251.938588085] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 604365] [1733411251.938794831] : Caught Error: hipErrorNotInitialized
CHIP error [TID 604365] [1733411253.437853597] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 604365] [1733411253.437948768] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/2_Graphics/convolutionSeparable/convolutionSeparable.cu(120) : getLastCudaError() CUDA error : convolutionRowsKernel() execution failed
: (3) hipErrorNotInitialized.
[/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-convolutionSeparable] - Starting...
MapSMtoCores for SM 2.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0
Image Width x Height = 3072 x 3072
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...
Start 1400: cuda-dwtHaar1D
16/28 Test #1400: cuda-dwtHaar1D .................... Passed 0.58 sec
Start 1401: cuda-histogram
17/28 Test #1401: cuda-histogram .................... Passed 2.28 sec
Start 1402: cuda-binomialoptions
18/28 Test #1402: cuda-binomialoptions ..............***Failed 1.46 sec
CHIP error [TID 604670] [1733411257.766337544] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 604670] [1733411257.766585260] : Caught Error: hipErrorNotInitialized
CUDA error at /space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/4_Finance/binomialOptions/binomialOptions_kernel.cu:144 code=3(hipErrorNotInitialized) "cudaMemcpyToSymbol(d_OptionData, h_OptionData, optN * sizeof(__TOptionData))"
[/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-binomialoptions] - Starting...
MapSMtoCores for SM 2.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0
Generating input data...
Running GPU binomial tree...
Start 1403: cuda-blackscholes
19/28 Test #1403: cuda-blackscholes ................. Passed 1.15 sec
Start 1404: cuda-qrng
20/28 Test #1404: cuda-qrng .........................***Failed 1.31 sec
CHIP error [TID 604858] [1733411260.220601360] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 604858] [1733411260.220850035] : Caught Error: hipErrorNotInitialized
CUDA error at /space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/4_Finance/quasirandomGenerator/quasirandomGenerator_kernel.cu:67 code=3(hipErrorNotInitialized) "cudaMemcpyToSymbol( c_Table, tableCPU, QRNG_DIMENSIONS * QRNG_RESOLUTION * sizeof(unsigned int) )"
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-qrng Starting...
MapSMtoCores for SM 2.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0
Allocating GPU memory...
Allocating CPU memory...
Initializing QRNG tables...
Start 1405: cuda-mergesort
21/28 Test #1405: cuda-mergesort ....................***Failed 2.32 sec
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-mergesort Starting...
MapSMtoCores for SM 2.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Initializing GPU merge sort...
Running GPU merge sort...
error: 207: [VUID-StandaloneSpirv-OpVariable-04734] OpVariable, <id> '16[%16]', initializers are limited to OpConstantNull in Workgroup storage class
%16 = OpVariable %_ptr_Workgroup_uint Workgroup %uint_0
CHIP error [TID 604945] [1733411262.543079372] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 604945] [1733411262.543309720] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/6_Advanced/mergeSort/mergeSort.cu(179) : getLastCudaError() CUDA error : mergeSortShared<1><<<>>> failed
: (3) hipErrorNotInitialized.
Start 1406: cuda-scalarprod
22/28 Test #1406: cuda-scalarprod ................... Passed 0.62 sec
Start 1407: cuda-scan
23/28 Test #1407: cuda-scan .........................***Failed 1.31 sec
CHIP error [TID 605142] [1733411264.471347575] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 605142] [1733411264.471579465] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/6_Advanced/scan/scan.cu(232) : getLastCudaError() CUDA error : scanExclusiveShared() execution FAILED
: (3) hipErrorNotInitialized.
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-scan Starting...
MapSMtoCores for SM 2.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Initializing CUDA-C scan...
*** Running GPU scan for short arrays (100 identical iterations)...
Running scan for 4 elements (1703936 arrays)...
Start 1408: cuda-sortnet
24/28 Test #1408: cuda-sortnet ...................... Passed 3.68 sec
Start 1409: cuda-FDTD3d
25/28 Test #1409: cuda-FDTD3d .......................***Failed 9.41 sec
CHIP error [TID 605330] [1733411277.538024085] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 605330] [1733411277.538260836] : Caught Error: hipErrorNotInitialized
CUDA error at /space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/6_Advanced/FDTD3d/FDTD3dGPU.cu:113 code=3(hipErrorNotInitialized) "cudaFuncGetAttributes(&funcAttrib, FiniteDifferencesKernel)"
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-FDTD3d Starting...
Set-up, based upon target device GMEM size...
getTargetDeviceGlobalMemSize
cudaGetDeviceCount
MapSMtoCores for SM 2.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0
cudaGetDeviceProperties
generateRandomData
FDTD on 376 x 376 x 376 volume with symmetric filter radius 4 for 5 timesteps...
fdtdReference...
calloc intermediate
Host FDTD loop
t = 0
t = 1
t = 2
t = 3
t = 4
fdtdReference complete
fdtdGPU...
MapSMtoCores for SM 2.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0
Start 1410: cuda-sobolqrng
26/28 Test #1410: cuda-sobolqrng .................... Passed 0.69 sec
Start 1411: cuda-reduction
27/28 Test #1411: cuda-reduction ....................***Failed 24.47 sec
CHIP error [TID 605528] [1733411291.095725620] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 605528] [1733411291.095991100] : Caught Error: hipErrorNotInitialized
CHIP error [TID 605528] [1733411302.720568772] : hipErrorNotInitialized (Device library link step failed.) in /space/pvelesko/chipStar/clvk-enable/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile
CHIP error [TID 605528] [1733411302.720723338] : Caught Error: hipErrorNotInitialized
/space/pvelesko/chipStar/clvk-enable/samples/cuda_samples/6_Advanced/reduction/reduction.cpp(294) : getLastCudaError() CUDA error : Kernel execution failed : (3) hipErrorNotInitialized.
/space/pvelesko/chipStar/clvk-enable/build/samples/cuda_samples/cuda-reduction Starting...
MapSMtoCores for SM 2.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 2.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 2.0
Using Device 0: Intel(R) Graphics (RPL-S)
Reducing array of type int
16777216 elements
256 threads (max)
64 blocks
Start 1412: cuda-fastwalsh
28/28 Test #1412: cuda-fastwalsh .................... Passed 3.61 sec
68% tests passed, 9 tests failed out of 28
Total Test time (real) = 66.93 sec
The following tests FAILED:
1387 - cuda-lambda (Subprocess aborted)
1394 - cuda-simpleAtomicIntrinsics (Failed)
1399 - cuda-convolutionSeparable (Failed)
1402 - cuda-binomialoptions (Failed)
1404 - cuda-qrng (Failed)
1405 - cuda-mergesort (Failed)
1407 - cuda-scan (Failed)
1409 - cuda-FDTD3d (Failed)
1411 - cuda-reduction (Failed)
Errors while running CTest
╭─pvelesko@cupcake ~/chipStar/clvk-enable/build ‹clvk-enable●›
@pvelesko The extension specification has now been finalised and integrated into the main OpenCL spec and we also have CTS tests. Are you planning to pick up this PR at some point in the future? It's fine if you're not but it would be helpful to know so someone else could pick it up (though I'm not aware of anyone dying to do it at this time :)).
I will