KataGo
Weird OpenCL error ....
Hello, sorry to bother you. I'm not sure whether this is even KataGo's problem or something else. I have been using KataGo under Linux for years via OpenCL with an AMD GPU (RX 570). Recently I switched to an RX 6700 XT and, after some trouble, managed to install the AMD drivers for it for OpenCL support. However, some strange things now happen that didn't happen before.
Here's the output KataGo gives when I try to tune it:
`~/katago$ ./katago tuner -model kata1-b40.bin.gz
2023-06-05 23:40:45+0200: Loading model...
2023-06-05 23:40:46+0200: Querying system devices...
2023-06-05 23:40:46+0200: Found OpenCL Platform 0: Clover (Mesa) (OpenCL 1.1 Mesa 22.2.5)
2023-06-05 23:40:46+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-05 23:40:46+0200: Found OpenCL Platform 1: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3513.0))
2023-06-05 23:40:46+0200: Found 0 device(s) on platform 1 with type CPU or GPU or Accelerator, skipping
2023-06-05 23:40:46+0200: Found OpenCL Device 0: AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic) (AMD) (score 11000101)
2023-06-05 23:40:46+0200: Tuner starting...
2023-06-05 23:40:46+0200: Creating context for OpenCL Platform: Clover (Mesa) (OpenCL 1.1 Mesa 22.2.5)
2023-06-05 23:40:46+0200: Using OpenCL Device 0: AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic) (AMD) OpenCL 1.1 Mesa 22.2.5 (Extensions: cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning)
Tuning device 0: AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)
Starting from existing parameters in: /home/werner/.katago/opencltuning/tune8_gpuAMDRadeonRX6700XTnavi22LLVM1506DRM348519043generic_x19_y19_c256_mv10.txt
Beginning GPU tuning for AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic) modelVersion 10 channels 256
Setting winograd3x3TileSize = 4
Tuning xGemmDirect for 1x1 convolutions and matrix mult
Testing 56 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/56 ...
Tuning 40/56 ...
ERROR: Could not find any configuration that worked
Tuning xGemm for convolutions
Testing 70 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/70 ...
Tuning 40/70 ...
Tuning 60/70 ...
ERROR: Could not find any configuration that worked
Tuning hGemmWmma for convolutions
Testing 146 different configs
FP16 tensor core tuning failed, assuming no FP16 tensor core support
Tuning xGemm for convolutions - trying with FP16 storage
Testing 70 different configs
FP16 storage tuning failed, assuming no FP16 storage support
Using FP32 storage!
Using FP32 compute!
Tuning winograd transform for convolutions
Testing 47 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/47 ...
Tuning 40/47 ...
ERROR: Could not find any configuration that worked
Tuning winograd untransform for convolutions
Testing 111 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/111 ...
Tuning 40/111 ...
Tuning 60/111 ...
Tuning 80/111 ...
Tuning 100/111 ...
ERROR: Could not find any configuration that worked
Tuning global pooling strides
Testing 106 different configs
WARNING: Reference implementation failed: CL_BUILD_PROGRAM_FAILURE
Tuning 20/106 ...
Tuning 40/106 ...
Tuning 60/106 ...
Tuning 80/106 ...
Tuning 100/106 ...
ERROR: Could not find any configuration that worked
Done tuning
Done, results saved to /home/werner/.katago/opencltuning/tune8_gpuAMDRadeonRX6700XTnavi22LLVM1506DRM348519043generic_x19_y19_c256_mv10.txt `
I have never seen that error before, but I suspect it is related to a line in the clinfo output (right under "CL_PROGRAM_BUILD_LOG"):
`$ clinfo
Number of platforms                               2
Platform Name                                   Clover
Platform Vendor                                 Mesa
Platform Version                                OpenCL 1.1 Mesa 22.2.5
Platform Profile                                FULL_PROFILE
Platform Extensions                             cl_khr_icd
Platform Extensions function suffix             MESA
Platform Name                                   AMD Accelerated Parallel Processing
Platform Vendor                                 Advanced Micro Devices, Inc.
Platform Version                                OpenCL 2.1 AMD-APP (3513.0)
Platform Profile                                FULL_PROFILE
Platform Extensions                             cl_khr_icd cl_amd_event_callback
Platform Extensions function suffix             AMD
Platform Host timer resolution                  1ns
Platform Name                                   Clover
Number of devices                                 1
Device Name                                     AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)
Device Vendor                                   AMD
Device Vendor ID                                0x1002
Device Version                                  OpenCL 1.1 Mesa 22.2.5
Device Numeric Version                          0x401000 (1.1.0)
Driver Version                                  22.2.5
Device OpenCL C Version                         OpenCL C 1.1
Device Type                                     GPU
Device Profile                                  FULL_PROFILE
Device Available                                Yes
Compiler Available                              Yes
Max compute units                               40
Max clock frequency                             2725MHz
Max work item dimensions                        3
Max work item sizes                             256x256x256
Max work group size                             256
=== CL_PROGRAM_BUILD_LOG ===
fatal error: cannot open file '/usr/lib/clc/gfx1031-amdgcn-mesa-mesa3d.bc': No such file or directory
Preferred work group size multiple (kernel)     <getWGsizes:1504: create kernel : error -46>
Preferred / native vector sizes
char                                                16 / 16
short                                                8 / 8
int                                                  4 / 4
long                                                 2 / 2
half                                                 0 / 0        (n/a)
float                                                4 / 4
double                                               2 / 2        (cl_khr_fp64)
Half-precision Floating-point support           (n/a)
Single-precision Floating-point support         (core)
Denormals                                     No
Infinity and NANs                             Yes
Round to nearest                              Yes
Round to zero                                 No
Round to infinity                             No
IEEE754-2008 fused multiply-add               No
Support is emulated in software               No
Correctly-rounded divide and sqrt operations  No
Double-precision Floating-point support         (cl_khr_fp64)
Denormals                                     Yes
Infinity and NANs                             Yes
Round to nearest                              Yes
Round to zero                                 Yes
Round to infinity                             Yes
IEEE754-2008 fused multiply-add               Yes
Support is emulated in software               No
Address bits                                    64, Little-Endian
Global memory size                              12884901888 (12GiB)
Error Correction support                        No
Max memory allocation                           3221225472 (3GiB)
Unified memory for Host and Device              No
Minimum alignment for any data type             128 bytes
Alignment of base address                       32768 bits (4096 bytes)
Global Memory cache type                        None
Image support                                   No
Local memory type                               Local
Local memory size                               65536 (64KiB)
Max number of constant args                     16
Max constant buffer size                        67108864 (64MiB)
Max size of kernel argument                     1024
Queue properties
Out-of-order execution                        No
Profiling                                     Yes
Profiling timer resolution                      0ns
Execution capabilities
Run OpenCL kernels                            Yes
Run native kernels                            No
ILs with version                              (n/a)
Built-in kernels with version                   (n/a)
Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning
Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
cl_khr_fp64                                                      0x400000 (1.0.0)
cl_khr_extended_versioning                                       0x400000 (1.0.0)
Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 0
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
clCreateContext(NULL, ...) [default]            No platform
clCreateContext(NULL, ...) [other]              Success [MESA]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
Platform Name                                   Clover
Device Name                                     AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
Platform Name                                   Clover
Device Name                                     AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
Platform Name                                   Clover
Device Name                                     AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.48, 5.19.0-43-generic)`
Maybe you have a clue.
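For anyone who wants to check the same thing on their own machine: the build log names a file that the Mesa OpenCL compiler could not open. Below is a minimal sketch that pulls that path out of a build log and reports whether it exists, plus which `.bc` files are installed next to it. The function name `missing_files` and the constant `BUILD_LOG` are made up for illustration, and the script assumes the error message has exactly the wording shown in the clinfo output above.

```python
import os
import re

# Assumption: the build log uses the exact wording seen in the clinfo output above.
MISSING_FILE = re.compile(r"cannot open file '([^']+)': No such file or directory")


def missing_files(build_log):
    """Return every file path the build log reports as impossible to open."""
    return MISSING_FILE.findall(build_log)


# The build log line copied from the clinfo output in this thread.
BUILD_LOG = (
    "fatal error: cannot open file "
    "'/usr/lib/clc/gfx1031-amdgcn-mesa-mesa3d.bc': No such file or directory"
)

if __name__ == "__main__":
    for path in missing_files(BUILD_LOG):
        status = "present" if os.path.exists(path) else "missing"
        print(f"{path}: {status}")
        folder = os.path.dirname(path)
        if os.path.isdir(folder):
            # List the bitcode files that are installed, to see which GPU targets are covered.
            for name in sorted(os.listdir(folder)):
                if name.endswith(".bc"):
                    print("  installed:", name)
```

If the file is reported as missing while other `.bc` files are listed, the installed bitcode library probably does not cover this GPU target, which would fit the CL_BUILD_PROGRAM_FAILURE seen in the tuner.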
Yeah, you might be running into a driver problem. I see the word "Mesa" in your output, and users have found the Mesa drivers to be buggy for general-purpose OpenCL usage in the past. Does this thread help? https://bbs.archlinux.org/viewtopic.php?pid=1895516#p1895516
Thanks. Seems like it. The only reason I tried the mesa-opencl-icd library is that an admin on the Mint forum suggested it. Without it, clinfo doesn't even see the GPU as an OpenCL device... but anyway, that's not your problem. I just thought I'd ask here in case the KataGo output gave a clue. Frankly, I'm out of ideas. It used to work pretty well. I'm not sure what caused the problems: maybe the switch to a newer GPU (though probably not, since the old GPU now has the same problems), or maybe the switch to Mint 21 (based on Ubuntu 22.04). I just can't get it to work anymore, and even skilled people like the admins and devs on the Mint forum can't help... hm.
One thing you might know: do you yourself use, or know of people using, AMD GPUs like the RX 6700 XT, or something from that generation, successfully with Linux/Ubuntu/Mint and KataGo? I mean, this has to work somehow, somewhere...
Use AMD ROCm instead of Mesa, which is currently known to be broken.
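After changing drivers, the platform lines at the top of KataGo's log show which OpenCL platform actually exposes the GPU. Here is a rough sketch that summarizes those lines and flags the situation from the log earlier in this thread, where only the Mesa/Clover platform reports a device. The helper names `summarize_platforms` and `mesa_only` are hypothetical, and the parsing assumes the log wording shown above.

```python
import re

# Assumption: log lines look like the ones posted earlier in this thread.
PLATFORM = re.compile(r"Found OpenCL Platform (\d+): (.*)$")
DEVICES = re.compile(r"Found (\d+) device\(s\) on platform (\d+)")


def summarize_platforms(log_text):
    """Map each platform index to its name and the number of devices found on it."""
    platforms = {}
    for line in log_text.splitlines():
        m = PLATFORM.search(line)
        if m:
            platforms[int(m.group(1))] = {"name": m.group(2).strip(), "devices": 0}
            continue
        m = DEVICES.search(line)
        if m:
            entry = platforms.setdefault(int(m.group(2)), {"name": "?", "devices": 0})
            entry["devices"] = int(m.group(1))
    return platforms


def mesa_only(platforms):
    """True if every platform that exposes a device is a Mesa/Clover platform."""
    usable = [p for p in platforms.values() if p["devices"] > 0]
    return bool(usable) and all(
        "Mesa" in p["name"] or "Clover" in p["name"] for p in usable
    )


# Platform lines copied from the tuner output in this thread.
SAMPLE_LOG = """\
2023-06-05 23:40:46+0200: Found OpenCL Platform 0: Clover (Mesa) (OpenCL 1.1 Mesa 22.2.5)
2023-06-05 23:40:46+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-05 23:40:46+0200: Found OpenCL Platform 1: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3513.0))
2023-06-05 23:40:46+0200: Found 0 device(s) on platform 1 with type CPU or GPU or Accelerator, skipping
"""

if __name__ == "__main__":
    info = summarize_platforms(SAMPLE_LOG)
    for idx, p in sorted(info.items()):
        print(f"platform {idx}: {p['name']} -> {p['devices']} device(s)")
    print("only Mesa platforms expose a device:", mesa_only(info))
```

With the log from this thread, the AMD platform shows 0 devices, so KataGo falls back to Clover. Once the AMD platform reports at least one device, the check should no longer flag a Mesa-only setup.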