openclUseFP16 testing does not work well on AMD integrated GPU (Using OpenCL Device 0: gfx90c (Advanced Micro Devices, Inc.))
Attachments: fixedtune.log, fixedtune11_gpugfx90c_x19_y19_c384_mv14.txt, tune.log, tune11_gpugfx90c_x19_y19_c384_mv14.txt, 说明.txt
Recently, after updating my drivers and KataGo, the OpenCL version of KataGo stopped working properly on my computer. After numerous attempts I was able to pinpoint the issue; the tuning log is attached. The key information: the FP16 tensor core test failed, yet FP16 storage and FP16 compute were both enabled.
Tuning hGemmWmma for convolutions
Testing 144 different configs
FP16 tensor core tuning failed, assuming no FP16 tensor core support
------------------------------------------------------
Tuning hGemmWmmaNCHW for 1x1 convolutions
Testing 108 different configs
FP16 tensor core tuning failed for 1x1 convs
------------------------------------------------------
Tuning xGemm16 for convolutions
Testing 69 different configs
Tuning 0/69 (reference) Calls/sec 9.56218 ErrorProp 0.0026977 MWG=8 NWG=8 KWG=8 MDIMC=1 NDIMC=1 MDIMA=1 NDIMB=1 KWI=1 VWM=1 VWN=1 STRM=0 STRN=0 SA=0 SB=0
Tuning 1/69 Calls/sec 1048.63 ErrorProp 0.0026977 MWG=64 NWG=64 KWG=16 MDIMC=8 NDIMC=8 MDIMA=8 NDIMB=8 KWI=2 VWM=4 VWN=4 STRM=0 STRN=0 SA=1 SB=1
Tuning 7/69 Calls/sec 1447.14 ErrorProp 0.0026977 MWG=64 NWG=64 KWG=32 MDIMC=16 NDIMC=8 MDIMA=16 NDIMB=8 KWI=2 VWM=4 VWN=4 STRM=0 STRN=0 SA=1 SB=1
Tuning 20/69 ...
Tuning 27/69 Calls/sec 1460.09 ErrorProp 0.0026977 MWG=64 NWG=64 KWG=32 MDIMC=8 NDIMC=16 MDIMA=8 NDIMB=16 KWI=2 VWM=4 VWN=4 STRM=0 STRN=0 SA=1 SB=1
Tuning 40/69 ...
Tuning 60/69 ...
Enabling FP16 compute due to better performance
------------------------------------------------------
Using FP16 storage!
Using FP16 compute!
The final generated tuning configuration:
#canUseFP16Storage
1
#canUseFP16Compute
1
#canUseFP16TensorCores
0
#canUseFP16TensorCoresFor1x1
0
#shouldUseFP16Storage
1
#shouldUseFP16Compute
1
#shouldUseFP16TensorCores
0
#shouldUseFP16TensorCoresFor1x1
0
When actually running KataGo normally after this tuning, it would fail with "Got nonfinite for policy sum". After endless troubleshooting, I found the following description in default_gtp.cfg:
# KataGo will automatically use FP16 or not based on testing your GPU during
# tuning. If you want to try to force a particular behavior though you can
# uncomment this option and change it to "true" or "false". This is a fairly
# blunt setting - more detailed settings are testable by rerunning the tuner
# with various arguments (./katago tuner).
# openclUseFP16 = auto
I forced it to openclUseFP16 = false. After re-tuning, KataGo returned to normal. In the attachments I have also included the tuning log and tuning configuration produced after forcibly disabling openclUseFP16.
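For anyone hitting the same symptom, the workaround is a one-line edit (shown here against default_gtp.cfg; substitute whatever config file you actually run with):

# Before (commented out, i.e. automatic behavior based on tuning):
# openclUseFP16 = auto
# After (explicit override):
openclUseFP16 = false

Then re-tune (see the ./katago tuner command mentioned in the config comment above) so a fresh tuning configuration is generated with FP16 disabled.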
Therefore, I suspect that the FP16 testing was not thorough enough, leading to FP16 storage and FP16 compute being enabled erroneously.
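To make the suspicion concrete: IEEE FP16 overflows to infinity above 65504, so a correctness test run on small, well-scaled inputs can pass even on a setup where real net activations overflow. A minimal, hypothetical C++ sketch of that failure mode (not KataGo's actual code; throughFp16 is an invented helper that only models the overflow, not rounding):

#include <cmath>
#include <cstdio>
#include <limits>

// Simulate a round-trip through FP16 storage: magnitudes beyond the FP16
// range (65504) overflow to infinity. Rounding error is ignored here.
float throughFp16(float x) {
  const float FP16_MAX = 65504.0f;
  if (std::fabs(x) > FP16_MAX)
    return std::copysign(std::numeric_limits<float>::infinity(), x);
  return x;
}

int main() {
  // A tuner-style test on small, well-scaled values passes...
  float small[4] = {0.5f, 1.25f, -2.0f, 3.0f};
  float smallSum = 0.0f;
  for (float v : small) smallSum += throughFp16(v);
  std::printf("small-value sum finite: %d\n", std::isfinite(smallSum));

  // ...but a larger activation overflows in FP16, and any sum that
  // includes it goes nonfinite, matching the reported symptom.
  float large[4] = {70000.0f, 1.0f, -3.0f, 2.5f};
  float largeSum = 0.0f;
  for (float v : large) largeSum += throughFp16(v);
  std::printf("large-value sum finite: %d\n", std::isfinite(largeSum));
  return 0;
}

If the tuner's test data never exceeds the FP16 range on a given driver/GPU combination, this kind of failure stays invisible until a real network runs.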
Thanks for the report. It's hard to say what is going wrong or how to fix it, because the log you attached looks like the tuner is behaving exactly correctly based on its testing.
- hGemmWmma failed, so it correctly set shouldUseFP16TensorCores and shouldUseFP16TensorCoresFor1x1 to 0.
- xGemm16 succeeded, so it correctly set shouldUseFP16Storage and shouldUseFP16Compute to 1.
So currently it's a mystery why running with FP16 didn't work for you.
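For reference, here is the decision logic implied by the attached log, reduced to a tiny illustrative C++ sketch (the function and field names are invented, not KataGo's actual internals):

#include <cstdio>

// Illustrative only: the tuner's observable behavior from the log.
struct Fp16Flags {
  bool shouldUseFP16TensorCores;
  bool shouldUseFP16TensorCoresFor1x1;
  bool shouldUseFP16Storage;
  bool shouldUseFP16Compute;
};

Fp16Flags decide(bool hGemmWmmaPassed, bool hGemmWmmaNCHWPassed,
                 bool xGemm16PassedAndFaster) {
  Fp16Flags f;
  f.shouldUseFP16TensorCores = hGemmWmmaPassed;           // failed -> 0
  f.shouldUseFP16TensorCoresFor1x1 = hGemmWmmaNCHWPassed; // failed -> 0
  f.shouldUseFP16Storage = xGemm16PassedAndFaster;        // passed -> 1
  f.shouldUseFP16Compute = xGemm16PassedAndFaster;        // passed -> 1
  return f;
}

int main() {
  Fp16Flags f = decide(false, false, true); // matches the attached log
  std::printf("tensorCores=%d tensorCores1x1=%d storage=%d compute=%d\n",
              f.shouldUseFP16TensorCores, f.shouldUseFP16TensorCoresFor1x1,
              f.shouldUseFP16Storage, f.shouldUseFP16Compute);
  return 0;
}

Given those inputs, the flags in the generated configuration are exactly what this mapping produces, which is why the log alone does not reveal the bug.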
Yes, debugging a fix is not easy (it's related to hardware and drivers); I'm just posting this for information. All of my Go apps suffer from it (just-go, baduk), whose automatic install process breaks off due to the "nonfinite for policy sum" problem. Maybe the description in default_gtp.cfg could mention some of this information.
Device: AMD Ryzen 7 4800U with Radeon Graphics. Same problem here; setting openclUseFP16 = false made my contribute run work!