
KataGo 1.13.0 OpenCL version error

mc-mong opened this issue 2 years ago · 8 comments

Loaded model ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz
Model name: kata1-b18c384nbt-softplusfixv13-s5971481344-d3261785976
GTP ready, beginning main protocol loop

katago 18 block> komi 6.5

katago 18 block> boardsize 19

katago 18 block> clear_board

katago 18 block> play B D3

katago 18 block> kata-genmove_analyze W 50
Connection Failed

After the above message, the game cannot proceed.

OS: Windows 10 64-bit. GPU: RTX 3070 Ti. KataGo 1.13.0 OpenCL (new version).

mc-mong · May 24 '23

"Connection Failed" is an interesting error message; it's not one I've seen before. Is it produced by KataGo directly, or is there a GUI or other graphical game/SGF editor that you are using? Does the same error occur on older versions?

lightvector · May 24 '23

I had overwritten and copied the new version into an existing folder. When I switch to a fresh new folder, it works. Folders with overwritten copies still fail. The workaround is to delete the existing files and run in a clean folder.

Maybe it's because of the previously used config file. I generated a new config file with genconfig and used that.

Test GUIs: Sabaki 0.52.2 and GoGui 1.5.1.

mc-mong · May 24 '23

What happens if you run it on the command line?

./katago.exe benchmark -config path/to/your/config.cfg -model path/to/your/model.bin.gz

What error do you get?

lightvector · May 24 '23

In the existing folder that I copied into, the output is as follows.


D:\baduk\Katago-opencl>katago benchmark -config b18_config.cfg -model ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz
2023-05-25 05:21:55+0900: Running with following config:
allowResignation = true
friendlyPassOk = false
hasButton = false
koRule = SIMPLE
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
maxPlayouts = 30
multiStoneSuicideLegal = false
nnCacheSizePowerOfTwo = 23
nnMutexPoolSizePowerOfTwo = 19
numNNServerThreadsPerModel = 1
numSearchThreads = 16
openclDeviceToUseThread0 = 0
ponderingEnabled = false
resignConsecTurns = 3
resignThreshold = -0.99
scoringRule = TERRITORY
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = SEKI
whiteHandicapBonus = 0

2023-05-25 05:21:55+0900: Loading model and initializing benchmark...
2023-05-25 05:21:55+0900: Testing with default positions for board size: 19
2023-05-25 05:21:55+0900: nnRandSeed0 = 8576874730517683006
2023-05-25 05:21:55+0900: After dedups: nnModelFile0 = ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz useFP16 auto useNHWC auto
2023-05-25 05:21:55+0900: Initializing neural net buffer to be size 19 * 19 exactly
2023-05-25 05:21:55+0900: Found OpenCL Platform 0: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:21:55+0900: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-05-25 05:21:55+0900: Found OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) (score 11000300)
2023-05-25 05:21:55+0900: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:21:55+0900: Using OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-05-25 05:21:55+0900: Loaded tuning parameters from: D:\baduk\Katago-opencl/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX3070Ti_x19_y19_c384_mv13.txt
2023-05-25 05:21:55+0900: OpenCL backend thread 0: Device 0 Model version 13
2023-05-25 05:21:55+0900: OpenCL backend thread 0: Device 0 Model name: kata1-b18c384nbt-softplusfixv13-s5971481344-d3261785976
2023-05-25 05:21:56+0900: OpenCL backend thread 0: Device 0 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

D:\baduk\Katago-opencl>


In a clean folder, the output is as follows.

D:\baduk\katago-opencl-ex>katago benchmark -config b18_config.cfg -model ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz
2023-05-25 05:28:31+0900: Running with following config:
allowResignation = true
friendlyPassOk = false
hasButton = false
koRule = SIMPLE
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
maxPlayouts = 30
multiStoneSuicideLegal = false
nnCacheSizePowerOfTwo = 23
nnMutexPoolSizePowerOfTwo = 19
numNNServerThreadsPerModel = 1
numSearchThreads = 16
openclDeviceToUseThread0 = 0
ponderingEnabled = false
resignConsecTurns = 3
resignThreshold = -0.99
scoringRule = TERRITORY
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = SEKI
whiteHandicapBonus = 0

2023-05-25 05:28:31+0900: Loading model and initializing benchmark...
2023-05-25 05:28:31+0900: Testing with default positions for board size: 19
2023-05-25 05:28:31+0900: nnRandSeed0 = 16784257899505801039
2023-05-25 05:28:31+0900: After dedups: nnModelFile0 = ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz useFP16 auto useNHWC auto
2023-05-25 05:28:31+0900: Initializing neural net buffer to be size 19 * 19 exactly
2023-05-25 05:28:31+0900: Found OpenCL Platform 0: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:28:31+0900: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-05-25 05:28:31+0900: Found OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) (score 11000300)
2023-05-25 05:28:31+0900: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:28:32+0900: Using OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-05-25 05:28:32+0900: Loaded tuning parameters from: D:\baduk\katago-opencl-ex/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX3070Ti_x19_y19_c384_mv13.txt
2023-05-25 05:28:32+0900: OpenCL backend thread 0: Device 0 Model version 13
2023-05-25 05:28:32+0900: OpenCL backend thread 0: Device 0 Model name: kata1-b18c384nbt-softplusfixv13-s5971481344-d3261785976
2023-05-25 05:28:32+0900: OpenCL backend thread 0: Device 0 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

2023-05-25 05:28:33+0900: Loaded config b18_config.cfg
2023-05-25 05:28:33+0900: Loaded model ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz

Testing using 800 visits. If you have a good GPU, you might increase this using "-visits N" to get more accurate results. If you have a weak GPU and this is taking forever, you can decrease it instead to finish the benchmark faster.

You are currently using the OpenCL version of KataGo. If you have a strong GPU capable of FP16 tensor cores (e.g. RTX2080), using the Cuda version of KataGo instead may give a mild performance boost.

Your GTP config is currently set to use numSearchThreads = 16
Automatically trying different numbers of threads to home in on the best (board size 19x19):

2023-05-25 05:28:33+0900: GPU 0 finishing, processed 5 rows 5 batches
2023-05-25 05:28:33+0900: nnRandSeed0 = 12646512682837780868
2023-05-25 05:28:33+0900: After dedups: nnModelFile0 = ../katago_weights/b18c384nbt-optimisticv13-s5971M.bin.gz useFP16 auto useNHWC auto
2023-05-25 05:28:33+0900: Initializing neural net buffer to be size 19 * 19 exactly
2023-05-25 05:28:33+0900: Found OpenCL Platform 0: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:28:33+0900: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-05-25 05:28:33+0900: Found OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) (score 11000300)
2023-05-25 05:28:33+0900: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 11.8.88)
2023-05-25 05:28:34+0900: Using OpenCL Device 0: NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-05-25 05:28:34+0900: Loaded tuning parameters from: D:\baduk\katago-opencl-ex/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX3070Ti_x19_y19_c384_mv13.txt
2023-05-25 05:28:34+0900: OpenCL backend thread 0: Device 0 Model version 13
2023-05-25 05:28:34+0900: OpenCL backend thread 0: Device 0 Model name: kata1-b18c384nbt-softplusfixv13-s5971481344-d3261785976
2023-05-25 05:28:34+0900: OpenCL backend thread 0: Device 0 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

Possible numbers of threads to test: 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 32,

numSearchThreads = 5: 10 / 10 positions, visits/s = 514.89 nnEvals/s = 435.65 nnBatches/s = 175.07 avgBatchSize = 2.49 (15.6 secs)
numSearchThreads = 12: 10 / 10 positions, visits/s = 755.29 nnEvals/s = 619.28 nnBatches/s = 104.42 avgBatchSize = 5.93 (10.7 secs)
numSearchThreads = 10: 10 / 10 positions, visits/s = 708.05 nnEvals/s = 592.31 nnBatches/s = 119.93 avgBatchSize = 4.94 (11.4 secs)
numSearchThreads = 20: 10 / 10 positions, visits/s = 841.51 nnEvals/s = 714.21 nnBatches/s = 73.17 avgBatchSize = 9.76 (9.7 secs)
numSearchThreads = 16: 10 / 10 positions, visits/s = 802.27 nnEvals/s = 666.96 nnBatches/s = 85.31 avgBatchSize = 7.82 (10.1 secs)
numSearchThreads = 24: 10 / 10 positions, visits/s = 849.15 nnEvals/s = 730.77 nnBatches/s = 62.66 avgBatchSize = 11.66 (9.7 secs)
numSearchThreads = 32: 10 / 10 positions, visits/s = 839.41 nnEvals/s = 750.30 nnBatches/s = 48.50 avgBatchSize = 15.47 (9.9 secs)

Ordered summary of results:

numSearchThreads = 5: 10 / 10 positions, visits/s = 514.89 nnEvals/s = 435.65 nnBatches/s = 175.07 avgBatchSize = 2.49 (15.6 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 708.05 nnEvals/s = 592.31 nnBatches/s = 119.93 avgBatchSize = 4.94 (11.4 secs) (EloDiff +103)
numSearchThreads = 12: 10 / 10 positions, visits/s = 755.29 nnEvals/s = 619.28 nnBatches/s = 104.42 avgBatchSize = 5.93 (10.7 secs) (EloDiff +122)
numSearchThreads = 16: 10 / 10 positions, visits/s = 802.27 nnEvals/s = 666.96 nnBatches/s = 85.31 avgBatchSize = 7.82 (10.1 secs) (EloDiff +135)
numSearchThreads = 20: 10 / 10 positions, visits/s = 841.51 nnEvals/s = 714.21 nnBatches/s = 73.17 avgBatchSize = 9.76 (9.7 secs) (EloDiff +143)
numSearchThreads = 24: 10 / 10 positions, visits/s = 849.15 nnEvals/s = 730.77 nnBatches/s = 62.66 avgBatchSize = 11.66 (9.7 secs) (EloDiff +136)
numSearchThreads = 32: 10 / 10 positions, visits/s = 839.41 nnEvals/s = 750.30 nnBatches/s = 48.50 avgBatchSize = 15.47 (9.9 secs) (EloDiff +110)

Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads = 5: (baseline)
numSearchThreads = 10: +103 Elo
numSearchThreads = 12: +122 Elo
numSearchThreads = 16: +135 Elo
numSearchThreads = 20: +143 Elo (recommended)
numSearchThreads = 24: +136 Elo
numSearchThreads = 32: +110 Elo

If you care about performance, you may want to edit numSearchThreads in b18_config.cfg based on the above results!
If you intend to do much longer searches, configure the seconds per game move you expect with the '-time' flag and benchmark again.
If you intend to do short or fixed-visit searches, use lower numSearchThreads for better strength, high threads will weaken strength.
If interested see also other notes about performance and mem usage in the top of b18_config.cfg

2023-05-25 05:30:01+0900: GPU 0 finishing, processed 48401 rows 7890 batches

D:\baduk\katago-opencl-ex>
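As an aside, the benchmark's thread recommendation above translates into a one-line config edit. A sketch of the change in b18_config.cfg (20 threads is only what this particular run recommended for this GPU; your own benchmark may suggest a different value):

```ini
# b18_config.cfg: raise search parallelism per the benchmark's recommendation
numSearchThreads = 20
```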


It runs fine in a clean folder.

mc-mong · May 24 '23

The b18_config.cfg file is identical in both folders.

mc-mong · May 24 '23

Thanks. I'm honestly not sure why there's an issue with your existing folder, but if it runs fine in a clean folder, it sounds like you have a workaround that solves the problem?

But if you want to investigate further, maybe try checking whether the difference is in the DLL files. Perhaps there is a version difference between the DLLs in the two folders, or some other data in those folders differs?
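One quick way to follow up on this suggestion is to hash the DLLs in both folders and see which ones differ. This is a minimal sketch (not part of KataGo; the two folder paths are taken from the logs above and would need to match your actual setup):

```python
import hashlib
from pathlib import Path

def dll_digests(folder):
    """Map each .dll filename in `folder` to its SHA-256 hex digest."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(folder).glob("*.dll")
    }

def compare_folders(old, new):
    """Report DLLs missing from one folder or differing in content."""
    a, b = dll_digests(old), dll_digests(new)
    only_old = sorted(set(a) - set(b))
    only_new = sorted(set(b) - set(a))
    differing = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_old, only_new, differing

if __name__ == "__main__":
    only_old, only_new, differing = compare_folders(
        r"D:\baduk\Katago-opencl", r"D:\baduk\katago-opencl-ex"
    )
    print("only in old folder:", only_old)
    print("only in new folder:", only_new)
    print("same name, different contents:", differing)
```

Any DLL listed under "same name, different contents" would be a candidate for the stale file causing the overwritten folder to fail.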

lightvector · May 24 '23

The existing folder also contains Lizzie and various other files. There was no problem when I copied KataGo 1.12.4 into that same folder, so maybe something there conflicts with the 1.13.* version.

mc-mong · May 24 '23

Thank you.

mc-mong · May 24 '23