lizzieyzy
Regarding this release (2.5.2)
First of all, thank you for your continued development. Here are a few things we noticed about this release.
- Among the engines registered by default, "(OpenCL) Kata1-18B" causes an error. OpenCL tuning runs, but immediately afterwards an error occurs and the engine fails to start. No error message appears in the GTP console. The same happens when running it from the command line.
- According to @hyln9, the developer of the TensorRT version, this TensorRT version does not require the CUDA and cuDNN DLL files.
Also, of the TensorRT files, only "nvinfer.dll" and "nvinfer_builder_resource.dll" are required; the others are unnecessary.
I was able to start the engine even after removing the unnecessary files.
The files shown below are unnecessary.
- I heard that the sharing feature is no longer available, but it still works in previous versions.
For the OpenCL issue, can you replace the katago.exe in the katago_opencl folder with the katago.exe from the katago-v1.12.3-opencl-windows-x64 folder, and try again to see if the error still exists?
For the TensorRT files, I will try it later. I'm not sure whether a computer without CUDA or cuDNN installed can work with only "nvinfer.dll" and "nvinfer_builder_resource.dll".
The sharing function runs on a cloud server that expires on 2023-02-01; it may keep working for one or two weeks after that before the server is recycled.
I tried putting the katago.exe from the katago-v1.12.3-opencl-windows-x64 folder into the katago_opencl folder, but I still get the error. Earlier, I had put a jar file compiled from the LizzieYzy main branch into the 2022-05-13-windows64 folder. I installed the katago-v1.12.3-opencl folder there and registered the engine, and it works fine. I then copied this jar file and the katago-v1.12.3-opencl folder to the 2023-01-30-windows64+katago folder and ran a startup test, but an error occurred there. I also tried replacing config.txt in the 2022-05-13-windows64 folder with the one from the 2023-01-30-windows64+katago folder, but this also resulted in an error.
I've tried many things, but so far I haven't been able to figure out the cause of the error.
When I normally use LizzieYzy, I use the folder of version 2.5.0, released on May 13 last year. Every time the GitHub source code is updated, I compile it so that the startup jar file is always up to date. If I copy the katago_opencl folder and lizzie-yzy2.5.2-shaded.jar from the 2023-01-30-windows64+katago folder into this folder, it works fine. I tried replacing config.txt and the jre folder, but I still don't know the cause of the error.
I hope the following is useful as a reference.
About OpenCL.dll
- OpenCL.dll is installed with GPU drivers for each GPU vendor (NVIDIA, AMD, Intel).
- You can also obtain OpenCL.dll from the Khronos OpenCL-SDK, but you need to build it from source yourself for each environment. (https://github.com/KhronosGroup/OpenCL-SDK)
- In practice, OpenCL.dll only dispatches calls to each vendor's GPU driver in the DriverStore folder, and the GPU processing is performed there. Below is an example debug log with the NVIDIA driver.
- It may not work correctly if there is a conflict between OpenCL.dll and the GPU driver.
- For comparison, the OpenCL.dll bundled with KataGo: file version 2.2.2.0, size 105 KB, updated 2022/04/06
- The one installed with the NVIDIA driver: file version 3.0.3.0, size 1.41 MB, updated 2023/01/21 8:20
- The one built from the Khronos SDK: file version 3.0.1.0, size 0.98 MB, updated 2022/05/03 23:41
- As far as OpenCL.dll is concerned, I think it's safer to use the one installed for each environment rather than the bundled one.
'katago.exe' (Win32): 'C:\Windows\System32\OpenCL.dll' was loaded.
'katago.exe' (Win32): 'C:\Windows\System32\DriverStore\FileRepository\nvhmsi.inf_amd64_f8e5d8fd9ccf16c0\nvopencl64.dll' was loaded.
'katago.exe' (Win32): 'C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_d8bdffa26077ee9a\igdrcl64.dll' was loaded.
'katago.exe' (Win32): 'C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_d8bdffa26077ee9a\igdgmm64.dll' was loaded.
'katago.exe' (Win32): 'C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_d8bdffa26077ee9a\igdfcl64.dll' was loaded.
'katago.exe' (Win32): 'C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_d8bdffa26077ee9a\igc64.dll' was loaded.
'katago.exe' (Win32): 'C:\Windows\System32\DriverStore\FileRepository\nvhmsi.inf_amd64_f8e5d8fd9ccf16c0\nvvm64.dll' was loaded.
'katago.exe' (Win32): 'C:\Windows\System32\DriverStore\FileRepository\nvhmsi.inf_amd64_f8e5d8fd9ccf16c0\nvptxJitCompiler64.dll' was loaded.
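To see which vendor drivers OpenCL.dll actually hands the work to, the DLL load lines of a debug log like the one above can be grouped by DriverStore package folder. A minimal sketch, assuming the log is saved as plain text with one `'...' was loaded.` entry per line; the helper name `group_driver_dlls` is hypothetical:

```python
import re
from collections import defaultdict

# Matches log entries ending in:  '<path>.dll' was loaded.
LOADED_RE = re.compile(r"'([^']+\.dll)' was loaded\.", re.IGNORECASE)
# Captures the driver package folder name under DriverStore\FileRepository.
DRIVERSTORE_RE = re.compile(r"DriverStore\\FileRepository\\([^\\]+)\\", re.IGNORECASE)


def group_driver_dlls(log_lines):
    """Map each DriverStore driver package to the DLL names loaded from it.

    DLLs loaded from anywhere else (e.g. the System32 OpenCL.dll) are ignored.
    """
    groups = defaultdict(list)
    for line in log_lines:
        m = LOADED_RE.search(line)
        if not m:
            continue
        path = m.group(1)
        pkg = DRIVERSTORE_RE.search(path)
        if pkg:
            # Keep only the file name; the package folder identifies the driver.
            groups[pkg.group(1)].append(path.rsplit("\\", 1)[-1])
    return dict(groups)
```

Passing an open log file works too, since it iterates line by line. On the log above, the nvhmsi.inf_... package (NVIDIA) and the iigd_dch.inf_... package (Intel) would show up as separate groups.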
About TensorRT
- Below is the debug log for TensorRT-8.5.2.2.Windows10.x86_64.cuda-11.8.cudnn8.6.
- The CUDA and cuDNN DLLs are used by nvinfer_builder_resource.dll.
'katago.exe' (Win32): 'katago.exe' was loaded.
'nvinfer.dll' was loaded.
'nvinfer_builder_resource.dll' was loaded.
'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\cublasLt64_11.dll' was loaded.
'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvrtc64_112_0.dll' was loaded.
'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\cublas64_11.dll' was loaded.
'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\cudnn64_8.dll' was loaded.
'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\cudnn_ops_infer64_8.dll' was loaded.
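Since the log lists every DLL that was loaded, a similar sketch can pull out the CUDA/cuDNN entries and where they were loaded from, for example to re-check the log after removing files. The name prefixes in `CUDA_PREFIXES` are an assumption that covers the files in this log, not an official list, and `cuda_dlls_loaded` is a hypothetical helper:

```python
import re

# Matches log entries ending in:  '<path>.dll' was loaded.
LOADED_RE = re.compile(r"'([^']+\.dll)' was loaded\.", re.IGNORECASE)
# Assumed name prefixes for CUDA/cuDNN runtime DLLs (covers the files in
# the log above; not an official list).
CUDA_PREFIXES = ("cublas", "cudart", "cudnn", "nvrtc")


def cuda_dlls_loaded(log_lines):
    """Return (file name, full path) pairs for CUDA/cuDNN DLLs in a load log."""
    found = []
    for line in log_lines:
        m = LOADED_RE.search(line)
        if not m:
            continue
        path = m.group(1)
        name = path.rsplit("\\", 1)[-1]
        # str.startswith accepts a tuple, so any prefix counts as a match.
        if name.lower().startswith(CUDA_PREFIXES):
            found.append((name, path))
    return found
```

On the log above it would report the five cuBLAS/NVRTC/cuDNN DLLs, all loaded from the CUDA Toolkit folder under Program Files rather than from the KataGo folder.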
Thank you for the information. Building OpenCL is a bit difficult for me, but I would like to try it if possible. I tried the OpenCL.dll files available at https://jp.dll-files.com/opencl.dll.html, but unfortunately none of them had any effect. My NVIDIA driver was a bit old, so I updated it to the latest version, but I still get the error.
In fact, even the OpenCL.dll shipped with KataGo works on my computer: it works in the 2.5.0 folder, but not in the 2.5.2 folder. I'd like to know why, but I haven't figured it out yet...
Regarding the TensorRT debug log above: the CUDA/cuDNN DLLs were loaded because they existed, not because they were actually used. They can be safely removed as of KataGo v1.12.2.
About TensorRT
I understand that CUDA and cuDNN are not used.
About OpenCL
Possible causes and countermeasures
- Inconsistency between resources when building from source: the include headers, OpenCL.lib, and OpenCL.dll
- Inconsistency between OpenCL.dll and the GPU driver: delete or rename OpenCL.dll in the KataGo folder
- KataGo chooses a different GPU than intended: change the GPU index in the configuration file
- Inconsistent tuning data: recreate the tuning file
- GPU overheating: let the PC cool down for a while
- GPU failure: replace the GPU
- GPU driver bug: downgrade to the driver version that was working fine
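For the second countermeasure, the bundled OpenCL.dll can be renamed rather than deleted, so it is easy to restore. Windows searches the folder containing katago.exe before System32 when loading a DLL, so once the bundled copy is out of the way, the OpenCL.dll installed with the GPU driver should be picked up instead. A minimal sketch; the function name `disable_bundled_opencl` and the `.bak` suffix are hypothetical choices:

```python
from pathlib import Path


def disable_bundled_opencl(katago_dir, suffix=".bak"):
    """Rename a bundled OpenCL.dll so the system-wide copy gets loaded instead.

    Returns the new path, or None if no bundled OpenCL.dll was found.
    Raises FileExistsError rather than overwriting an earlier backup.
    """
    katago_dir = Path(katago_dir)
    for f in katago_dir.iterdir():
        # Compare case-insensitively, as Windows file names are case-insensitive.
        if f.is_file() and f.name.lower() == "opencl.dll":
            target = f.with_name(f.name + suffix)
            if target.exists():
                raise FileExistsError(target)
            f.rename(target)
            return target
    return None
```

To undo it, rename OpenCL.dll.bak back to OpenCL.dll.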
It seems likely that the cause of the error is OpenCL tuning.
I copied the following four items from the 2023-01-30-windows64+katago folder to the 2022-05-13-windows64 folder and ran OpenCL tuning there, and it works fine. If I then copy the tuning file created there to the 2023-01-30-windows64+katago folder, it works properly there as well.
- katago_opencl folder
- katago_configs folder
- lizzie-yzy2.5.2-shaded.jar
- b18c384nbt-uec.bin.gz
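The workaround described above (tune in the working folder, then copy the tuning file over) can be scripted roughly as follows. This sketch rests on an assumption not stated in the thread: that KataGo keeps its OpenCL tuning files in a `KataGoData/opencltuning` subfolder of the KataGo folder. Check where your tuning file actually is before relying on it; `copy_tuning_files` is a hypothetical helper:

```python
import shutil
from pathlib import Path

# Assumption: KataGo saves OpenCL tuning results under KataGoData/opencltuning
# relative to the KataGo folder; pass a different subdir if yours differs.
TUNING_SUBDIR = Path("KataGoData") / "opencltuning"


def copy_tuning_files(src_katago_dir, dst_katago_dir, subdir=TUNING_SUBDIR):
    """Copy all tuning files from a working KataGo folder into another one.

    Files with the same name in the destination are overwritten.
    Raises FileNotFoundError if the source tuning folder does not exist.
    Returns the copied file names in sorted order.
    """
    src = Path(src_katago_dir) / subdir
    dst = Path(dst_katago_dir) / subdir
    files = sorted(f for f in src.iterdir() if f.is_file())
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in files:
        # copy2 also preserves timestamps, which helps when comparing versions.
        shutil.copy2(f, dst / f.name)
        copied.append(f.name)
    return copied
```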
Last time, I said that katago-v1.12.3-opencl-windows-x64 also gave an error, but after checking again, it works properly. The error occurs only when running katago_opencl in the 2023-01-30-windows64+katago folder.
I think there is something wrong with the contents of the tuning file, but I'm not sure. I'm uploading the tuning file that gives the error and the one that works fine: opcl-6558KB-ng.zip, opcl-6558KB-ok.zip
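To compare the two uploaded tuning files after extracting the archives, a line-by-line diff is usually enough to spot which parameters differ. A minimal sketch using Python's standard `difflib`, assuming the tuning files are plain text; `diff_tuning_files` is a hypothetical helper name:

```python
import difflib
from pathlib import Path


def diff_tuning_files(ng_path, ok_path):
    """Return unified-diff lines between a failing (ng) and a working (ok) file.

    An empty list means the two files are identical line by line.
    """
    ng = Path(ng_path).read_text(encoding="utf-8", errors="replace").splitlines()
    ok = Path(ok_path).read_text(encoding="utf-8", errors="replace").splitlines()
    return list(difflib.unified_diff(
        ng, ok, fromfile=str(ng_path), tofile=str(ok_path), lineterm=""))
```

Lines starting with `-` come from the failing file and lines starting with `+` from the working one.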
Thanks for your help, @hyln9 and @MAOmao000. About OpenCL.dll: I just copied it from the official release. I vaguely remember that some old machines need it, though I'm not sure, but I think it is safe to keep it since it comes from the official release. For the TensorRT backend, I will remove all CUDA-related DLLs and leave only 'nvinfer.dll' and 'nvinfer_builder_resource.dll'.
If it's a tuner issue, the following issues may be helpful:
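For the TensorRT cleanup planned above, a cautious approach is to move the extra DLLs into a backup subfolder instead of deleting them, so the change is easy to undo. A minimal sketch; `CANDIDATE_PATTERNS`, the `unused_dlls` folder name, and `prune_tensorrt_folder` are all assumptions for illustration, and any DLL that does not match the patterns (other libraries KataGo may need) is left untouched:

```python
import fnmatch
import shutil
from pathlib import Path

# Per the thread, the only TensorRT DLLs the TensorRT build needs.
REQUIRED = {"nvinfer.dll", "nvinfer_builder_resource.dll"}
# Assumed name patterns for CUDA/cuDNN/extra TensorRT DLLs in the engine folder.
CANDIDATE_PATTERNS = ("cublas*.dll", "cudart*.dll", "cudnn*.dll", "nvrtc*.dll",
                      "nvinfer*.dll", "nvonnxparser*.dll", "nvparsers*.dll")


def prune_tensorrt_folder(engine_dir, dry_run=True):
    """Move CUDA/cuDNN/extra TensorRT DLLs into an 'unused_dlls' subfolder.

    With dry_run=True nothing is moved. Returns the sorted names of files
    that were (or would be) moved.
    """
    engine_dir = Path(engine_dir)
    backup = engine_dir / "unused_dlls"  # hypothetical backup folder name
    moved = []
    for f in sorted(engine_dir.glob("*.dll")):
        name = f.name.lower()
        if name in REQUIRED:
            continue
        if any(fnmatch.fnmatch(name, p) for p in CANDIDATE_PATTERNS):
            moved.append(f.name)
            if not dry_run:
                backup.mkdir(exist_ok=True)
                shutil.move(str(f), str(backup / f.name))
    return sorted(moved)
```

Running it once with the default `dry_run=True` shows what would be moved before anything changes.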
https://github.com/lightvector/KataGo/issues/733
https://github.com/lightvector/KataGo/issues/746