KataGo
ONNXRuntime backend (WIP)
Introduction
This backend enables KataGo to run network files in the .onnx format, an open standard for exchanging neural networks, through ONNXRuntime. Currently, version 8 networks can be converted to .onnx by https://github.com/isty2e/KataGoONNX, and converted network files are available here.
The main motivation for running .onnx models is that new architectures or network details can be tested easily by exporting any trained model to .onnx, without implementing every detail in the CUDA/OpenCL backends. For the time being, there is no advantage in using this backend for normal users: it will be 10-20% slower than the CUDA/OpenCL versions on 19x19 boards, though it can be faster for smaller boards.
Execution providers
Currently, four execution providers are available for this backend:
- CUDA
- TensorRT - It is dreadfully slow for unknown reasons
- DirectML
- MIGraphX - I couldn't really test it, and there can be problems building
For Windows systems, DirectML is considered the best in general. AMD cards can make use of MIGraphX, and the ROCm execution provider can be supported once it is fully integrated into ONNXRuntime.
Building
GCC-9 or above is recommended for Linux systems.
- First of all, you need to download an ONNXRuntime binary, or build ONNXRuntime yourself. Considering that there is no merit to using TensorRT at this point, you can just download the prebuilt binary (see the sketch after this list).
- Then build KataGo as per usual, but with these additional CMake flags:
  - ORT_LIB_DIR: ONNXRuntime library location
  - ORT_INCLUDE_DIR: ONNXRuntime header file location
  - ORT_CUDA, ORT_TENSORRT, ORT_DIRECTML, ORT_MIGRAPHX: whether to support the specific execution providers
  - TENSORRT_LIB_DIR, TENSORRT_INCLUDE_DIR: library and header file locations for TensorRT

For example, if you want to build KataGo with CUDA and DirectML support, your CMake configuration will look like this:

    cmake -DUSE_BACKEND=ONNXRUNTIME -DORT_CUDA=1 -DORT_DIRECTML=1 -DORT_LIB_DIR=/foo/bar/lib -DORT_INCLUDE_DIR=/foo/bar/include
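Putting the two steps together, here is a minimal end-to-end sketch for a Linux CUDA build. The ONNXRuntime version, archive name, and paths are only illustrative assumptions based on the official release naming, so substitute whatever release you actually download; the cpp/ subdirectory is where KataGo's CMake project lives.

    # Download a prebuilt ONNXRuntime GPU release (version and archive name are illustrative)
    wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-gpu-1.8.1.tgz
    tar xzf onnxruntime-linux-x64-gpu-1.8.1.tgz
    ORT_DIR=$PWD/onnxruntime-linux-x64-gpu-1.8.1

    # Configure and build KataGo with the ONNXRuntime backend and the CUDA execution provider
    cd KataGo/cpp
    cmake . -DUSE_BACKEND=ONNXRUNTIME -DORT_CUDA=1 -DORT_LIB_DIR=$ORT_DIR/lib -DORT_INCLUDE_DIR=$ORT_DIR/include
    make -j 4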
Configuration
There are two options in the .cfg file for the ONNXRuntime backend: onnxOptModelFile and onnxRuntimeExecutionProvider. onnxOptModelFile is the path for a cached graph-optimized .onnx file, and onnxRuntimeExecutionProvider is one of the execution providers - CUDA, TensorRT, DirectML, or MIGraphX. These options can be set properly by running genconfig.
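For instance, a hand-edited config might contain something like the following (a sketch; the path is a placeholder):

    # Where to cache the graph-optimized model produced by ONNXRuntime
    onnxOptModelFile = /path/to/model_opt.onnx
    # Which execution provider to use: CUDA, TensorRT, DirectML, or MIGraphX
    onnxRuntimeExecutionProvider = CUDA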
For FP16 inference, you will need to use FP16 models instead of the normal FP32 models. FP16 inference has no advantage on non-RTX cards, but on RTX cards you will want to use FP16 models.
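A quick way to compare FP32 and FP16 models is KataGo's usual benchmark command (a sketch; the model and config file names are placeholders):

    ./katago benchmark -model kata-v8-fp16.onnx -config gtp_onnx.cfg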
TODO
- Somehow verify that MIGraphX version compiles and runs
- Maybe a cleaner CMakeLists.txt
- Figure out why TensorRT execution provider is so slow
- Improve code quality in general
- In some cases, visits/s might not be measured properly: NPS was constantly increasing, so initialization time should perhaps be compensated for
- There can be some duplicated or unnecessary operations, so reduce overhead if any
The TensorRT provider seems to reach normal speed after a few minutes of running (for optimization). I used a TensorRT-enabled onnxruntime and deleted SetOptimizedModelFilePath.
I used the following libraries and options:
- CUDA 11.0, cuDNN 8.0.5, TensorRT-7.2.1.6
- ORT_TENSORRT_ENGINE_CACHE_ENABLE=1, ORT_TENSORRT_FP16_ENABLE=1
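For example (a sketch; the model and config names are placeholders, and the environment variables are ONNXRuntime's TensorRT provider options):

    # Enable the TensorRT engine cache and FP16 mode for ONNXRuntime's TensorRT provider
    export ORT_TENSORRT_ENGINE_CACHE_ENABLE=1
    export ORT_TENSORRT_FP16_ENABLE=1
    ./katago gtp -model kata-v8-fp16.onnx -config gtp_onnx.cfg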
@isty2e Hi, the onnx network file has been deleted; could you upload it again? I'm looking forward to running KataGo in a webpage using webonnxruntime. Thank you very much!