KataGo
ONNXRuntime backend (WIP)
Introduction
This backend enables KataGo to run network files in the .onnx format, an open standard for exchanging neural networks, through ONNXRuntime. Currently, version 8 networks can be converted to .onnx by https://github.com/isty2e/KataGoONNX, and converted network files are available here.
The main motivation for running .onnx models is that new architectures or network details can be tested easily by exporting any trained model to .onnx, without implementing every detail in the CUDA/OpenCL backends. For the time being, there is no advantage in using this backend for normal users: it will be 10-20% slower than the CUDA/OpenCL versions on 19x19 boards, though it can be faster for smaller boards.
Execution providers
Currently, four execution providers are available for this backend:
- CUDA
- TensorRT - It is dreadfully slow for unknown reasons
- DirectML
- MIGraphX - I couldn't really test it, and there can be problems building
For Windows systems, DirectML is considered the best in general. AMD cards can make use of MIGraphX, and the ROCm execution provider can be supported once it is fully integrated into ONNXRuntime.
Building
GCC-9 or above is recommended for Linux systems.
- First of all, you need to download an ONNXRuntime binary, or build ONNXRuntime yourself. Considering that there is no merit to using TensorRT at this point, you can just download the prebuilt binary (see the sketch after this list).
- Then build KataGo as per usual, but with these additional CMake flags:
  - ORT_LIB_DIR: ONNXRuntime library location
  - ORT_INCLUDE_DIR: ONNXRuntime header file location
  - ORT_CUDA, ORT_TENSORRT, ORT_DIRECTML, ORT_MIGRAPHX: whether to support the specific execution providers
  - TENSORRT_LIB_DIR, TENSORRT_INCLUDE_DIR: library and header file locations for TensorRT

For example, if you want to build KataGo with CUDA and DirectML support, your CMake configuration will look like this:

    cmake -DUSE_BACKEND=ONNXRUNTIME -DORT_CUDA=1 -DORT_DIRECTML=1 -DORT_LIB_DIR=/foo/bar/lib -DORT_INCLUDE_DIR=/foo/bar/include
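Putting the two steps together, here is a minimal end-to-end sketch for a Linux CUDA build. The ONNXRuntime version, archive name, and paths are only illustrative assumptions based on the official release naming, so substitute whatever release you actually download; the cpp/ subdirectory is where KataGo's CMake project lives.

    # Download a prebuilt ONNXRuntime GPU release (version and archive name are illustrative)
    wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-gpu-1.8.1.tgz
    tar xzf onnxruntime-linux-x64-gpu-1.8.1.tgz
    ORT_DIR=$PWD/onnxruntime-linux-x64-gpu-1.8.1

    # Configure and build KataGo with the ONNXRuntime backend and the CUDA execution provider
    cd KataGo/cpp
    cmake . -DUSE_BACKEND=ONNXRUNTIME -DORT_CUDA=1 -DORT_LIB_DIR=$ORT_DIR/lib -DORT_INCLUDE_DIR=$ORT_DIR/include
    make -j 4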
Configuration
There are two options in the .cfg file for the ONNXRuntime backend: onnxOptModelFile and onnxRuntimeExecutionProvider. onnxOptModelFile is the path for a cached graph-optimized .onnx file, and onnxRuntimeExecutionProvider is one of the execution providers - CUDA, TensorRT, DirectML, or MIGraphX. These options can be set properly by running genconfig.
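For instance, a hand-edited config might contain something like the following (a sketch; the path is a placeholder):

    # Where to cache the graph-optimized model produced by ONNXRuntime
    onnxOptModelFile = /path/to/model_opt.onnx
    # Which execution provider to use: CUDA, TensorRT, DirectML, or MIGraphX
    onnxRuntimeExecutionProvider = CUDA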
For FP16 inference, you will need to use FP16 models instead of the normal FP32 models. FP16 inference has no advantage on non-RTX cards, but on RTX cards you will want to use FP16 models.
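A quick way to compare FP32 and FP16 models is KataGo's usual benchmark command (a sketch; the model and config file names are placeholders):

    ./katago benchmark -model kata-v8-fp16.onnx -config gtp_onnx.cfg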
TODO
- Somehow verify that MIGraphX version compiles and runs
- Maybe a cleaner CMakeLists.txt
- Figure out why TensorRT execution provider is so slow
- Improve code quality in general
- In some cases, visits/s might not be measured properly: NPS was constantly increasing, so initialization time should perhaps be compensated for
- There can be some duplicated or unnecessary operations, so reduce overhead if any
The TensorRT provider seems to reach normal speed after a few minutes of running (for optimization). I used a TensorRT-enabled onnxruntime and deleted SetOptimizedModelFilePath.
I used the following libraries and options:
- CUDA 11.0, cuDNN 8.0.5, TensorRT-7.2.1.6
- ORT_TENSORRT_ENGINE_CACHE_ENABLE=1, ORT_TENSORRT_FP16_ENABLE=1
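For example (a sketch; the model and config names are placeholders, and the environment variables are ONNXRuntime's TensorRT provider options):

    # Enable the TensorRT engine cache and FP16 mode for ONNXRuntime's TensorRT provider
    export ORT_TENSORRT_ENGINE_CACHE_ENABLE=1
    export ORT_TENSORRT_FP16_ENABLE=1
    ./katago gtp -model kata-v8-fp16.onnx -config gtp_onnx.cfg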
@isty2e Hi, the onnx network file has been deleted; could you upload it again? I'm looking forward to running KataGo in a webpage using webonnxruntime. Thank you very much!