CrazyAra
cuDNN library not fully used on Windows
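The library cudnn_cnn_infer64_8.dll is not used on Windows, whereas its counterpart libcudnn_cnn_infer.so.8 is used on Linux.
This seems to make a visible NPS difference: on the same RTX 2070, the logs below show roughly 19,000 NPS on Ubuntu versus roughly 16,500 NPS on Windows.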
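One way to check which cuDNN libraries a running engine process has actually loaded on Linux is to inspect its memory map. The following is a minimal sketch, assuming a Linux system where /proc/<pid>/maps is readable; the helper name loaded_cudnn_libs is hypothetical and not part of CrazyAra.

```python
import re


def loaded_cudnn_libs(maps_text):
    """Return sorted basenames of cuDNN shared objects listed in /proc/<pid>/maps text."""
    libs = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        name = fields[5].strip().rsplit("/", 1)[-1]
        if re.match(r"libcudnn.*\.so", name):
            libs.add(name)
    return sorted(libs)
```

Passing the contents of /proc/<pid>/maps for the engine's process ID returns names such as libcudnn_cnn_infer.so.8 if that library is mapped. On Windows, a module listing such as the one printed by tasklist /m can serve the same purpose.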
e.g. Ubuntu 18.04:
GPU: RTX 2070 OC
isready
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-1.onnx
info string deserialize engine: model/chess/model-bsize1-fp16-0.trt
info string inputDims: (1, 39, 8, 8)
info string valueOutputDims: (1, 1)
info string policyOutputDims: (1, 4864)
info string No auxiliary outputs detected.
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string inputDims: (16, 39, 8, 8)
info string valueOutputDims: (16, 1)
info string policyOutputDims: (16, 4864)
info string No auxiliary outputs detected.
readyok
go infinite
info string create new tree
info string run mcts search
info depth 17 seldepth 28 multipv 1 score cp 47 nodes 18522 nps 18485 tbhits 0 time 1002 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2
info depth 19 seldepth 31 multipv 1 score cp 47 nodes 38347 nps 19154 tbhits 0 time 2002 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
info depth 19 seldepth 37 multipv 1 score cp 47 nodes 57007 nps 18990 tbhits 0 time 3002 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
GPU utilization: 91%
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   48C    P2   152W / 215W |    677MiB /  7982MiB |     91%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:0B:00.0  On |                  N/A |
|  0%   54C    P2    56W / 250W |    441MiB / 11177MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
e.g. Windows 10:
GPU: RTX 2070 OC
isready
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-1.onnx
info string deserialize engine: model/chess/model-bsize1-fp16-0.trt
info string inputDims: (1, 39, 8, 8)
info string valueOutputDims: (1, 1)
info string policyOutputDims: (1, 4864)
info string No auxiliary outputs detected.
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string inputDims: (16, 39, 8, 8)
info string valueOutputDims: (16, 1)
info string policyOutputDims: (16, 4864)
info string No auxiliary outputs detected.
readyok
go infinite
info string create new tree
info string run mcts search
info depth 17 seldepth 28 multipv 1 score cp 47 nodes 16500 nps 16369 tbhits 0 time 1008 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2
info depth 19 seldepth 31 multipv 1 score cp 47 nodes 33367 nps 16584 tbhits 0 time 2012 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
info depth 19 seldepth 33 multipv 1 score cp 47 nodes 50400 nps 16617 tbhits 0 time 3033 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
GPU utilization: 85%
C:\Windows\System32\DriverStore\FileRepository\nv_dispui.inf_amd64_c1f8f32cc9af9677>nvidia-smi
Fri Apr 9 17:37:37 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.33       Driver Version: 461.33       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207... WDDM  | 00000000:01:00.0 Off |                  N/A |
| 29%   59C    P2   141W / 215W |    845MiB /  8192MiB |     85%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108... WDDM  | 00000000:0B:00.0  On |                  N/A |
|  0%   38C    P8    17W / 250W |    692MiB / 11264MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
I was able to link the binary against cudnn_cnn_infer64_8.dll, but unfortunately this didn't seem to help.
Adding optimization options such as /O2 (Maximize Speed), /GL (Whole Program Optimization), and /LTCG (Link-Time Code Generation) didn't result in an NPS improvement either.
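To put a number on the gap, the nps fields of the UCI info lines in the two logs above can be averaged and compared. This is a minimal sketch; the helper names nps_values and relative_slowdown are hypothetical, and three samples per platform are too few for a rigorous benchmark.

```python
import re
from statistics import mean

NPS_RE = re.compile(r"\bnps (\d+)")


def nps_values(uci_log):
    """Extract the nps value from every UCI 'info' line that reports one."""
    values = []
    for line in uci_log.splitlines():
        match = NPS_RE.search(line)
        if line.startswith("info ") and match:
            values.append(int(match.group(1)))
    return values


def relative_slowdown(baseline, other):
    """Fraction by which the mean of `other` falls short of the mean of `baseline`."""
    return 1.0 - mean(other) / mean(baseline)


linux = [18485, 19154, 18990]    # nps values from the Ubuntu 18.04 log above
windows = [16369, 16584, 16617]  # nps values from the Windows 10 log above
print(f"Windows is {relative_slowdown(linux, windows):.1%} slower")
```

With the values from the logs, this reports a slowdown of about 12.5% on Windows, which is larger than the 6-point difference in reported GPU utilization (91% vs. 85%).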