How to optimize TensorRT for KataGo training contribution?

Open yauwing opened this issue 4 years ago • 4 comments

I upgraded to TensorRT and am happy that KataGo is now 60+% faster on my computer.

However, when I contribute on katagotraining.org I don't see the same jump in training performance. It actually ends up slightly slower, because TensorRT takes longer to initialize.

Is there anything I can do to improve the training performance?
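
To make the initialization point concrete, here is a minimal sketch of how start-up time can eat into a steady-state speedup. It is illustrative only: the function name `effective_rate` and every number in it are hypothetical, not measurements from this machine or from KataGo.

```python
def effective_rate(steady_rate, init_seconds, run_seconds):
    """Average work per second over a run that includes start-up time.

    steady_rate  -- units of work per second once the backend is running
    init_seconds -- time spent initializing before any work is done
    run_seconds  -- total wall-clock length of the run, including init
    """
    working = max(run_seconds - init_seconds, 0.0)
    return steady_rate * working / run_seconds


# Hypothetical numbers: backend B is 60% faster in steady state but
# needs 5 minutes to initialize, versus 10 seconds for backend A.
for run in (600, 3600, 36000):
    a = effective_rate(100.0, 10.0, run)
    b = effective_rate(160.0, 300.0, run)
    print(f"run={run:>6}s  A={a:7.1f}  B={b:7.1f}")
```

Under these assumed numbers the faster backend loses on a 10-minute run and only pulls ahead once the run is long enough to amortize the start-up cost.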

yauwing avatar Nov 01 '21 15:11 yauwing

As far as I understand, TensorRT is mainly faster when requireMaxBoardSize = true is set in the config. That is not possible for training at the moment, because different board sizes are used.

@tychota, @lightvector Might it be possible to distribute the board size with tasks from the server, so that each contributor has a fixed board size?

petgo3 avatar Nov 01 '21 15:11 petgo3

What are your config settings? On my PC, TensorRT may be only about 20% faster.

ceremony08 avatar Nov 02 '21 13:11 ceremony08

My computer hardware config is AMD Ryzen 5950, 64G RAM, RTX3080ti + RTX2080.

The config file was generated using the katago genconfig command. Since the generated config file is a bit long, I only list the settings that differ from default_gtp.cfg below:

numSearchThreads = 96
nnCacheSizePowerOfTwo = 21
nnMutexPoolSizePowerOfTwo = 17
numNNServerThreadsPerModel = 2
trtDeviceToUseThread0 = 0
trtDeviceToUseThread1 = 1
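
For anyone who wants to produce a list of differences like this, a minimal sketch follows. It assumes the config is plain `key = value` lines with `#` comments; the helper names `parse_cfg` and `diff_cfg` are hypothetical, and the default values in the sample text are stand-ins, not the real contents of default_gtp.cfg.

```python
def parse_cfg(text):
    """Parse 'key = value' lines, ignoring blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_cfg(default_text, custom_text):
    """Return settings in custom_text that are new or differ from default_text."""
    default = parse_cfg(default_text)
    custom = parse_cfg(custom_text)
    return {k: v for k, v in custom.items() if default.get(k) != v}


# Stand-in text for illustration; real files would be read with open(...).read().
default_text = """
# hypothetical default values
numSearchThreads = 6
logSearchInfo = false
"""
custom_text = """
numSearchThreads = 96
logSearchInfo = false
nnCacheSizePowerOfTwo = 21
"""
changed = diff_cfg(default_text, custom_text)
for key, value in changed.items():
    print(f"{key} = {value}")
```

Settings that are commented out in the default file count as absent here, so any value set for them in the custom file is reported as a difference.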

yauwing avatar Nov 02 '21 14:11 yauwing

:)

ceremony08 avatar Nov 04 '21 10:11 ceremony08