
Drastic reduction in trt plan cache size

Open hyln9 opened this issue 1 year ago • 4 comments

Hello!

As NVIDIA has finally released TensorRT 10.0 and made it publicly available on their website, I did some research on the now improved engine refitting API.

The results are very promising: the size of the plan cache is reduced by ~30x on my laptop. Support for the newer CUDA 12.x has been added as well.

hyln9 avatar Jun 08 '24 21:06 hyln9

Hi, I'm a little curious why the plan cache became 30x smaller. Referring to the docs, it seems the refitter is used to change engine weights dynamically. Thanks.

inisis avatar Jun 13 '24 08:06 inisis

> Hi, I'm a little curious why the plan cache became 30x smaller. Referring to the docs, it seems the refitter is used to change engine weights dynamically. Thanks.

The `kSTRIP_PLAN` flag enables weight stripping, and it works well with refitting at runtime.
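To make the size reduction concrete: TensorRT's `kSTRIP_PLAN` builder flag serializes a plan without the network weights, and a `Refitter` re-attaches them from the original model at load time, so only the (small) graph/tactic information needs to be cached per GPU. The sketch below is a toy Python illustration of that idea, not the TensorRT API; all names and sizes are made up for demonstration.

```python
import pickle

# Stand-ins for a network: named weight blobs plus a graph describing layers.
weights = {f"conv{i}": bytes(10_000) for i in range(16)}
graph = {"layers": list(weights)}

# A "full" plan embeds every weight blob; a "stripped" plan keeps only the
# graph and the weight names needed to refit later.
full_plan = pickle.dumps({"graph": graph, "weights": weights})
stripped_plan = pickle.dumps({"graph": graph, "weight_names": list(weights)})

def refit(plan_bytes, weight_source):
    """Re-attach weights to a stripped plan, mimicking what a refitter does."""
    plan = pickle.loads(plan_bytes)
    plan["weights"] = {name: weight_source[name] for name in plan.pop("weight_names")}
    return plan

# The stripped plan is orders of magnitude smaller, yet refitting it from the
# original weight source reconstructs a fully usable engine description.
engine = refit(stripped_plan, weights)
```

The on-disk cache only ever stores `stripped_plan`; the weights already live in the model file, so caching them again inside every per-GPU plan is pure duplication.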

hyln9 avatar Jun 13 '24 12:06 hyln9

Thanks for your work. I ran into a problem when compiling it with TensorRT 10.1.0: the CMakeLists.txt cannot read the version number from NvInferVersion.h, since that header's encoding changed to UTF-16-LE. Should I modify the CMakeLists.txt or do something else?

ActiveIce avatar Jun 18 '24 06:06 ActiveIce

> Thanks for your work. I ran into a problem when compiling it with TensorRT 10.1.0: the CMakeLists.txt cannot read the version number from NvInferVersion.h, since that header's encoding changed to UTF-16-LE. Should I modify the CMakeLists.txt or do something else?

It should be fixed now.
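For anyone hitting the same symptom: the root cause is that a byte-oriented scan of a UTF-16-LE file finds no match, because every ASCII character is followed by a NUL byte. A minimal Python sketch of the failure mode and the fix (the header contents and macro value here are illustrative, not taken from any particular TensorRT release):

```python
import os
import re
import tempfile

# Illustrative header contents; real NvInferVersion.h defines more macros.
header_text = "#define NV_TENSORRT_MAJOR 10\n#define NV_TENSORRT_MINOR 1\n"

# Write the header as UTF-16-LE, as newer TensorRT releases ship it.
path = os.path.join(tempfile.mkdtemp(), "NvInferVersion.h")
with open(path, "w", encoding="utf-16-le") as f:
    f.write(header_text)

raw = open(path, "rb").read()
# A byte-level regex fails: in UTF-16-LE, "N" is the two bytes b"N\x00",
# so the contiguous ASCII pattern never matches.
assert re.search(rb"NV_TENSORRT_MAJOR\s+(\d+)", raw) is None

# Decoding with the correct encoding first makes the scan work again.
decoded = raw.decode("utf-16-le")
major = re.search(r"NV_TENSORRT_MAJOR\s+(\d+)", decoded).group(1)
```

The equivalent CMake-side fix is to read the header with an encoding-aware mechanism (or decode it) rather than treating it as plain ASCII before applying the version regex.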

hyln9 avatar Jun 19 '24 11:06 hyln9