conway-abacus
I was having trouble reproducing the int8 speedup. I didn't look into the generated code to verify, but it turns out I needed the following:

```
import torch._inductor.config
torch._inductor.config.coordinate_descent_tuning = True
```
...
Hey @Chillee, any tips for generating a fused kernel for BS > 1? Is it related at all to https://github.com/pytorch/pytorch/issues/127056
Thanks @byshiue, do you mean the docker used to build the engine?

```
>>> import tensorrt
>>> tensorrt.__version__
'9.3.0.post12.dev1'
```
I was following the guide; should I try to downgrade/rebuild...
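
For comparing environments (for example, the build container versus the runtime one), a small sketch like the following might help. It only uses the standard library's `importlib.metadata`; the helper name `installed_versions` and the distribution names passed to it are assumptions for illustration, not something taken from the guide.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names here are assumed; adjust to what your setup installs.
    for name, version in installed_versions(["tensorrt", "tensorrt_llm", "torch"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both environments and diffing the output should show whether the versions actually differ before committing to a downgrade or rebuild.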