star

Results 12 comments of star

config.json ![image](https://github.com/NVIDIA/TensorRT-LLM/assets/30221696/f0d5ef41-807d-41e6-b927-7f4c1b475721)

> Hi @SidaZh, we can disable paged kv cache when building the engine; trtllm can then choose faster fp8 gemm kernels with a better opt_shape and get performance similar to benchmark.py. It...