jiahanc
Hi @dongjiyingdjy, I think the issue also exists when MTP = 2. Can we change the title to make it clearer? Thanks :)
Hi @ghostplant, the 150 TPS is with MTP = 3. We have a PR to document the reproduction steps on both Hopper and Blackwell: https://github.com/NVIDIA/TensorRT-LLM/pull/3232
/bot run
/bot run
/bot run