nv-guomingz

Results 27 comments of nv-guomingz

we've supported llama with the latest release, please have a try.

The contextFMHA doesn't support geforce 2080Ti whichi sm version is sm75. https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fmhaRunner.cpp#L91. You may try to build the engine by disabling the context_fmha `--context_fmha disable` or on other supported hardware.

Hi @RunningLeon sorry for late response due to internal task priority. Would u please rebase the code firstly and I'll try to merge your MR into main branch this week.

> > Hi @RunningLeon sorry for late response due to internal task priority. Would u please rebase the code firstly and I'll try to merge your MR into main branch...

Hi @RunningLeon I've managed to file the merge request in our internal repo and testing is on-going. If everything goes well, this MR would be upstreamed next week. Thanks for...

@RunningLeon Internlm2 had been added into today's update. Please see notes here. https://github.com/NVIDIA/TensorRT-LLM/discussions/1726#discussion-6776859