nv-guomingz
We've added LLaMA support in the latest release — please give it a try.
The context FMHA doesn't support the GeForce 2080 Ti, whose SM version is sm75: https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fmhaRunner.cpp#L91. You can try building the engine with context FMHA disabled (`--context_fmha disable`), or build on other supported hardware.
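For reference, a build invocation with context FMHA disabled might look roughly like the sketch below. The script path, model directory, and output directory are placeholders — only the `--context_fmha disable` flag comes from this thread; adjust the rest for your TensorRT-LLM version and checkpoint layout:

```shell
# Sketch: build a LLaMA engine with context FMHA disabled so it can run on
# sm75 GPUs such as the GeForce 2080 Ti.
# NOTE: the script path and directory arguments below are assumptions --
# check the build docs for your TensorRT-LLM release.
python examples/llama/build.py \
    --model_dir ./llama-checkpoint \
    --output_dir ./llama-engine \
    --context_fmha disable
```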
Hi @RunningLeon, sorry for the late response due to internal task priorities. Would you please rebase the code first? I'll try to merge your MR into the main branch this week.
Hi @RunningLeon, I've managed to file the merge request in our internal repo and testing is ongoing. If everything goes well, this MR will be upstreamed next week. Thanks for...
@RunningLeon InternLM2 has been added in today's update. Please see the notes here: https://github.com/NVIDIA/TensorRT-LLM/discussions/1726#discussion-6776859
Closing this since we've merged the changes.