nv-guomingz
We've added LLaMA support in the latest release — please give it a try.
The context FMHA doesn't support the GeForce 2080 Ti, whose SM version is sm75: https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fmhaRunner.cpp#L91. You can try building the engine with context FMHA disabled (`--context_fmha disable`), or build on other supported hardware.
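For reference, a build invocation with context FMHA disabled might look roughly like the sketch below. The script path, model directory, and output directory are placeholders — only the `--context_fmha disable` flag comes from this thread; adjust the rest for your TensorRT-LLM version and checkpoint layout:

```shell
# Sketch: build a LLaMA engine with context FMHA disabled so it can run on
# sm75 GPUs such as the GeForce 2080 Ti.
# NOTE: the script path and directory arguments below are assumptions --
# check the build docs for your TensorRT-LLM release.
python examples/llama/build.py \
    --model_dir ./llama-checkpoint \
    --output_dir ./llama-engine \
    --context_fmha disable
```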
Hi @RunningLeon, sorry for the late response due to internal task priorities. Would you please rebase the code first? I'll try to merge your MR into the main branch this week.
Hi @RunningLeon, I've managed to file the merge request in our internal repo and testing is ongoing. If everything goes well, this MR will be upstreamed next week. Thanks for...
@RunningLeon InternLM2 has been added in today's update. Please see the notes here: https://github.com/NVIDIA/TensorRT-LLM/discussions/1726#discussion-6776859
Closing this since we've merged the changes.