Kefeng-Duan
Hi @sdecoder, could you try using --load_model_on_cpu?
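For reference, a rough sketch of how that flag is usually passed to the conversion script (assuming the LLaMA example; the model/output paths and dtype are placeholders, not from this thread):

```bash
# Sketch only: keep the HF weights in host RAM during checkpoint conversion.
# model_dir / output_dir / dtype are placeholders.
python examples/llama/convert_checkpoint.py \
    --model_dir ./hf_model \
    --output_dir ./tllm_ckpt \
    --dtype float16 \
    --load_model_on_cpu
```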
@sdecoder Do you mean the weights are too big to be stored on the GPU (26GB > 24GB), so you need to offload some (or all) weights to CPU?...
How about referring to this one? https://github.com/NVIDIA/TensorRT-LLM/issues/1968#issuecomment-2252750163
@BooHwang Sorry, could you try --streamingllm enable when building the engine? See https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md#run-llama-with-streamingllm
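A minimal build sketch based on that README section (checkpoint/output paths and the gemm plugin dtype are placeholders):

```bash
# Sketch only: enable StreamingLLM at engine build time.
trtllm-build --checkpoint_dir ./tllm_ckpt \
             --output_dir ./engine \
             --gemm_plugin float16 \
             --streamingllm enable
```

The linked README section also covers the runtime-side window/sink settings needed when actually running with StreamingLLM.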
Hi @ayush1399, it looks like a version mismatch issue. Could you:
1. update to the latest commit
2. install the latest PyPI wheel
3. clean and rebuild trtllm
4. rebuild the engine
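A rough sketch of those steps, assuming a source checkout of TensorRT-LLM (paths are placeholders; use either the PyPI wheel or the source build, not both):

```bash
# 1. update to the latest commit
git pull origin main
# 2. install the latest PyPI wheel ...
pip install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
# 3. ... or clean and rebuild from source (TensorRT path is a placeholder)
python scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt
pip install --force-reinstall build/tensorrt_llm-*.whl
# 4. rebuild the engine from the re-converted checkpoint
trtllm-build --checkpoint_dir ./tllm_ckpt --output_dir ./engine
```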
Hi @xiangxinhello, could you provide your /tmp/Qwen/7B/config.json file?
@nv-guomingz for visibility
Hi @zhaocc1106, could you update to the latest trtllm version?
@zhaocc1106 Could you double-check that you have successfully rebuilt and reinstalled v0.11.0? I think we have removed the '--use_custom_all_reduce' knob from the build flow, so you would get an error...
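One way to sanity-check which build is actually being picked up (commands are illustrative, not from this thread):

```bash
# Confirm the installed/imported package matches the rebuilt v0.11.0 wheel.
pip show tensorrt_llm | grep -i version
python -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```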
@zhaocc1106 Could you try enabling --context_fmha?
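A minimal sketch of adding that knob to the existing build command (keep the other flags as in the original build; paths are placeholders):

```bash
# Sketch only: turn on the fused context-phase attention kernel.
trtllm-build --checkpoint_dir ./tllm_ckpt \
             --output_dir ./engine \
             --context_fmha enable
```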