Kanghwan

Results: 69 comments by Kanghwan

Just for the record, it happened when I used LLVM compilers.

This is quite an old issue, but just for your information, I've confirmed that these two commands now work without triggering the previously reported problem. ```shell...

The documentation moved to https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/examples/llm_ptq/README.md. I'll mark this as "waiting for feedback" so it can be automatically marked as stale if no feedback is received within 14 days. Simply leaving...

@bleedingfight, thank you for the update. Just to confirm my understanding: after using vLLM with AutoAWQ and the same models, you're no longer seeing the issue you reported with...

Thanks for confirming it!

@PerkzZheng, if you still remember, could you please share any test results you may have from evaluating this optimization? @lishicheng1996, how would you like to proceed with this...

@PerkzZheng, thank you for the update! Could you please confirm whether the failed tests were indeed due to **numerical errors** rather than actual accuracy issues? If the `SCALE_QP_INSTEAD_OF_KV` approach should...

@handoku, thanks for reporting this MoE kernel assertion issue with Qwen2 MoE 57B-A14B, and sorry about the very delayed response. Are you still exploring this issue or experiencing the MoE...

@lss15151161, thank you for raising this question about batch inference in LLaVA, and I'm sorry for the very delayed response. If you are still interested in the batch inference,...

@LetsGoFir, thanks for your contribution; this PR still looks valid 👍 However, could you address the DCO check failure above? More details are available here: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md#signing-your-work
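For anyone hitting the same DCO failure: the check just looks for a `Signed-off-by:` trailer on each commit, which `git commit -s` adds automatically. A minimal sketch (the throwaway repo, user name, and commit message below are illustrative, not from the PR):

```shell
# Create a throwaway repo to demonstrate the sign-off (assumes git is installed).
tmp=$(mktemp -d)
cd "$tmp"
git init -q

# -s appends the "Signed-off-by: Name <email>" trailer the DCO bot checks for.
git -c user.name="Dev" -c user.email="dev@example.com" \
    commit -q --allow-empty -s -m "fix: example change"

# Inspect the resulting commit message; the trailer appears on its own line.
git log -1 --format=%B
```

For commits already pushed without the trailer, the usual fix is to rewrite them with `git rebase --signoff HEAD~N` (N = number of commits in the PR) and then `git push --force-with-lease`.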