Kanghwan

Results: 69 comments by Kanghwan

Just for the record, it happened when I used LLVM compilers.

This is quite an old issue, but just for your information, I've confirmed that these two commands now work without triggering the previously reported problem. ```shell...

The documentation moved to https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/examples/llm_ptq/README.md. I'll mark this as "waiting for feedback" so it can be automatically marked as stale if no feedback is received within 14 days. Simply leaving...

@bleedingfight, thank you for the update. Just to confirm my understanding: after using vLLM with AutoAWQ and the same models, you're no longer seeing the issue you reported with...

Thanks for confirming it!

@PerkzZheng, if you still remember, could you please share any test results you may have from evaluating this optimization? @lishicheng1996, how would you like to proceed with this...

@PerkzZheng, thank you for the update! Could you please confirm whether the failed tests were indeed due to **numerical errors** rather than actual accuracy issues? If the `SCALE_QP_INSTEAD_OF_KV` approach should...

@handoku, thanks for reporting this MoE kernel assertion issue with Qwen2 MoE 57B-A14B, and sorry about the very delayed response. Are you still exploring this issue or experiencing the MoE...

@lss15151161, thank you for raising this question about batch inference in LLaVA, and I'm sorry for the very delayed response. If you are still interested in the batch inference,...

@LetsGoFir, thanks for your contribution; this PR still looks valid 👍 However, could you address the DCO check failure above? More details are available here: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md#signing-your-work
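For anyone hitting the same DCO failure: the check just looks for a `Signed-off-by:` trailer on each commit, which `git commit -s` adds automatically. A minimal sketch (the throwaway repo, user name, and commit message below are illustrative, not from the PR):

```shell
# Create a throwaway repo to demonstrate the sign-off (assumes git is installed).
tmp=$(mktemp -d)
cd "$tmp"
git init -q

# -s appends the "Signed-off-by: Name <email>" trailer the DCO bot checks for.
git -c user.name="Dev" -c user.email="dev@example.com" \
    commit -q --allow-empty -s -m "fix: example change"

# Inspect the resulting commit message; the trailer appears on its own line.
git log -1 --format=%B
```

For commits already pushed without the trailer, the usual fix is to rewrite them with `git rebase --signoff HEAD~N` (N = number of commits in the PR) and then `git push --force-with-lease`.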