Kanghwan

70 comments of Kanghwan

Closing this issue as stale. If the problem persists in the latest release, please feel free to open a new one. Thank you!

@liquanfeng, thanks for catching that! Although the code is still present [here](https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/llama/convert.py#L641-L649), it appears to be specific to the old TensorRT backend. The PyTorch backend, which is the preferred...

I’m closing this issue due to its prolonged inactivity. I hope the comments above have addressed the questions. If the issue still exists in the latest release, please open a...

@nsealati, just checking in: if this issue is no longer relevant, please let me know so we can close it. If it is still affecting you, could you try...

Closing the issue as stale. Please feel free to open a new issue if the problem persists with the latest release. Thank you!

@yubofredwang, I hope you’ve already found the information you were looking for, but here is a more recent container release in case it helps: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release

Closing this issue due to its prolonged inactivity. I hope the comments above have addressed the question. If the problem persists in the latest release, please open a new...

Closing this issue as stale. If it is still relevant to you, please try running the model with the latest release. Also, consider switching to the PyTorch workflow, which is...

Apologies for the delayed response. The Qwen2.5 VL model is supported by the PyTorch backend and can be found in the [Multimodal Feature Support Matrix (PyTorch Backend)](https://nvidia.github.io/TensorRT-LLM/models/supported-models.html).

Closing this issue based on discussions above. Please feel free to open a new one if the problem persists in the latest release. Thank you!