Root causes of the issue: 1. The flashinfer FP4 TRTLLM-GEN MOE originally supported only two routing methods, DeepSeek_V3 and Llama4, so the routing was hardcoded to these two. After https://github.com/vllm-project/vllm/pull/27492...
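To illustrate the hardcoding described above, here is a minimal sketch of a dispatch that only knows those two routing methods; `RoutingMethod` and `select_routing_method` are hypothetical names for illustration, not the actual flashinfer or vLLM identifiers.

```python
from enum import Enum

# Hypothetical sketch: routing selection as it effectively behaved before
# the fix. Any model other than the two hardcoded ones cannot get its own
# routing method.
class RoutingMethod(Enum):
    DEEPSEEK_V3 = "deepseek_v3"
    LLAMA4 = "llama4"

def select_routing_method(model_type: str) -> RoutingMethod:
    if model_type == "deepseek_v3":
        return RoutingMethod.DEEPSEEK_V3
    if model_type == "llama4":
        return RoutingMethod.LLAMA4
    # Everything else falls through instead of using the model's routing.
    raise NotImplementedError(f"routing not supported for {model_type!r}")
```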
Hi @byStander9 , thanks for the question. If you change the scheduling policy by setting `exec_settings["settings_config"]["scheduler_policy"]`, it will change the scheduling policy during inference, because this is an initialization param...
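A minimal sketch of what that looks like in practice: the `exec_settings` key path comes from the comment above, while the policy value names (e.g. `"max_utilization"`, `"guaranteed_no_evict"`) and the `build_executor` helper are assumptions for illustration, not a definitive TensorRT-LLM API.

```python
# Set the policy before initialization; it is read once at startup and
# then applies for the whole inference run.
exec_settings = {
    "settings_config": {
        # Assumed policy string; check the TensorRT-LLM docs for valid values.
        "scheduler_policy": "max_utilization",
    }
}

def build_executor(exec_settings: dict):
    # Hypothetical helper: the policy is consumed here, at init time,
    # which is why mutating the dict later has no effect mid-run.
    policy = exec_settings["settings_config"]["scheduler_policy"]
    print(f"initializing executor with scheduler policy: {policy}")
    ...
```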
Hi @byStander9 , the requests' latencies are recorded in [request_latencies](https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/bench/dataclasses/reporting.py#L84).
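For intuition, here is a small sketch of how per-request latencies can be collected and summarized; the real logic lives in the `reporting.py` file linked above, and the class and field names here are assumptions, not the actual TensorRT-LLM dataclasses.

```python
import statistics
from dataclasses import dataclass, field
from typing import List

# Illustrative only: one latency entry per completed request, in seconds.
@dataclass
class LatencyReport:
    request_latencies: List[float] = field(default_factory=list)

    def record(self, start_time: float, end_time: float) -> None:
        # Latency of a single request, end-to-end.
        self.request_latencies.append(end_time - start_time)

    def summary(self) -> dict:
        lat = sorted(self.request_latencies)
        return {
            "mean_s": statistics.mean(lat),
            "p50_s": lat[len(lat) // 2],
            "p99_s": lat[min(len(lat) - 1, int(len(lat) * 0.99))],
        }
```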