jiahanc
Results: 2 issues of jiahanc
In experiments, the graph size oscillates frequently during CUDA graph capture, making the total graph size larger than expected. Reverse the order of graph batch sizes when capturing and this...
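The reordering idea in the summary above can be sketched roughly as follows. This is only an illustration, not the PR's actual change: `capture_in_reversed_order` and `capture_fn` are hypothetical names, the batch sizes are made up, and because the summary is truncated it does not say which direction the original order ran, so the sketch simply reverses whatever list it is given.

```python
from typing import Callable, Iterable, List


def capture_in_reversed_order(
    batch_sizes: Iterable[int],
    capture_fn: Callable[[int], None],
) -> List[int]:
    """Call capture_fn once per batch size, in the reverse of the given order.

    capture_fn is a hypothetical stand-in for whatever routine records one
    CUDA graph for a given batch size; it is not a real vLLM API.
    """
    order = list(reversed(list(batch_sizes)))
    for size in order:
        capture_fn(size)  # one capture per batch size, in reversed order
    return order


# Usage with a dummy capture hook that only records the order of calls.
captured: List[int] = []
capture_in_reversed_order([1, 2, 4, 8], captured.append)
print(captured)
```

Keeping the ordering in a small helper like this makes the capture order explicit and easy to flip when comparing total graph size between the two directions.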
## Purpose
Fixes https://github.com/vllm-project/vllm/pull/28007
- Add multi routing method to flashinfer fp4 trtllm moe to support models like Qwen3
- Add flashinfer trtllm moe into global_sf list which was missed...
performance
frontend
ready
nvidia