Canlin Guo comments

Results: 56 comments of Canlin Guo

[Perf] Use vLLM's SharedFusedMoE in Qwen3-Omni

> Any progress on this?

Yes. I'm updating it today.

[Perf] Use vLLM's SharedFusedMoE in Qwen3-Omni

@Isotr0py Many thanks!

[Feature] Support torch profiler across omni stages

1. I have rebased the code and extracted the common method into the `OmniBase` class. 2. The offline profiler API has been implemented. However, because the `--profiler-config` flag was introduced by https://github.com/vllm-project/vllm/pull/29912, which isn't...

[Feature] Support torch profiler across omni stages

> fix ci please

Sorry for being late, and thanks for the review. I'll work through the issues below ASAP: 1. The CUDA time total seems to be inaccurate. 2....

[Feature] Support torch profiler across omni stages

I tested vLLM's profiler with the model `Qwen/Qwen2.5-Omni-7B` (thinker only) and got the results below, which explain why the CUDA time is much smaller than the CPU time. So yes, it's accurate because...

[Feature] Support torch profiler across omni stages

Thanks for investigating, @lishunyang12. IMO, it's hard to avoid tracing `shm_broadcast.py:dequeue` if we want to reuse vLLM's profiler. Even if the trace file is that large (~70 MB), the current profiler can still...