Canlin Guo
> Any progress on this?

Yes. I'm updating it today.
@Isotr0py Many thanks!
1. I have rebased the code and extracted the common method into the `OmniBase` class.
2. The offline profiler API has been implemented. But because the `--profiler-config` flag is introduced by https://github.com/vllm-project/vllm/pull/29912 which isn't...
> fix ci please

Sorry for being late, and thanks for the review. I’ll work through the issues below ASAP:
1. The CUDA time total seems inaccurate.
2....
I tested vLLM's profiler with the model `Qwen/Qwen2.5-Omni-7B` (thinker only) and got the results below, which explain why CUDA time is much smaller than CPU time. So yes, it's accurate because...
Thanks for investigating, @lishunyang12. IMO, it's hard to avoid tracing `shm_broadcast.py:dequeue` if we want to reuse vLLM's profiler. Even though the trace file is large (~70 MB), the current profiler can still...