Ma, Guokai

Results 180 comments of Ma, Guokai

Yes, the latest version can go forward. Will see whether it can continue. > @delock, here's the PR fixing the `tokens_per_sec` metric to work for both the streaming and non-streaming...

Hi @lekurile, the benchmark will proceed but will hit some other errors when running on CPU. I'll check with the vLLM CPU engineers to investigate these errors. I also submitted a...

> Thanks @delock - can we close this issue for now? Yes, this is no longer an issue now, thanks!

Hi @daehuikim, I used the following command and can see the CUDA op status. Note I don't have the CUDA toolchain installed. If your environment has the CUDA toolchain, you should be able...

I hit this error as well on a CPU device. Qwen3 meta tensor loading may not be supported yet.
```
[rank5]: raise NotImplementedError(
[rank5]: NotImplementedError: Cannot copy out of meta tensor; no...
```
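For context beyond the truncated traceback: "Cannot copy out of meta tensor" typically means a module whose weights live on the meta device (shapes only, no storage) was moved with `.to()` instead of being materialized first. A minimal sketch of how this error arises, assuming a recent PyTorch (2.x); it is an illustration, not the DeepSpeed loading path itself:

```python
import torch
import torch.nn as nn

# Build a module on the meta device: parameter shapes exist, but no real storage.
with torch.device("meta"):
    layer = nn.Linear(8, 8)

# .to() needs to copy tensor data that was never allocated, which raises:
# NotImplementedError: Cannot copy out of meta tensor; no data!
try:
    layer.to("cpu")
except NotImplementedError as err:
    print(err)

# The supported route is to_empty(), which allocates uninitialized storage on
# the target device; real weights must then be loaded in separately.
layer = layer.to_empty(device="cpu")
```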

Hi @songdezhao, can you check whether this branch fixes your issue? https://github.com/deepspeedai/DeepSpeed/tree/gma/enable_qwen3_meta

> Hello, I am loading a Qwen2.5-72B, and I hit the same error. Any help? > > torch==2.6.0 > torchvision==0.21.0 > torchaudio==2.6.0 > datasets==3.0.0 > huggingface-hub==0.30.0 > transformers==4.52.4 > accelerate==1.7.0...

Hi @ranzhejiang, llama4 is not yet supported by AutoTP. From the error message there seems to be a key mismatch. @songdezhao, do you have a dump of the module structure?

@inkcherry thanks, I have no further questions. Hi @tjruwase @loadams, this PR enables sequence parallelism for models whose number of heads is not a power of two, which is requested...
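As a loose illustration only, not the PR's actual implementation: supporting head counts that are not a power of two (or not divisible by the parallel degree) comes down to partitioning heads unevenly across ranks. The `partition_heads` helper below is hypothetical:

```python
def partition_heads(num_heads: int, world_size: int) -> list[int]:
    # Distribute attention heads as evenly as possible across ranks, so head
    # counts that do not divide evenly by the parallel degree can still be split.
    base, rem = divmod(num_heads, world_size)
    return [base + 1 if rank < rem else base for rank in range(world_size)]

# Example: 14 heads over a sequence-parallel degree of 4 -> [4, 4, 3, 3]
print(partition_heads(14, 4))
```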