Ma, Guokai

Results 180 comments of Ma, Guokai

Yes, the latest version can go forward. Will see whether it can continue. > @delock, here's the PR fixing the `tokens_per_sec` metric to work for both the streaming and non-streaming...

Hi @lekurile, the benchmark will proceed but will hit some other errors when running on CPU. I'll check with the vLLM CPU engineers to investigate these errors. I also submitted a...

> Thanks @delock - can we close this issue for now? Yes, this is no longer an issue now, thanks!

Hi @daehuikim, I used the following command and can see the CUDA op status. Note I don't have the CUDA toolchain installed. If your environment has the CUDA toolchain, you should be able...

I hit this error as well on a CPU device. Qwen3 meta tensor loading may not be supported yet.
```
[rank5]: raise NotImplementedError(
[rank5]: NotImplementedError: Cannot copy out of meta tensor; no...
```
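For context beyond the truncated traceback: "Cannot copy out of meta tensor" typically means a module whose weights live on the meta device (shapes only, no storage) was moved with `.to()` instead of being materialized first. A minimal sketch of how this error arises, assuming a recent PyTorch (2.x); it is an illustration, not the DeepSpeed loading path itself:

```python
import torch
import torch.nn as nn

# Build a module on the meta device: parameter shapes exist, but no real storage.
with torch.device("meta"):
    layer = nn.Linear(8, 8)

# .to() needs to copy tensor data that was never allocated, which raises:
# NotImplementedError: Cannot copy out of meta tensor; no data!
try:
    layer.to("cpu")
except NotImplementedError as err:
    print(err)

# The supported route is to_empty(), which allocates uninitialized storage on
# the target device; real weights must then be loaded in separately.
layer = layer.to_empty(device="cpu")
```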

Hi @songdezhao, can you check whether this branch fixes your issue? https://github.com/deepspeedai/DeepSpeed/tree/gma/enable_qwen3_meta

> Hello, I am loading a Qwen2.5-72B, and I hit the same error. Any help? > > torch==2.6.0 > torchvision==0.21.0 > torchaudio==2.6.0 > datasets==3.0.0 > huggingface-hub==0.30.0 > transformers==4.52.4 > accelerate==1.7.0...

Hi @ranzhejiang, llama4 is not yet supported by AutoTP. From the error message there seems to be a key mismatch. @songdezhao, do you have a dump of the module structure?

@inkcherry thanks, I have no further questions. Hi @tjruwase @loadams, this PR enables sequence parallelism for models whose number of heads is not a power of two, which is requested...
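As a loose illustration only, not the PR's actual implementation: supporting head counts that are not a power of two (or not divisible by the parallel degree) comes down to partitioning heads unevenly across ranks. The `partition_heads` helper below is hypothetical:

```python
def partition_heads(num_heads: int, world_size: int) -> list[int]:
    # Distribute attention heads as evenly as possible across ranks, so head
    # counts that do not divide evenly by the parallel degree can still be split.
    base, rem = divmod(num_heads, world_size)
    return [base + 1 if rank < rem else base for rank in range(world_size)]

# Example: 14 heads over a sequence-parallel degree of 4 -> [4, 4, 3, 3]
print(partition_heads(14, 4))
```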