zhangbo0037

10 comments by zhangbo0037

`model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float32).to('cpu')` runs on my Mac... but does anyone know how to use 'mps'?
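
For anyone else trying, a minimal sketch of what I'd expect the 'mps' attempt to look like (whether Bark actually runs on MPS is the open question here):

```python
import torch
from transformers import BarkModel

# Prefer Apple's Metal backend when available; fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# float32 is the safer dtype on MPS; some half-precision ops are unsupported.
model = BarkModel.from_pretrained(
    "suno/bark-small", torch_dtype=torch.float32
).to(device)
```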

Same issue. Can changing `response_model=TrainingResponse` in http.py to `response_model=None` skip this issue?
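
For context, a minimal sketch of the workaround I mean, assuming http.py declares a FastAPI route (the path and handler below are placeholders, not from the repo):

```python
from fastapi import FastAPI

app = FastAPI()

# Before: response_model=TrainingResponse validates the return value on the way out.
# Workaround in question: response_model=None disables that response validation.
@app.post("/train", response_model=None)  # "/train" is a placeholder path
async def start_training():
    ...
```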

On Mac I hit RuntimeError: Invalid buffer size: XXX GB (M4 Max, 128 GB model) ...

> > On Mac I hit RuntimeError: Invalid buffer size: XXX GB (M4 Max, 128 GB model) ...
>
> I also hit this with mps, but it runs normally after specifying cpu.

I think switching to cpu can indeed work around a lot of problems, but it's unbearably slow...

I run these 4 nodes like below:

**node 1:**

```bash
docker run --gpus all \
  -it \
  --rm \
  --name sglang_node_1 \
  -v /data/deepseek-r1:/root/deepseek-r1 \
  --env HF_ENDPOINT="https://hf-mirror.com" \
  --env "GLOO_SOCKET_IFNAME=ens12f0np0" \
  ...
```

I know the H20 supports 8-bit weights, and the A100 does not support 8-bit (so it needs 16-bit weights). I just want to know how to avoid `Weight output_partition_size = 576 is not divisible...`
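
For reference, a rough sketch of the arithmetic I think is behind that error, assuming the fp8 block quantization uses 128-wide weight blocks (the block size is my assumption, not confirmed here):

```python
# Each tensor-parallel rank gets a slice of the weight; block fp8 quantization
# requires that slice to cover a whole number of 128-wide blocks.
output_partition_size = 576
block_size = 128  # assumed block width for fp8 block quantization

print(output_partition_size / block_size)       # 4.5 -> not a whole number of blocks
print(output_partition_size % block_size == 0)  # False -> "is not divisible" error
```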

> You want to use 4x8 H20 to run fp8 dsv3? For block fp8, it cannot support 32 GPUs for the issue you met, and the max TP size is...

> it looks like you are using the official docker?

Yes, I'm using the sglang Docker image.

> try to reduce `--cuda-graph-max-bs=32`

Hi, if I use `--cuda-graph-max-bs=32`, it hangs at 14% ... If I use `--cuda-graph-max-bs=16`, it hangs at 20% ...

> ref https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-two-h208-nodes
> https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-two-h2008-nodes-and-docker
>
> Please try to use the latest image `docker pull lmsysorg/sglang:latest` and set `export NCCL_IB_GID_INDEX=3`.

Thank you!