zhangbo0037

10 comments by zhangbo0037

`model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float32).to('cpu')` runs on my Mac... but does anyone know how to use 'mps'?
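
For anyone else trying, a minimal sketch of what I'd expect the 'mps' attempt to look like (whether Bark actually runs on MPS is the open question here):

```python
import torch
from transformers import BarkModel

# Prefer Apple's Metal backend when available; fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# float32 is the safer dtype on MPS; some half-precision ops are unsupported.
model = BarkModel.from_pretrained(
    "suno/bark-small", torch_dtype=torch.float32
).to(device)
```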

Same issue. Can changing `response_model=TrainingResponse` in http.py to `response_model=None` skip this issue?
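
For context, a minimal sketch of the workaround I mean, assuming http.py declares a FastAPI route (the path and handler below are placeholders, not from the repo):

```python
from fastapi import FastAPI

app = FastAPI()

# Before: response_model=TrainingResponse validates the return value on the way out.
# Workaround in question: response_model=None disables that response validation.
@app.post("/train", response_model=None)  # "/train" is a placeholder path
async def start_training():
    ...
```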

On Mac I hit RuntimeError: Invalid buffer size: XXX GB (M4 Max, 128 GB model) ...

> > On Mac I hit RuntimeError: Invalid buffer size: XXX GB (M4 Max, 128 GB model) ...
>
> I also hit this with mps, but it runs normally after specifying cpu.

I think switching to cpu can indeed work around a lot of problems, but it's unbearably slow...

I run these 4 nodes like below:

**node 1:**

```bash
docker run --gpus all \
  -it \
  --rm \
  --name sglang_node_1 \
  -v /data/deepseek-r1:/root/deepseek-r1 \
  --env HF_ENDPOINT="https://hf-mirror.com" \
  --env "GLOO_SOCKET_IFNAME=ens12f0np0" \
  ...
```

I know the H20 supports 8-bit weights, and the A100 does not support 8-bit (so it needs 16-bit weights). I just want to know how to avoid `Weight output_partition_size = 576 is not divisible...`
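
For reference, a rough sketch of the arithmetic I think is behind that error, assuming the fp8 block quantization uses 128-wide weight blocks (the block size is my assumption, not confirmed here):

```python
# Each tensor-parallel rank gets a slice of the weight; block fp8 quantization
# requires that slice to cover a whole number of 128-wide blocks.
output_partition_size = 576
block_size = 128  # assumed block width for fp8 block quantization

print(output_partition_size / block_size)       # 4.5 -> not a whole number of blocks
print(output_partition_size % block_size == 0)  # False -> "is not divisible" error
```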

> You want to use 4x8 H20 to run fp8 dsv3? For block fp8, it cannot support 32 GPUs for the issue you met, and the max TP size is...

> it looks like you are using the official docker?

Yes, I'm using the sglang Docker image.

> try to reduce `--cuda-graph-max-bs=32`

Hi, if I use `--cuda-graph-max-bs=32`, it hangs at 14% ... If I use `--cuda-graph-max-bs=16`, it hangs at 20% ...

> ref https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-two-h208-nodes
> https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-two-h2008-nodes-and-docker
>
> Please try to use the latest image `docker pull lmsysorg/sglang:latest` and set `export NCCL_IB_GID_INDEX=3`.

Thank you!