Results: 8 comments of laixinn

> @HandH1998 @laixinn Cannot support torch-compile? When I enable torch-compile, the returned result is garbled characters. Like this:
>
> ```
> {"id":"2fe19ce57cdb4613bf5e1b718d21ae8b","object":"chat.completion","created":1740622831,"model":"ds3","choices":[{"index":0,"message":{"role":"assistant","content":"�-se-se goodπππ goodπ good goodππ goodπ goodπ goodππ...
> ```

> > @lambert0312 please provide the detailed configuration behind this result and try launching without torch-compile to make sure everything else is good.
>
> @laixinn I start the 2 nodes using...

> @laixinn I deployed the model service on 2 H20s. After deploying according to the command you showed, an error message was displayed when requesting the API. The error info...

@noob-ctrl Yes, we ran simple accuracy checks akin to the BF16 case and observed no accuracy loss.
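For illustration only, a minimal sketch of the kind of simple spot-check described above: comparing greedy outputs from a BF16 server and an INT8 server through SGLang's OpenAI-compatible endpoint. The ports, prompts, and model name below are assumptions for the sketch, not the actual evaluation setup.

```python
import requests

# Hypothetical endpoints: one server running the BF16 weights, one running INT8.
ENDPOINTS = {
    "bf16": "http://localhost:30000/v1/chat/completions",
    "int8": "http://localhost:30001/v1/chat/completions",
}
PROMPTS = [
    "What is 17 * 23?",
    "Name the capital of France.",
]

def greedy_answer(url: str, prompt: str) -> str:
    """Query one server with greedy decoding so the two runs are comparable."""
    payload = {
        "model": "default",  # placeholder model name; adjust to the served model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": 64,
    }
    resp = requests.post(url, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

for prompt in PROMPTS:
    answers = {name: greedy_answer(url, prompt) for name, url in ENDPOINTS.items()}
    match = answers["bf16"].strip() == answers["int8"].strip()
    print(f"[{'OK' if match else 'DIFF'}] {prompt!r}")
```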

> two nodes with A100 80G, using the following commands:
>
> ```
> python -m sglang.launch_server --model /home/wanglch/models/DeepSeek-R1-INT8 --tp 16 --dist-init-addr 192.168.33.121:3000 --nnodes 2 --node-rank 0 --trust-remote --enable-torch-compile --torch-compile-max-bs 8
> python -m sglang.launch_server --model...
> ```

> In this W8A8, is the **a8** dynamically quantized per-token-per-group, the same as the original FP8 quantization granularity? @laixinn

@brisker Yes, it is.
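To make the granularity concrete, here is a minimal PyTorch sketch of dynamic per-token-per-group INT8 activation quantization. The group size of 128 and the helper name are assumptions for illustration; this is not SGLang's actual kernel.

```python
import torch

def quant_a8_per_token_group(x: torch.Tensor, group_size: int = 128):
    """Dynamically quantize activations to int8 with one scale per (token, group).

    x: [num_tokens, hidden_dim]; hidden_dim must be divisible by group_size.
    Returns the int8 tensor (same shape as x) and float scales of shape
    [num_tokens, hidden_dim // group_size].
    """
    num_tokens, hidden_dim = x.shape
    groups = x.view(num_tokens, hidden_dim // group_size, group_size)
    # One scale per token per group, derived from the group's max magnitude.
    scales = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-10) / 127.0
    q = torch.clamp(torch.round(groups / scales), -128, 127).to(torch.int8)
    return q.view(num_tokens, hidden_dim), scales.squeeze(-1)

x = torch.randn(4, 512)
q, s = quant_a8_per_token_group(x)
print(q.shape, s.shape)  # torch.Size([4, 512]) torch.Size([4, 4])
```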

> hello:
>
> Question 1: How do I convert an INT8 safetensors model to GGUF (int8) via llama.cpp, which by default only supports BF16 or FP8 GGUF layouts?
>
> Question 2: Can I use...

> I encountered the same OOM problem when converting the DeepSeek-R1-BF16 weights to 4-bit using this script on an A800 (40G) machine:
>
> ```
> from datasets import load_dataset
> from gptqmodel...
> ```