Results: 8 comments of laixinn

> @HandH1998 @laixinn Cannot support torch-compile? When I enable torch-compile, the returned result is garbled characters. Like this:
>
> ```
> {"id":"2fe19ce57cdb4613bf5e1b718d21ae8b","object":"chat.completion","created":1740622831,"model":"ds3","choices":[{"index":0,"message":{"role":"assistant","content":"�-se-se goodπππ goodπ good goodππ goodπ goodπ goodππ...
> ```

> > @lambert0312 please provide the detailed configuration behind this result and try launching without torch-compile to make sure everything else is good.
>
> @laixinn I start the 2 nodes using...

> @laixinn I deployed the model service on 2 H20s. After deploying according to the command you showed, an error message was displayed when requesting the API. The error info...

@noob-ctrl Yes, we ran simple accuracy checks akin to the BF16 case and observed no accuracy loss.
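For illustration only, a minimal sketch of the kind of simple spot-check described above: comparing greedy outputs from a BF16 server and an INT8 server through SGLang's OpenAI-compatible endpoint. The ports, prompts, and model name below are assumptions for the sketch, not the actual evaluation setup.

```python
import requests

# Hypothetical endpoints: one server running the BF16 weights, one running INT8.
ENDPOINTS = {
    "bf16": "http://localhost:30000/v1/chat/completions",
    "int8": "http://localhost:30001/v1/chat/completions",
}
PROMPTS = [
    "What is 17 * 23?",
    "Name the capital of France.",
]

def greedy_answer(url: str, prompt: str) -> str:
    """Query one server with greedy decoding so the two runs are comparable."""
    payload = {
        "model": "default",  # placeholder model name; adjust to the served model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": 64,
    }
    resp = requests.post(url, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

for prompt in PROMPTS:
    answers = {name: greedy_answer(url, prompt) for name, url in ENDPOINTS.items()}
    match = answers["bf16"].strip() == answers["int8"].strip()
    print(f"[{'OK' if match else 'DIFF'}] {prompt!r}")
```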

> two nodes with A100 80G, using the following commands:
>
> ```
> python -m sglang.launch_server --model /home/wanglch/models/DeepSeek-R1-INT8 --tp 16 --dist-init-addr 192.168.33.121:3000 --nnodes 2 --node-rank 0 --trust-remote --enable-torch-compile --torch-compile-max-bs 8
> python -m sglang.launch_server --model...
> ```

> In this W8A8, is the **a8** dynamically quantized per-token-per-group, the same as the original FP8 quantization granularity? @laixinn

@brisker Yes, it is.
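To make the granularity concrete, here is a minimal PyTorch sketch of dynamic per-token-per-group INT8 activation quantization. The group size of 128 and the helper name are assumptions for illustration; this is not SGLang's actual kernel.

```python
import torch

def quant_a8_per_token_group(x: torch.Tensor, group_size: int = 128):
    """Dynamically quantize activations to int8 with one scale per (token, group).

    x: [num_tokens, hidden_dim]; hidden_dim must be divisible by group_size.
    Returns the int8 tensor (same shape as x) and float scales of shape
    [num_tokens, hidden_dim // group_size].
    """
    num_tokens, hidden_dim = x.shape
    groups = x.view(num_tokens, hidden_dim // group_size, group_size)
    # One scale per token per group, derived from the group's max magnitude.
    scales = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-10) / 127.0
    q = torch.clamp(torch.round(groups / scales), -128, 127).to(torch.int8)
    return q.view(num_tokens, hidden_dim), scales.squeeze(-1)

x = torch.randn(4, 512)
q, s = quant_a8_per_token_group(x)
print(q.shape, s.shape)  # torch.Size([4, 512]) torch.Size([4, 4])
```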

> hello:
>
> Question 1: How do I convert an INT8 safetensors model to GGUF (int8) via llama.cpp, which by default only supports BF16 or FP8 GGUF layouts?
>
> Question 2: Can I use...

> I encountered the same OOM problem when converting the DeepSeek-R1-BF16 weights to 4-bit using this script on an A800 (40G) machine:
>
> ```
> from datasets import load_dataset
> from gptqmodel...
> ```