Does SmoothQuant support W8A8 (activation per tensor static quant) for DeepSeek-R1?
When I use the following configuration file:
```yaml
base:
    seed: &seed 42
model:
    type: DeepseekV3
    path: xxx
    tokenizer_mode: fast
    torch_dtype: torch.float8_e4m3fn
calib:
    name: pileval
    download: False
    path: xxx
    n_samples: 128
    bs: 1
    seq_len: 512
    preproc: txt_general_preproc
    seed: *seed
quant:
    method: SmoothQuant
    weight:
        bit: 8
        symmetric: True
        granularity: per_channel
    act:
        bit: 8
        symmetric: True
        granularity: per_tensor
        static: True
    special:
        alpha: 0.8
save:
    save_vllm: True
    save_path: xxx
```
An error occurred: `BaseBlockwiseQuantization.update_input_feat() missing 1 required positional argument: 'is_gqa'`
This doesn't look like an issue with this particular model, but rather a bug in the code: the SmoothQuant code wasn't updated after the signature of `update_input_feat()` changed.
In my understanding, DeepSeek-V3 does not use GQA, so as a patch I would advise passing `is_gqa=False` in the call from `smoothquant.py`.
Hope it helps.
[Note: I'm not an author in this repo, just an active user]
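As a minimal sketch of what the patch would look like (the function bodies and call-site names here are stand-ins inferred from the traceback, not the repo's actual code):

```python
# Stand-in for BaseBlockwiseQuantization.update_input_feat(): the function
# gained a required is_gqa parameter, so every caller must now pass it.
def update_input_feat(input_feat, is_gqa):
    return {"feat": input_feat, "is_gqa": is_gqa}

# Old SmoothQuant call site -- raises:
#   TypeError: update_input_feat() missing 1 required positional argument: 'is_gqa'
# update_input_feat(calib_feat)

# Patched call site: DeepSeek-V3 does not use GQA, so pass False explicitly.
calib_feat = "calib_feat"  # placeholder for the real calibration activations
result = update_input_feat(calib_feat, is_gqa=False)
```

The real fix in `smoothquant.py` would be the same one-argument change at the actual call site; whether `False` is correct for other architectures would need checking per model.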
ok, thanks