Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:13<00:00, 3.28s/it]
Repo card metadata block was not found. Setting CardData to empty.
Token indices sequence length is longer than the specified maximum sequence length for this model (132274 > 16384). Running this sequence through the model will result in indexing errors
AWQ: 0%| | 0/27 [00:05<?, ?it/s]
Traceback (most recent call last):
File "/testspace/repo/deepseek/AutoAWQ/tests/deepseek_quantize.py", line 33, in
model.quantize(tokenizer, quant_config=quant_config, calib_data=load_wikitext())
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/testspace/repo/deepseek/AutoAWQ/awq/models/base.py", line 232, in quantize
self.quantizer.quantize()
File "/testspace/repo/deepseek/AutoAWQ/awq/quantize/quantizer.py", line 166, in quantize
scales_list = [
File "/testspace/repo/deepseek/AutoAWQ/awq/quantize/quantizer.py", line 167, in
self._search_best_scale(self.modules[i], **layer)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/testspace/repo/deepseek/AutoAWQ/awq/quantize/quantizer.py", line 330, in _search_best_scale
best_scales = self._compute_best_scale(
File "/testspace/repo/deepseek/AutoAWQ/awq/quantize/quantizer.py", line 391, in _compute_best_scale
self.pseudo_quantize_tensor(fc.weight.data)[0] / scales_view
File "/testspace/repo/deepseek/AutoAWQ/awq/quantize/quantizer.py", line 76, in pseudo_quantize_tensor
assert org_w_shape[-1] % self.group_size == 0
AssertionError
Seconding this issue: I hit the same AssertionError when quantizing DeepSeek with autoawq 0.2.7.post3.
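
For anyone else debugging this: the assert at quantizer.py line 76 fires when a Linear layer's input dimension is not a multiple of the AWQ group size (128 by default in quant_config). Below is a minimal sketch, not part of the original report, to list which layers would trip it. The model id and group size are placeholder assumptions; set them to match your checkpoint and quant_config.

```python
# Sketch (not the AutoAWQ API): find Linear layers whose in_features is not
# divisible by the AWQ group size, which is what triggers the assert in
# pseudo_quantize_tensor. model_path and group_size are assumptions.
import torch.nn as nn
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

model_path = "deepseek-ai/deepseek-moe-16b-base"  # assumption: replace with your model
group_size = 128  # AutoAWQ's default q_group_size

# Build the model on the meta device so no weights are downloaded or materialized;
# in_features/out_features are still populated from the config.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

for name, module in model.named_modules():
    if isinstance(module, nn.Linear) and module.in_features % group_size != 0:
        print(f"{name}: in_features={module.in_features} is not divisible by {group_size}")
```

If any layers show up, lowering q_group_size to a value that divides them (or excluding those modules from quantization) is a possible workaround.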