AutoAWQ
Error with `transformers` 4.48+
In transformers 4.48, `attention_mask` was made a required positional argument (it no longer has a default in `LlamaAttention.forward()`), so this is the error you now get:
```
Traceback (most recent call last):
  File "/home/ubuntu/lambda-quant-1/quantize.py", line 211, in <module>
    main()
  File "/home/ubuntu/lambda-quant-1/quantize.py", line 91, in main
    model.quantize(
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/awq/models/base.py", line 241, in quantize
    self.quantizer.quantize()
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 179, in quantize
    scales_list = [
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 180, in <listcomp>
    self._search_best_scale(self.modules[i], **layer)
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 340, in _search_best_scale
    fp16_output = self._module_forward(inp, module2inspect, module_kwargs)
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 260, in _module_forward
    module_output = module(x, **module_kwargs)
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/lambda-quant-1/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: LlamaAttention.forward() missing 1 required positional argument: 'attention_mask'
```
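The signature change is easy to confirm locally. A minimal check (assuming `transformers` is importable; the exact type annotations differ between releases):

```python
# Inspect the parameter that triggers the error above.
# In transformers <= 4.47.x, attention_mask has a default of None;
# in 4.48+ it has no default, so AutoAWQ's module(x, **module_kwargs)
# call fails when module_kwargs does not contain attention_mask.
import inspect
from transformers.models.llama.modeling_llama import LlamaAttention

param = inspect.signature(LlamaAttention.forward).parameters["attention_mask"]
print(param, "-> required" if param.default is inspect.Parameter.empty else "-> optional")
```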
Code I'm using to produce this (tokenizer loading and imports shown for completeness; `ds`, `preprocess`, and `args` are defined earlier in the script):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model = AutoAWQForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct", low_cpu_mem_usage=True, use_cache=False
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")

# `ds` (a datasets.Dataset), `preprocess`, and `args` come from earlier in the script.
ds = ds.map(preprocess, remove_columns=ds.column_names)
ds = [q["text"] for q in ds]  # calib_data as a list of plain-text strings

model.quantize(
    tokenizer,
    calib_data=ds,
    max_calib_samples=args.num_samples,
    max_calib_seq_len=args.seq_length,
)
```
Use `transformers==4.47.1`.
Yeah, that's what I ended up doing. Figured I'd report it so AutoAWQ can be updated to work with newer versions.
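For anyone who needs to stay on transformers 4.48+ before an upstream fix lands, here is a hedged workaround sketch, not the official AutoAWQ fix: it monkey-patches the quantizer's internal forward helper so each decoder layer receives an explicit `attention_mask=None`. Passing `None` is an assumption on my part and may affect calibration behavior; the downgrade above is the safer option.

```python
# Unofficial workaround sketch: wrap AutoAWQ's _module_forward so module_kwargs
# always contains an attention_mask entry, satisfying the transformers 4.48+
# signature. Calibration quality with a None mask is not verified here.
from awq.quantize.quantizer import AwqQuantizer

_orig_module_forward = AwqQuantizer._module_forward

def _patched_module_forward(self, x, module, module_kwargs):
    module_kwargs.setdefault("attention_mask", None)  # assumption: None is acceptable here
    return _orig_module_forward(self, x, module, module_kwargs)

AwqQuantizer._module_forward = _patched_module_forward  # apply before model.quantize(...)
```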