Unpin transformers or raise its latest supported version
The latest version of autoawq pins its transformers dependency as transformers<=4.47.1, i.e. to version 4.47.1 from 17 December 2024, which, among other models, does not support Qwen2.5-VL-72B-Instruct (transformers==4.49.0 or newer is required for this model).
Could you kindly raise the latest supported version or unpin transformers in the autoawq requirements (after tests, or even without them, to let users do the testing)?
The current transformers pin in autoawq blocks users from running models added to transformers over the past two months, even if they do not use AWQ quantization.
An illustration of the problem and a possible workaround, forcing an upgrade of transformers (caution: shown here without any tests or guarantees):
# using the newest version of `transformers` supported by `autoawq` makes importing the Qwen2.5-VL classes fail:
pip install autoawq --user
[..]
pip install transformers --user
[..]
python -c "from transformers import Qwen2_5_VLForConditionalGeneration"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: cannot import name 'Qwen2_5_VLForConditionalGeneration' from 'transformers' (/opt/conda/lib/python3.11/site-packages/transformers/__init__.py)
# versus:
# using the latest version of `transformers` (not yet supported by `autoawq`) fixes the import failure:
pip install autoawq --user
[..]
pip install transformers==4.49.0 --user
Collecting transformers==4.49.0
[..]
Installing collected packages: transformers
[..]
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autoawq 0.2.8 requires transformers<=4.47.1,>=4.45.0, but you have transformers 4.49.0 which is incompatible.
Successfully installed transformers-4.49.0
# the import now works, but would it work with AutoAWQ when loading models quantized with AWQ? (see the smoke test after this session)
python -c "from transformers import Qwen2_5_VLForConditionalGeneration; print(Qwen2_5_VLForConditionalGeneration.__name__)"
Qwen2_5_VLForConditionalGeneration
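One way to check is a quick smoke test with AutoAWQ itself. The sketch below is only a sketch: the model id is a placeholder for whatever AWQ-quantized checkpoint you have at hand, and a successful load here does not prove that quantizing new models works (see the quantizer issue further down).
# hedged smoke test: try to load an existing AWQ checkpoint with AutoAWQ
# after the forced transformers upgrade (the model id is a placeholder)
python - <<'EOF'
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "path-or-hub-id-of-an-AWQ-quantized-model"  # placeholder
model = AutoAWQForCausalLM.from_quantized(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
print("loaded:", type(model).__name__)
EOF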
One of the reasons why the latest autoawq==0.2.8 requires transformers<=4.47.1 from December 2024 is the incompatibility issue reported here: https://github.com/casper-hansen/AutoAWQ/issues/731
Hello,
I would add that this makes autoawq incompatible with recent versions of vllm: https://github.com/vllm-project/vllm/blob/main/requirements/common.txt#L9 🤯
In case it helps, I still managed to quantize a Mistral model with the class below.
The first error I mention comes from my use of AutoAWQForCausalLM.from_pretrained(..., device_map="cpu") to fit on a small GPU.
The second error is the one mentioned in #731
from awq.quantize.quantizer import AwqQuantizer


class CustomAwqQuantizer(AwqQuantizer):
    """Custom implementation of the AutoAWQ quantizer."""

    def init_quant(self, n_samples=128, max_seq_len=512):
        """Fix two identified issues with the latest autoawq implementation
        and the latest transformers updates.

        Firstly, the initialization moves only the 1st submodule to CUDA, which
        does not move all buffers & parameters linked to the main module, causing
        an `Expected all tensors to be on the same device` error. So we force
        this to run on CPU.
        Secondly, a recent refactoring of the transformers attention
        implementation made it mandatory to pass `position_embeddings` and
        `attention_mask` to the modules, even if they are set to None.
        """
        # make_cuda_unavailable() is a small context manager that hides CUDA
        # so the calibration pass stays on CPU (a possible sketch follows below)
        with make_cuda_unavailable():
            modules, layer_kwargs, inps = super().init_quant(n_samples, max_seq_len)
        for key in ("position_embeddings", "attention_mask"):
            # These optional arguments are now required by the transformers model
            # even if they are set to None
            layer_kwargs[key] = layer_kwargs.get(key)
        return modules, layer_kwargs, inps
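make_cuda_unavailable is not defined in the snippet above; below is a minimal sketch of what such a helper could look like, under the assumption that it only needs to make torch.cuda.is_available() report False while init_quant picks its device, so that AutoAWQ falls back to CPU. The wiring at the end, rebinding the AwqQuantizer name used by awq.models.base so that model.quantize() picks up the custom class, is likewise an assumption rather than an official AutoAWQ hook, and the model id, quantization config and output path are only examples.
import contextlib
from unittest import mock


@contextlib.contextmanager
def make_cuda_unavailable():
    # pretend CUDA is absent so AutoAWQ's device selection falls back to CPU
    with mock.patch("torch.cuda.is_available", return_value=False):
        yield


# assumed wiring: awq.models.base imports AwqQuantizer by name, so rebinding it
# there should make model.quantize() instantiate CustomAwqQuantizer instead
import awq.models.base as awq_base
awq_base.AwqQuantizer = CustomAwqQuantizer

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "mistral-model-id-or-local-path"  # placeholder
model = AutoAWQForCausalLM.from_pretrained(model_id, device_map="cpu")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.quantize(
    tokenizer,
    quant_config={"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"},
)
model.save_quantized("mistral-awq")  # example output path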
If you think this fix is heading in the right direction, I can submit a PR with these changes.
Bumping for priority. It makes no sense to keep this pin in 2025; the latest versions of vLLM are AWESOME.