optimum
GPTQQuantizer hard-codes the device to 0
System Info
optimum version: 405199457d9f6b8e060c043216eb717bc7a4c4c1
Platform: Linux
Python version: 3.11
Who can help?
@SunMarc @younesbelkada @fxmarty
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
When using `device_map` to load the model onto a specific device, the quantizer raises:

`RuntimeError: Expected all tensors to be on the same device, but found at least two devices`
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
import torch

torch.cuda.set_device(2)

quantization_config = GPTQConfig(
    bits=4, group_size=128, dataset="wikitext2", desc_act=False, use_cuda_fp16=True
)
quant_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=quantization_config,
    device_map={"": torch.cuda.current_device()},
)
```
The root cause is at https://github.com/huggingface/optimum/blob/405199457d9f6b8e060c043216eb717bc7a4c4c1/optimum/gptq/quantizer.py#L429C21-L429C38, where the target device is hard-coded:
```python
if self.cache_block_outputs:
    handle = blocks[0].register_forward_pre_hook(store_input_hook, with_kwargs=True)
    for data in dataset:
        for k, v in data.items():
            # put the data on gpu, we won't put them back to cpu
            data[k] = v.to(0)
        try:
            model(**data)
        except ValueError:
            pass
    handle.remove()
```
Expected behavior
The calibration inputs should be transferred to the same device as the model (e.g. the device specified in `device_map`), not to hard-coded device 0.
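A minimal sketch of the idea, assuming the calibration batch is a dict of tensors (the helper name `move_batch_to_device` is hypothetical, not part of optimum):

```python
# Hypothetical fix sketch (not the actual optimum patch): move each
# calibration tensor to the model's device instead of hard-coded device 0.
import torch


def move_batch_to_device(data, device):
    # Move every tensor in the batch onto `device`;
    # non-tensor values are passed through unchanged.
    return {
        k: v.to(device) if isinstance(v, torch.Tensor) else v
        for k, v in data.items()
    }


# Inside the quantizer loop, `v.to(0)` would then become
# `v.to(model.device)`. Demonstrated here on CPU:
batch = {"input_ids": torch.tensor([[1, 2, 3]])}
moved = move_batch_to_device(batch, torch.device("cpu"))
```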
cc @SunMarc
Same problem here.