
GPTQQuantizer hard-codes the device to 0

Open cctry opened this issue 1 year ago • 2 comments

System Info

optimum version: 405199457d9f6b8e060c043216eb717bc7a4c4c1
Platform: Linux
Python version: 3.11

Who can help?

@SunMarc @younesbelkada @fxmarty

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

When using device_map to load the model onto a specific device, the quantizer raises the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices

from transformers import AutoModelForCausalLM, GPTQConfig
import torch

torch.cuda.set_device(2)
quantization_config = GPTQConfig(
    bits=4, group_size=128, dataset="wikitext2", desc_act=False, use_cuda_fp16=True
)
quant_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=quantization_config,
    device_map={"": torch.cuda.current_device()},
)
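Until this is fixed, one possible workaround (an assumption on my part, not verified across all optimum code paths) is to remap device visibility so that logical device 0 is the intended physical GPU:

import os

# Must be set before torch initializes CUDA (ideally before importing torch),
# so that logical device 0 maps to physical GPU 2.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

import torch
from transformers import AutoModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(
    bits=4, group_size=128, dataset="wikitext2", desc_act=False, use_cuda_fp16=True
)
quant_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=quantization_config,
    device_map={"": 0},  # logical device 0 is now physical GPU 2
)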

The root cause is at https://github.com/huggingface/optimum/blob/405199457d9f6b8e060c043216eb717bc7a4c4c1/optimum/gptq/quantizer.py#L429C21-L429C38, where the calibration inputs are moved to device 0 unconditionally:

if self.cache_block_outputs:
    handle = blocks[0].register_forward_pre_hook(store_input_hook, with_kwargs=True)
    for data in dataset:
        for k, v in data.items():
            # put the data on gpu, we won't put them back to cpu
            data[k] = v.to(0)
        try:
            model(**data)
        except ValueError:
            pass
    handle.remove()

Expected behavior

The calibration inputs should be transferred to the same device as the model (the one selected via device_map), not unconditionally to GPU 0.
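A minimal fix sketch, assuming blocks[0] holds at least one parameter on the device chosen by device_map (this is just an illustration of the expected behavior, not the actual patch):

if self.cache_block_outputs:
    handle = blocks[0].register_forward_pre_hook(store_input_hook, with_kwargs=True)
    # Assumption: the first block's parameters live on the device selected
    # by device_map, so it can serve as the target device for the inputs.
    block_device = next(blocks[0].parameters()).device
    for data in dataset:
        for k, v in data.items():
            # put the data on the model's device, not hard-coded GPU 0
            data[k] = v.to(block_device)
        try:
            model(**data)
        except ValueError:
            pass
    handle.remove()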

cctry avatar Jan 24 '24 19:01 cctry

cc @SunMarc

fxmarty avatar Jan 26 '24 08:01 fxmarty

same problem here

anhnh2002 avatar Jan 30 '24 07:01 anhnh2002