
Error training GPT-2 with LoRA INT8

Open · holmad opened this issue 2 years ago · 6 comments

Hi! I'm trying to fine-tune a GPT-2-based model with LoRA in INT8 using bitsandbytes. I followed the steps in the example notebook provided here: https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing

The following code (taken from the linked notebook) works fine for the OPT model, but when I try to run it with GPT-2 it gives me an error.

The code:

import os
os.environ["CUDA_VISIBLE_DEVICES"]="0" 
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
import transformers
from datasets import load_dataset


model_name = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# might not be optimal, just trying to run the code
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    load_in_8bit=True, 
    device_map='auto',
)



config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)

model = prepare_model_for_int8_training(model)

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples['quote']), batched=True)

trainer = transformers.Trainer(
    model=model, 
    train_dataset=data['train'],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4, 
        gradient_accumulation_steps=4,
        warmup_steps=100, 
        max_steps=200, 
        learning_rate=2e-4, 
        fp16=True,
        logging_steps=1, 
        output_dir='outputs',
        auto_find_batch_size=True,
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!

trainer.train()

But I get the following error

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py in forward(self, input)
    112 
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115 
    116     def extra_repr(self) -> str:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_mm)

I understand that the tensors need to be on the same device to perform the computations, but I am not sure how to achieve this when the device_map='auto'. Or maybe I am missing something else here. Any pointers would be greatly appreciated!

holmad commented on Feb 24, 2023

Try device_map={'': torch.cuda.current_device()}? See https://github.com/huggingface/transformers/issues/21736
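A minimal sketch of that change, assuming a single-GPU setup (only the device_map argument differs from the original script):

import torch
from transformers import AutoModelForCausalLM

# Pin the whole model to the current GPU instead of letting
# accelerate spread it across GPU and CPU with device_map='auto'.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    load_in_8bit=True,
    device_map={"": torch.cuda.current_device()},
)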

chenmingjiong commented on Feb 27, 2023

Thank you! That resolved the issue with the devices.

However, now the following error appears:

/usr/local/lib/python3.8/dist-packages/peft/tuners/lora.py in forward(self, x)
    446             return F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
    447         else:
--> 448             result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
    449             if self.r > 0:
    450                 after_A = self.lora_A(self.lora_dropout(x))

RuntimeError: mat1 and mat2 shapes cannot be multiplied (152x768 and 2304x768)

I ran the same code as before but I just added the device map as suggested.
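For reference, GPT-2 implements its attention projection as transformers' Conv1D, which stores its weight transposed relative to nn.Linear, and the transposed shapes in the error message are consistent with that. Below is a hedged sketch of a config that accounts for this; the target_modules and fan_in_fan_out values are assumptions based on GPT-2's architecture, not a confirmed fix:

from peft import LoraConfig

# GPT-2's c_attn is a transformers Conv1D whose weight is stored as
# (in_features, out_features), i.e. the transpose of nn.Linear's layout.
# fan_in_fan_out=True tells LoRA to transpose it before the matmul.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["c_attn"],  # GPT-2's fused query/key/value projection
    fan_in_fan_out=True,
)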

holmad commented on Feb 27, 2023

> Thank you! That resolved the issue with the devices. However, now the following error appears:
> RuntimeError: mat1 and mat2 shapes cannot be multiplied (152x768 and 2304x768)

I'm seeing the same issue. Were you ever able to resolve it?

epinnock commented on Mar 12, 2023

> Thank you! That resolved the issue with the devices. However, now the following error appears:
> RuntimeError: mat1 and mat2 shapes cannot be multiplied (152x768 and 2304x768)
>
> I'm seeing the same issue. Were you ever able to resolve it?

Hello, LoRA with load_in_8bit=True does not work for me, but it works without it. I'm guessing the problem is related to loading models in 8-bit.
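One way to check that guess is to list which modules were actually converted to bitsandbytes' 8-bit Linear8bitLt wrapper; GPT-2 uses Conv1D rather than nn.Linear for its projections, so depending on the transformers version, some of them may never be converted. A minimal sketch:

import bitsandbytes as bnb

# Print every submodule that was replaced by an 8-bit linear layer
# when the model was loaded with load_in_8bit=True.
for name, module in model.named_modules():
    if isinstance(module, bnb.nn.Linear8bitLt):
        print(name)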

drimeF0 commented on Mar 23, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] commented on Apr 16, 2023

Training for this seems to work for me.

Oxi84 commented on Apr 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] commented on May 13, 2023