Example Finetune_opt_bnb_peft.ipynb runs in a Colab notebook but fails on my system with seemingly the same requirements installed: "expected scalar type Half but found Float"
When I run this code: https://github.com/huggingface/peft/blob/main/examples/int8_training/Finetune_opt_bnb_peft.ipynb by copying and pasting it into /home/jahangmar/peft_finetune_opt_bnb.py and then executing it, I get the following output:
/home/jahangmar/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
2023-03-25 06:38:23.216670: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Found cached dataset json (/home/jahangmar/.cache/huggingface/datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /opt/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/jahangmar/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121_nocublaslt.so...
trainable params: 589824 || all params: 125829120 || trainable%: 0.46875
0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 1082.40it/s]
Loading cached processed dataset at /home/jahangmar/.cache/huggingface/datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bbb01537a3e5298d.arrow
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
/home/jahangmar/.local/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
0%|          | 0/200 [00:00<?, ?it/s]
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/home/jahangmar/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
File "/home/jahangmar/peft_finetune_opt_bnb.py", line 80, in <module>
trainer.train()
File "/home/jahangmar/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
return inner_training_loop(
File "/home/jahangmar/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1911, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/jahangmar/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2667, in training_step
self.scaler.scale(loss).backward()
File "/usr/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/usr/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/usr/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/home/jahangmar/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 456, in backward
grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype_A)
RuntimeError: expected scalar type Half but found Float
0%|          | 0/200 [00:00<?, ?it/s]
I have the following libraries installed:
pip list | grep -e transformers -e accelerate -e peft -e bitsandbytes -e datasets -e loralib -e 'torch '
accelerate 0.17.1
bitsandbytes 0.37.2
datasets 2.10.1
lion-pytorch 0.0.7
loralib 0.1.1
open-clip-torch 2.7.0
peft 0.3.0.dev0
torch 2.0.0rc5
transformers 4.28.0.dev0
pacman -Qs cuda
local/cuda 12.1.0-1
NVIDIA's GPU programming toolkit
local/cuda-tools 11.8.0-1
NVIDIA's GPU programming toolkit (extra tools: nvvp, nsight)
local/cudnn 8.6.0.163-1
NVIDIA CUDA Deep Neural Network library
local/icu 72.1-2
International Components for Unicode library
local/lib32-icu 72.1-2
International Components for Unicode library (32 bit)
local/openmpi 4.1.4-4
High performance message passing library (MPI)
local/python-cuda 12.0.0-1
Python interface for CUDA provided by NVIDIA.
local/python-pycuda 2022.1-3
Python wrapper for Nvidia CUDA
local/python-pytorch-opt-cuda 2.0.0rc5-2
Tensors and Dynamic neural networks in Python with strong GPU acceleration (with CUDA and AVX2 CPU optimizations)
local/python-tensorflow-cuda 2.12.0rc1-1
Library for computation using data flow graphs for scalable machine learning (with CUDA)
local/python-torchvision-cuda 0.14.1-1
Datasets, transforms, and models specific to computer vision (with GPU support)
local/tensorflow-opt-cuda 2.12.0rc1-1
Library for computation using data flow graphs for scalable machine learning (with CUDA and AVX2 CPU optimizations)
In the README there is a Google Colab notebook linked that contains very similar code to https://github.com/huggingface/peft/blob/main/examples/int8_training/Finetune_opt_bnb_peft.ipynb, and it runs fine there, but I get the same error when I run the code on my system. I tried to mimic the Colab environment by installing the same torch version (1.13.1, in a conda environment), but this did not change the result. I also tried downgrading the transformers library and peft.
I am facing the same issue as well.
Did anyone manage to fix it?
Any update on this issue? I am also facing the same problem.
I have also seen this issue when loading trained LoRA checkpoints. The model works fine right after being trained, but when it is loaded again with PeftModel.from_pretrained it throws a similar error.
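For context, the loading pattern is roughly the following (a minimal sketch; the model name, dtype and adapter path are placeholders, not the exact values from my run):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Hypothetical base model and adapter path, shown only to illustrate the call pattern.
base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base_model, "outputs/lora-checkpoint")
model.eval()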

The notebook is working fine locally too. I'm unable to reproduce the issue. @eware-godaddy, could you please share a minimal reproducible example for us to deep dive into the issue?
Could everyone facing the issue try installing the main branch and see if that resolves the issue?
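For anyone unsure how to try that: the main branch can be installed with pip install git+https://github.com/huggingface/peft.git. Afterwards, a quick way to confirm which package versions are actually picked up (useful when both a user-level and a system-level install exist, as in the environment above) is:

from importlib.metadata import version

# Print the resolved versions of the packages relevant to this issue.
for pkg in ("peft", "transformers", "accelerate", "bitsandbytes", "torch"):
    print(pkg, version(pkg))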
I have tried different versions of peft and transformers with no success. I was able to run the code by removing the following two lines:

fp16=True,

and

model = prepare_model_for_int8_training(model)

Alternatively, I can load the model without load_in_8bit=True, and then the fp16=True flag works.
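To spell out that second alternative, this is roughly how I read it (a minimal sketch, assuming the model is then simply loaded in full fp32 precision; only the loading line changes, the rest of the notebook stays the same):

from transformers import AutoModelForCausalLM

model_path = 'facebook/opt-125m'
# No load_in_8bit=True here: the weights stay in fp32, and fp16=True in
# transformers.TrainingArguments then works through the usual AMP path.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map={"": 0})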
The following script, where the parameters are cast manually, also works as long as I do not cast the parameters with ndim == 1 to float32 and do not use fp16 training.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

model_path = 'facebook/opt-125m'
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_8bit=True, torch_dtype=torch.float16, device_map={"": 0})
tokenizer = AutoTokenizer.from_pretrained(model_path, device_map={"": 0})

# Freeze all base-model parameters.
for param in model.parameters():
    param.requires_grad = False
    # if param.ndim == 1:
    #     param.data = param.data.to(torch.float32)  # leads to "RuntimeError: expected scalar type Half but found Float"

model.gradient_checkpointing_enable()
model.enable_input_require_grads()

# Cast the language-model head output to float32 for a stable loss.
class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)
model = get_peft_model(model, config)

import transformers
from datasets import load_dataset

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=1,
        warmup_steps=100,
        max_steps=200,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir='outputs',
        # fp16=True,  # leads to "RuntimeError: expected scalar type Half but found Float"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False
trainer.train()
I marked the two lines that lead to this error with a comment. This seems to be more of an issue with the transformers.Trainer class than with peft itself, because the error also occurs when I don't use peft.
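If it helps with debugging, the snippet below (run after the model is built as above) lists which parameters end up in fp32 next to the fp16 weights; my guess, not a confirmed diagnosis, is that this mismatch is what the backward pass trips over when fp16=True is enabled:

from collections import Counter
import torch

# Tally the dtypes of the already constructed `model` and print the fp32 outliers.
dtype_counts = Counter()
for name, param in model.named_parameters():
    dtype_counts[param.dtype] += 1
    if param.dtype == torch.float32:
        print("fp32 parameter:", name)
print(dict(dtype_counts))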
No success even with the above. I am getting this issue only on a V100-32G; everything works fine on an A100-40G. It seems like an issue with bitsandbytes.
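For what it's worth, the bitsandbytes warning at the top of the log is about compute capability: as far as I know a V100 reports 7.0 (below the 7.5 threshold in that warning) while an A100 reports 8.0, which would fit the A100-vs-V100 pattern. A quick way to check what your own GPU reports:

import torch

# Print the compute capability of every visible GPU; bitsandbytes selects the
# slower *_nocublaslt int8 kernel (see the log above) when it is below 7.5.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}, compute capability {major}.{minor}")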
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I am having a similar issue.
I am having the same issue.
I am having the same issue too. It works on A100 but fails on V100. Any idea how to fix it?
@pacman100 Hi, would you please check and fix this issue? Thanks in advance.