alpaca-lora
Finetuned Model Inference error: AttributeError: 'NoneType' object has no attribute 'device'
Update: for anyone experiencing this issue, see the workaround I posted in https://github.com/tloen/alpaca-lora/issues/14#issuecomment-1471263165
I tried out the finetune script locally and it looks like there was no problem with it. However, when trying to run inference, I'm getting AttributeError: 'NoneType' object has no attribute 'device'
from bitsandbytes. I've checked, and it looks like an issue related to splitting the model between CPU and GPU, but I'm not sure which part of this repo is causing that. Any ideas?
Relevant issue in bitsandbytes: https://github.com/TimDettmers/bitsandbytes/issues/40
Which version of bitsandbytes are you on?
@devilismyfriend
0.37.0, the latest release.
I have the same issue. I only get it when I try to run inference with my local fine-tune; the downloaded one doesn't have the problem. I am on the latest bitsandbytes commit, built from source.
Maybe try allocating the foundation model on the CPU with device_map={'': 'cpu'}? That might save some VRAM for the LoRA model.
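For reference, a minimal sketch of what that suggestion could look like, assuming the base model is loaded with transformers' LlamaForCausalLM roughly as in generate.py; the checkpoint name is illustrative, and load_in_8bit is left out here because bitsandbytes' 8-bit path needs a GPU:

import torch
from transformers import LlamaForCausalLM

# Keep every module of the foundation model on the CPU;
# the empty-string key in an accelerate device_map matches the whole model.
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # illustrative checkpoint name
    torch_dtype=torch.float16,
    device_map={"": "cpu"},
)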
Changing device_map to cpu did not help for me, still getting the same stack trace.
It looks like the downloaded model is using the {'base_model': 0} device map, which loads everything on the GPU.
Local finetune device map looks like:
{'base_model.model.model.embed_tokens': 0, 'base_model.model.model.layers.0': 0, 'base_model.model.model.layers.1': 0, 'base_model.model.model.layers.2': 0, 'base_model.model.model.layers.3': 0, 'base_model.model.model.layers.4': 0, 'base_model.model.model.layers.5': 0, 'base_model.model.model.layers.6': 0, 'base_model.model.model.layers.7': 0, 'base_model.model.model.layers.8': 0, 'base_model.model.model.layers.9': 0, 'base_model.model.model.layers.10': 0, 'base_model.model.model.layers.11': 0, 'base_model.model.model.layers.12': 0, 'base_model.model.model.layers.13': 0, 'base_model.model.model.layers.14': 0, 'base_model.model.model.layers.15': 0, 'base_model.model.model.layers.16': 0, 'base_model.model.model.layers.17': 0, 'base_model.model.model.layers.18': 0, 'base_model.model.model.layers.19': 0, 'base_model.model.model.layers.20': 0, 'base_model.model.model.layers.21': 0, 'base_model.model.model.layers.22': 0, 'base_model.model.model.layers.23': 0, 'base_model.model.model.layers.24': 0, 'base_model.model.model.layers.25': 0, 'base_model.model.model.layers.26': 0, 'base_model.model.model.layers.27': 'cpu', 'base_model.model.model.layers.28': 'cpu', 'base_model.model.model.layers.29': 'cpu', 'base_model.model.model.layers.30': 'cpu', 'base_model.model.model.layers.31': 'cpu', 'base_model.model.model.layers.32': 'cpu', 'base_model.model.model.layers.33': 'cpu', 'base_model.model.model.layers.34': 'cpu', 'base_model.model.model.layers.35': 'cpu', 'base_model.model.model.layers.36': 'cpu', 'base_model.model.model.layers.37': 'cpu', 'base_model.model.model.layers.38': 'cpu', 'base_model.model.model.layers.39': 'cpu', 'base_model.model.model.norm': 'cpu', 'base_model.model.lm_head': 'cpu'}
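If you want to check where your own load ended up, a quick sketch (assuming model is the object returned by PeftModel.from_pretrained; hf_device_map is only set when accelerate applied a device_map):

# Print the device placement accelerate decided on, if any.
if hasattr(model, "hf_device_map"):
    print(model.hf_device_map)
else:
    # Fall back to listing the devices the parameters actually live on.
    print({p.device for p in model.parameters()})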
@ItsLogic
Right now I am forcing device_map to use only the GPU, i.e. adding device_map={'': 0} to PeftModel.from_pretrained, which worked.
It looks like the issue is that PEFT's load will auto-apply a device_map if none is specified, which places some of the model weights on the CPU. This is unfortunately not compatible with bitsandbytes. Forcing PEFT to use only the GPU is the workaround I found.
Right now I am forcing device_map to use only the GPU, i.e. adding device_map={'': 0} to PeftModel.from_pretrained, which worked.
This seems to work for me as well. Cheers, now I can use my 13B LoRA.
Right now I am forcing device_map to use only the GPU, i.e. adding device_map={'': 0} to PeftModel.from_pretrained, which worked.
I had the same problem with the stock generate.py, and this fixed it for me as well. I can confirm it works on an RTX 3060 with 12 GB (9.9 GB in use), but nvtop reports only 30% GPU usage, so there's a bottleneck somewhere.
Also, uncommenting and executing the original test code failed on the last sample with an OOM error. Using the Gradio UI, I get about 1 GB of extra memory used after each request, so I'd say it's a leak. I added import gc; gc.collect() to generate and that seems to fix it, but long responses can still trigger OOM. Limiting tokens to 128 did help.
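A minimal sketch of that workaround, assuming evaluate() is the generation callback wired into the Gradio UI in generate.py (assumed name); the torch.cuda.empty_cache() call is an extra step beyond the plain gc.collect() mentioned above:

import gc
import torch

def evaluate_with_cleanup(*args, **kwargs):
    try:
        return evaluate(*args, **kwargs)  # original generation function (assumed name)
    finally:
        gc.collect()                      # drop Python-side references from the last request
        if torch.cuda.is_available():
            torch.cuda.empty_cache()      # release cached GPU memory held by PyTorch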
So to clarify, the change I had to apply was in generate.py:
model = PeftModel.from_pretrained(
    model, "tloen/alpaca-lora-7b",
    torch_dtype=torch.float16
)
change this to:
model = PeftModel.from_pretrained(
    model, "tloen/alpaca-lora-7b",
    torch_dtype=torch.float16,
    device_map={'': 0},  # force the whole LoRA-wrapped model onto GPU 0
)
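For what it's worth, the empty-string key in an accelerate device_map matches every module, so device_map={'': 0} places the entire PEFT-wrapped model on GPU 0 instead of letting accelerate split it between GPU and CPU.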
This may be fixed by this PEFT PR
This may be fixed by a recent PR on accelerate that supports weight quantization for the dispatch_model function. Related PR: https://github.com/huggingface/accelerate/pull/1237 - can you try to use the main branch of accelerate by installing it from source?
pip install git+https://github.com/huggingface/accelerate
https://github.com/huggingface/peft/issues/115#issuecomment-1504411743