alpaca-lora
Successfully ran training in 4-bit mode, but the training speed is very slow
Here's the code needed for these adjustments: https://github.com/johnsmith0031/alpaca_lora_4bit Don't know why the training is so slow, though.
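For anyone wondering what this approach looks like in general, here's a minimal sketch (not that repo's actual code): LoRA keeps the quantized base weights frozen and trains only two small low-rank matrices.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA adapter over a frozen base layer. In the real
    repo the base weight is 4-bit quantized; here it is a plain frozen
    nn.Linear so the example stays self-contained."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # frozen base weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # frozen (quantized) path + trainable low-rank update
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale
```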
How slow compared to 8-bit?
I'm not too surprised; there aren't any good PyTorch libraries for doing int4 on the tensor cores
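To make that concrete, here's a rough sketch of what the naive 4-bit path looks like (the packing layout is an assumption, not any specific repo's format):

```python
import torch

def naive_int4_matmul(x, packed, scale, zero):
    """Sketch of the slow path: two 4-bit weights packed per uint8 byte.
    The unpack + dequantize below is elementwise work that runs far from
    peak GPU throughput; only the final matmul can hit the tensor cores."""
    low = (packed & 0x0F).to(torch.float16)
    high = (packed >> 4).to(torch.float16)
    w = torch.stack((low, high), dim=-1).reshape(packed.shape[0], -1)
    w = (w - zero) * scale              # dequantize to fp16
    return x @ w.T                      # (batch, in) @ (in, out)
```

Everything before that final matmul is elementwise bookkeeping, and doing it on every forward pass of every layer adds up fast during training.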
How slow compared to 8-bit?
Haven't tried 8-bit yet, but apparently I couldn't complete 3 epochs on the instruction dataset in 5 hours.
But why fine-tune in 4-bit? AFAIK it's OK for inference but extremely bad for training
Maybe we need a way to convert the LoRA to 4-bit instead?
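One way to read "convert the LoRA": merge the trained low-rank update back into the fp16 base weights, then re-quantize the merged matrix for inference. A minimal sketch, with a deliberately simple quantization scheme:

```python
import torch

def merge_and_quantize(w, lora_A, lora_B, scale, n_bits=4):
    """Merge a trained LoRA update into the base weight, then quantize
    the merged matrix: W' = W + scale * (B @ A). The symmetric per-tensor
    scheme below is purely illustrative; real 4-bit pipelines (GPTQ etc.)
    quantize group-wise for much better accuracy."""
    merged = w + (lora_B @ lora_A) * scale
    qmax = 2 ** (n_bits - 1) - 1                       # 7 for 4-bit
    step = merged.abs().max() / qmax
    q = torch.clamp(torch.round(merged / step), -qmax - 1, qmax).to(torch.int8)
    return q, step                                     # dequantize as q * step
```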
But why fine-tune in 4-bit? AFAIK it's OK for inference but extremely bad for training
Because in 4-bit you can probably fine-tune the 30b model on a single 4090.
But why fine-tune in 4-bit? AFAIK it's OK for inference but extremely bad for training
Because in 4-bit you can probably fine-tune the 30b model on a single 4090.
Doesn't matter if the quality isn't good.
Re-implemented the 4-bit matmul and increased the training speed by about 20x.
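For context, here is one plausible shape of such a speedup (a guess at the technique, not the repo's actual code): dequantize the weight to fp16 once in the forward pass, save the copy, and reuse it in the backward pass, so both directions run as dense tensor-core matmuls.

```python
import torch

class QuantLinearFn(torch.autograd.Function):
    """Sketch: dequantize the 4-bit weight to fp16 once in forward, save
    the fp16 copy, and reuse it in backward. Both passes then run as
    dense tensor-core matmuls instead of custom elementwise int4 kernels."""

    @staticmethod
    def forward(ctx, x, packed, scale, zero):
        low = (packed & 0x0F).to(torch.float16)
        high = (packed >> 4).to(torch.float16)
        w = torch.stack((low, high), dim=-1).reshape(packed.shape[0], -1)
        w = (w - zero) * scale
        ctx.save_for_backward(w)
        return x @ w.T

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # base weights are frozen, so only the input needs a gradient;
        # that gradient is what flows into the trainable LoRA branch
        return grad_out @ w, None, None, None
```

This trades a bit of memory (the saved fp16 weight copy) for a large speed win during training.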
You're a legend. I got this running when you first posted it. Tomorrow I'm going to try to train 65b with this plus #131
You're a legend. I got this running when you first posted it. Tomorrow I'm going to try to train 65b with this plus #131
@kooshi If you are training on the alpaca dataset, try using the cleaned dataset and let us know if you get better results.
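For reference, a minimal way to point a fine-tuning script at the cleaned data with Hugging Face datasets (the file name is an assumption; use wherever your cleaned JSON lives):

```python
from datasets import load_dataset

# File name is an assumption; point this at your local cleaned Alpaca JSON.
data = load_dataset("json", data_files="alpaca_data_cleaned.json")
```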
Optimized VRAM usage, and can now train a LoRA with the 30b model on a single 4090
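Back-of-envelope numbers on why 30B fits in a 4090's 24 GiB:

```python
params = 30e9
weights_gib = params * 0.5 / 2**30   # 4 bits = 0.5 bytes per parameter
print(f"4-bit base weights: {weights_gib:.1f} GiB")   # ~14 GiB of a 24 GiB card
# The LoRA adapters and their optimizer states are tiny by comparison,
# so most of the remaining ~10 GiB goes to activations and CUDA overhead.
```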
Just wanted to post I was able to train on 30B with a 4090 as well using your code, johnsmith0031. Thanks for the effort!
Was the quality of the resulting model (30B, 4-bit) good after training in 4-bit? (I'm asking because I also own a 4090.)
We are also trying to implement 4-bit QLoRA. Thanks to an optimized kernel implementation of back-propagation, the fine-tuning speed is currently similar to 8-bit LoRA. You're welcome to try it and file issues: https://github.com/megvii-research/Sparsebit/tree/main/large_language_models/alpaca-qlora
Really, that bad?! I've spent so much time and money on 4-bit these last few days; that is not good...!