alpaca.cpp
What will it take to get a 65B alpaca weight?
This was initially released with 7B and 13B alpaca weights. I added instructions on how to use a 30B alpaca weight yesterday, since it appeared here: https://huggingface.co/Pi3141/alpaca-30B-ggml. I know there is a 65B LLaMA weight, but as far as I understand, there is no 65B alpaca weight yet.
What will it take to get a 65B alpaca weight and how can we get this done as a community?
It will really just take a lot of money. We can fine-tune on the same datasets the others have been fine-tuned on in order to get a 65B model, but we will need to do it using state-of-the-art hardware.
Do we know if the person who made the 30B alpaca is working on the 65B file? I would love to pitch in a few dollars!
I've heard of people renting computational power from Google to train models/datasets, so that could be an option. It's relatively affordable, apparently.
Any knowledge of open initiatives doing this? Also would love to pitch in.
https://huggingface.co/chavinlo/Alpaca-65B/tree/main
The founder of Zapier has offered to fund this: https://twitter.com/mikeknoop/status/1638248244911435776
@Green-Sky newbie question: how come the file sizes are so small?
@d33tah The finetune does not contain the full model. To quote LoRA, the technique used: "LoRA reduces the number of trainable parameters by learning pairs of rank-decomposition matrices while freezing the original weights. This vastly reduces the storage requirement for large language models adapted to specific tasks and enables efficient task-switching during deployment, all without introducing inference latency. LoRA also outperforms several other adaptation methods including adapter, prefix-tuning, and fine-tuning."
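In other words, only the small low-rank update matrices get saved, which is why the files are tiny. Here is a rough PyTorch sketch of the idea (illustrative only, not the actual Alpaca-LoRA code; the class name and hyperparameters are made up):

```python
# Minimal sketch of the LoRA idea: the pretrained weight W is frozen and
# only two small rank-r matrices A and B are trained, so the saved
# adapter is tiny compared to the full model.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Frozen pretrained weight (copied from the base model in practice).
        self.weight = nn.Parameter(torch.zeros(out_features, in_features),
                                   requires_grad=False)
        # Trainable low-rank factors; B starts at zero so the update is zero.
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.normal_(self.lora_A, std=0.02)
        self.scaling = alpha / r

    def forward(self, x):
        # y = x W^T + scaling * (x A^T) B^T  -- the low-rank update
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T

layer = LoRALinear(4096, 4096)
y = layer(torch.randn(2, 4096))  # only lora_A and lora_B receive gradients
```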
Amen! How long would a 4x RTX A6000 (48 GB each) server need for the fine-tuning (assuming it's up to the task)?
Y'all: This shouldn't be difficult. I fine-tuned the 30B 8-bit LLaMA with Alpaca LoRA in about 26 hours on a couple of 3090s with good results. The 65B model quantized to 4-bit has a memory footprint roughly the same as 30B in 8-bit. It looks like the Alpaca 65B weights are available on HF here: https://huggingface.co/chavinlo/Alpaca-65B. I haven't been able to fine-tune the 65B 4-bit across multiple GPUs yet due to issues with training 4-bit models, but it certainly looks feasible, and I don't see why it couldn't be done on 2x 3090s with NVLink.
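For a rough sanity check on that memory claim (weights only, ignoring activations, KV cache, optimizer state, and quantization overhead such as scales):

```python
# Back-of-the-envelope weight-only memory estimate.
def weight_gib(params_billion, bits_per_param):
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

print(f"30B @ 8-bit: ~{weight_gib(30, 8):.1f} GiB")  # ~27.9 GiB
print(f"65B @ 4-bit: ~{weight_gib(65, 4):.1f} GiB")  # ~30.3 GiB
```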
Awesome, thanks for the link. Will this work with either of the alpaca.cpp or llama.cpp projects?
I thought training and fine-tuning are primarily done in FP16? What are the drawbacks of training in 4-bit?
I'll have the motherboard I need tomorrow to set up my two 3090 Tis and NVLink adapter properly. I'd be willing to let it churn on the task for a few days; I'd love to have a 4-bit 65B Alpaca model to run on my setup. Any advice on training over NVLink? I'm a little new to the LLM stuff.
@RandyHaylor 4-bit LoRA training is currently only available in this repo, as far as I know: https://github.com/johnsmith0031/alpaca_lora_4bit
I'm interested in doing this myself, too. Will have to monitor the temperatures of the 3090s closely…
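For anyone wondering what a 4-bit LoRA setup roughly looks like in code, here is a sketch using the Hugging Face peft + bitsandbytes route. Note this is a different code path than the GPTQ-based alpaca_lora_4bit repo linked above; the model path and hyperparameters are placeholders, not a tested recipe:

```python
# Rough sketch of a 4-bit LoRA fine-tuning setup with transformers +
# bitsandbytes + peft. NOT the alpaca_lora_4bit (GPTQ) code path.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-65b",            # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",              # shard layers across available GPUs
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical LLaMA attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```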
Thanks!
Based on an article I read saying 3090 Tis only lose about 17% of their processing power when power-limited by 33% (300 W vs. 450 W), I'll probably run them like that for any long tasks. I'm not in such a hurry that I'll risk burning these cards out...
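If anyone wants to keep an eye on the cards during a multi-day run, something like this works. It assumes the NVML Python bindings are installed (e.g. pip install nvidia-ml-py; the package name is my assumption, check your environment):

```python
# Poll GPU temperature and power draw every few seconds during a long run.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]
try:
    while True:
        for i, h in enumerate(handles):
            temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
            watts = pynvml.nvmlDeviceGetPowerUsage(h) / 1000  # NVML reports mW
            print(f"GPU{i}: {temp} C, {watts:.0f} W")
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()
```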
If I'm going to bother, what's the best 65B model to start from? Is there a trustworthy 4-bit one to pull from?
I'm currently working on just getting a 65B LLaMA 4-bit model running on my 3090 Tis (bare-metal Ubuntu Desktop 22.04 install). I suspect I'm having issues because the two cards aren't in the two main GPU slots (risers arriving later today will fix that and let me connect the NVLink adapter).
Any advice on how to get it going? I've had good luck with text-generation-webui running 30B models on one card so far.
I have an X570-based board with the two GPU slots 60 mm apart (three slots), and each runs at PCIe 4.0 x8 when both are in use. I managed to find a 60 mm NVLink adapter that didn't cost an arm and a leg. Inference with text-generation-webui works with 65B 4-bit and two 24 GB x090 NVIDIA cards. Just give it the GPU memory parameter and assign less memory to the first GPU: --gpu-memory 16 21
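For reference, the same per-GPU memory split can be expressed outside the webui with the transformers/accelerate max_memory map. This is just a sketch of the idea; the model path is a placeholder, and a GPTQ 4-bit checkpoint would need its own loader rather than plain from_pretrained:

```python
# Cap per-GPU memory so layers are sharded across two 24 GB cards,
# leaving headroom on GPU 0 for activations and the KV cache.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-65b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",                     # let accelerate place layers
    max_memory={0: "16GiB", 1: "21GiB"},   # mirrors --gpu-memory 16 21
    torch_dtype=torch.float16,
)
```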