llama.cpp
Can we finetune existing models via the ideas of QLoRA
QLoRA introduces the idea that we can still use quantized weights together with LoRA to fine-tune a model. Since the backward computation is mostly implemented already, maybe we can look at the following (a sketch of the core computation follows the list):
- Evaluate whether we need double quantization to further reduce VRAM usage at the cost of speed.
- Implement LoRA fine-tuning inside llama.cpp, or as a standalone application?
- Add GPU offload support for gradient computation.
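For anyone new to the idea: the QLoRA forward pass is just the frozen quantized base weight plus a small trainable low-rank update, y = dequant(Wq)·x + (α/r)·B·A·x, where only A and B receive gradients. Below is a minimal plain-C sketch of that computation, not ggml code: the block format is only loosely modeled on ggml's Q4_0 (which actually stores an fp16 scale), and all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical 4-bit block quantization, loosely modeled on ggml's Q4_0:
// 32 weights per block, one fp32 scale, two 4-bit quants packed per byte.
#define QBLOCK 32
typedef struct {
    float   d;              // per-block scale
    uint8_t qs[QBLOCK / 2]; // packed 4-bit quants
} q4_block;

// Dequantize one block into 32 floats.
static void dequant_block(const q4_block * b, float * out) {
    for (int i = 0; i < QBLOCK / 2; ++i) {
        const int lo = (b->qs[i] & 0x0F) - 8;
        const int hi = (b->qs[i] >> 4)   - 8;
        out[2*i + 0] = lo * b->d;
        out[2*i + 1] = hi * b->d;
    }
}

// y = dequant(Wq) x + (alpha/r) * B (A x)
// Wq: n_out x n_in quantized base weight (frozen; n_in divisible by QBLOCK)
// A : r x n_in, B: n_out x r, both fp32 (the only trainable tensors)
void lora_forward(const q4_block * Wq, const float * A, const float * B,
                  const float * x, float * y,
                  int n_in, int n_out, int r, float alpha) {
    // base path: y = dequant(Wq) x, block by block along the input dim
    for (int i = 0; i < n_out; ++i) {
        const q4_block * row = Wq + (size_t)i * (n_in / QBLOCK);
        float tmp[QBLOCK];
        float acc = 0.0f;
        for (int jb = 0; jb < n_in / QBLOCK; ++jb) {
            dequant_block(&row[jb], tmp);
            for (int j = 0; j < QBLOCK; ++j) {
                acc += tmp[j] * x[jb*QBLOCK + j];
            }
        }
        y[i] = acc;
    }
    // low-rank update: y += (alpha/r) * B (A x)
    float ax[64]; // assumes r <= 64 for this sketch
    for (int k = 0; k < r; ++k) {
        float acc = 0.0f;
        for (int j = 0; j < n_in; ++j) acc += A[(size_t)k*n_in + j] * x[j];
        ax[k] = acc;
    }
    const float scale = alpha / (float)r;
    for (int i = 0; i < n_out; ++i) {
        float acc = 0.0f;
        for (int k = 0; k < r; ++k) acc += B[(size_t)i*r + k] * ax[k];
        y[i] += scale * acc;
    }
}
```

Since the base weight never receives gradients, the quantized matrix only ever needs to be dequantized on the fly for the forward and backward matmuls, which is why VRAM usage stays close to inference levels.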
I would also like to see this.
In my company we often have confidential data under export control, so the cloud is not an option.
Getting a server with a decent GPU into a data centre close to the data could take weeks, or be blocked outright.
So GGML with QLoRA running on our dev machines would give us the opportunity to build proofs of concept.
I think we should demonstrate full-parameter training as well as fine-tuning on a Mac Studio with an M2 Ultra. It offers a lot of unified RAM (up to 192 GB), and now with Metal support in ggml we should get reasonable performance.
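For a rough sense of whether that is feasible (a back-of-envelope estimate, assuming a 7B-parameter model trained with vanilla Adam): fp16 weights take about 14 GB, fp32 gradients about 28 GB, and the two fp32 Adam moment buffers about 56 GB, so roughly 98 GB before activations and the KV cache. That would plausibly fit in 192 GB of unified memory for a 7B model, while a 13B model would already be tight without a memory-saving optimizer.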
I am reading this paper: https://arxiv.org/abs/2306.09782 Full Parameter Fine-tuning for Large Language Models with Limited Resources
It proposes a new optimizer which saves a lot of memory compared to SGD/Adam.
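For context, the paper's optimizer is LOMO, which fuses the gradient computation with the parameter update so that at most one tensor's gradient is alive at a time and no Adam moment buffers are needed. Below is a minimal plain-C sketch of that idea only, not the paper's implementation; `param_t`, `backward_fn`, and `lomo_step` are hypothetical names, and the paper's gradient normalization and clipping are omitted.

```c
#include <stdlib.h>

// Hypothetical parameter tensor: `n` fp32 weights.
typedef struct {
    float * w; // parameters
    size_t  n; // element count
} param_t;

// Hypothetical callback that computes the gradient of the loss w.r.t.
// one parameter tensor during the backward pass and writes it to `grad`.
typedef void (*backward_fn)(const param_t * p, float * grad, void * ctx);

// LOMO-style fused backward-and-update: apply the SGD step to each tensor
// as soon as its gradient is available, then reuse the buffer, instead of
// first materializing gradients for the whole model plus Adam moments.
void lomo_step(param_t * params, int n_tensors,
               backward_fn backward, void * ctx, float lr) {
    // one scratch buffer sized for the largest tensor
    size_t max_n = 0;
    for (int t = 0; t < n_tensors; ++t)
        if (params[t].n > max_n) max_n = params[t].n;
    float * grad = malloc(max_n * sizeof(float));

    // walk tensors in reverse (backward-pass order)
    for (int t = n_tensors - 1; t >= 0; --t) {
        backward(&params[t], grad, ctx);      // grad for this tensor only
        for (size_t i = 0; i < params[t].n; ++i)
            params[t].w[i] -= lr * grad[i];   // immediate in-place SGD step
        // the grad buffer is reused for the next tensor, so peak extra
        // memory is one tensor's gradient, not the whole model's
    }
    free(grad);
}
```

Against the estimate above, this would drop the 28 GB of gradients and 56 GB of Adam moments to a single per-tensor gradient buffer, which is what makes full-parameter fine-tuning plausible on a single machine.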
I've got a Mac Studio with an M2 Ultra and 192 GB as of yesterday, and I'm very interested in this topic.
The debate between fine-tuning on internal documents vs. searching a vector database and feeding documents into the prompt is a key question on several projects for us, and having a feasible way to fine-tune on this machine for testing would be amazing.
Let me know how I can be helpful; I'm happy to run test workloads against this.
This issue was closed because it has been inactive for 14 days since being marked as stale.