llama.cpp
Tracking: LoRA
Here are some outstanding issues for LoRA:
- [x] Base implementation (https://github.com/ggerganov/llama.cpp/pull/820)
- [ ] Improve LoRA application time with SIMD (AVX, AVX2) (https://github.com/ggerganov/llama.cpp/issues/956) (see the merge sketch after this list)
- [ ] Improve LoRA loading time with MMAP on base model
- [ ] Quantizing an mmap'ed float16 base model that has had LoRA applied
- [ ] Interpolation of weights (start with 1, look into multiple) (https://github.com/ggerganov/llama.cpp/issues/905)
- [ ] Export loaded model to binfile (standalone in CLI with LoRA (`--export-lora` flag); interactively (?)) (https://github.com/ggerganov/llama.cpp/issues/904)
- [ ] Investigate extracting LoRA for arbitrary models (see PEFT issue)
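For reference, the operation behind the SIMD and interpolation items above is just W' = W + (alpha / r) * B A, with the scale factor doubling as the interpolation weight. Below is a minimal sketch of that merge in plain C++; the function name, shapes, and row-major layout are assumptions for illustration, not llama.cpp's internal API.

```cpp
#include <cstddef>
#include <vector>

// Merge a LoRA delta into a base weight matrix:
//   W' = W + s * (B x A), with s = alpha / r.
// W: n_out x n_in, A: r x n_in, B: n_out x r, all row-major.
void apply_lora(std::vector<float> &W,
                const std::vector<float> &A,
                const std::vector<float> &B,
                size_t n_out, size_t n_in, size_t r, float alpha) {
    const float s = alpha / static_cast<float>(r);
    for (size_t i = 0; i < n_out; ++i) {
        for (size_t k = 0; k < r; ++k) {
            const float b = s * B[i * r + k];
            // Contiguous inner loop over n_in: the hot spot that the
            // AVX/AVX2 work (issue #956) targets.
            for (size_t j = 0; j < n_in; ++j) {
                W[i * n_in + j] += b * A[k * n_in + j];
            }
        }
    }
}
```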
Really desperate to start using LoRA; however, I use GPTQ-4bit-32g.GGML. Will this be a problem?
So far, we've seen quality issues with 4-bit base models. That being said, it has produced reasonable output for me some of the time. It is still under investigation.
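One way to see why a 4-bit base can hurt: a merged LoRA delta is often smaller than one quantization step, so requantizing either rounds it away or inflates it to a full step. The toy round trip below uses a made-up scale and is not GGML's actual Q4 block format.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Round-trip a value through a signed 4-bit grid [-8, 7] at a fixed scale.
float q4_roundtrip(float x, float scale) {
    int q = std::clamp(static_cast<int>(std::lround(x / scale)), -8, 7);
    return q * scale;
}

int main() {
    const float scale = 0.1f;   // one quantization step
    const float w     = 0.32f;  // base weight
    const float delta = 0.03f;  // merged LoRA delta, under half a step
    std::printf("base requantized:   %.3f\n", q4_roundtrip(w, scale));
    std::printf("merged requantized: %.3f\n", q4_roundtrip(w + delta, scale));
    // Both print 0.300: the delta is rounded away entirely. A delta just
    // past the midpoint would instead be inflated to a full 0.1 step.
    return 0;
}
```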
Would this be a good place to request support for multiple LoRA adapters sharing the same base model? See here for inspiration: https://github.com/lm-sys/FastChat/pull/1905
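For context, the usual way to serve several adapters over one base (what the FastChat PR does at a high level) is to keep the base weights frozen and apply each adapter's low-rank delta at inference time, y = W x + s * B (A x), rather than merging it into W. A rough sketch, with every name here hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct LoraAdapter {
    size_t r;                 // rank
    float  scale;             // alpha / r
    std::vector<float> A;     // r x n_in, row-major
    std::vector<float> B;     // n_out x r, row-major
};

// One copy of the base weights can back many adapters, keyed per request.
std::unordered_map<std::string, LoraAdapter> adapters;

// y = W x, plus the adapter delta if one is selected.
void forward(const std::vector<float> &W, size_t n_out, size_t n_in,
             const std::vector<float> &x, std::vector<float> &y,
             const LoraAdapter *lora) {
    y.assign(n_out, 0.0f);
    for (size_t i = 0; i < n_out; ++i)
        for (size_t j = 0; j < n_in; ++j)
            y[i] += W[i * n_in + j] * x[j];

    if (!lora) return;                       // plain base model
    std::vector<float> t(lora->r, 0.0f);     // t = A x; cheap since r is small
    for (size_t k = 0; k < lora->r; ++k)
        for (size_t j = 0; j < n_in; ++j)
            t[k] += lora->A[k * n_in + j] * x[j];
    for (size_t i = 0; i < n_out; ++i)       // y += s * B t
        for (size_t k = 0; k < lora->r; ++k)
            y[i] += lora->scale * lora->B[i * lora->r + k] * t[k];
}
```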
> Improve LoRA loading time with MMAP on base model

This was done in https://github.com/ggerganov/llama.cpp/pull/2095, though I'm not sure this issue is the right one to track it.
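For anyone curious what mmap buys here: mapping the model file means tensor data is paged in on demand and shared through the page cache, and a private (copy-on-write) mapping lets a LoRA delta be written over the mapped weights without copying the whole model or touching the file. A bare-bones POSIX sketch with error handling trimmed, not llama.cpp's actual loader:

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s model.bin\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    // MAP_PRIVATE gives copy-on-write pages: reads come straight from the
    // page cache, and any LoRA delta written here never reaches the file.
    void *data = mmap(nullptr, (size_t) st.st_size,
                      PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);  // the mapping stays valid after the fd is closed

    std::printf("mapped %lld bytes at %p\n", (long long) st.st_size, data);
    munmap(data, (size_t) st.st_size);
    return 0;
}
```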
This issue was closed because it has been inactive for 14 days since being marked as stale.