llama.cpp
Tracking: LoRA
Here are some outstanding issues for LoRA:
- [x] Base implementation (https://github.com/ggerganov/llama.cpp/pull/820)
- [ ] Improve LoRA application time with SIMD (AVX, AVX2) (https://github.com/ggerganov/llama.cpp/issues/956) (see the merge sketch after this list)
- [ ] Improve LoRA loading time with MMAP on base model
- [ ] Quantizing an mmap'ed float16 base model that has had LoRA applied
- [ ] Interpolation of weights (start with 1, look into multiple) (https://github.com/ggerganov/llama.cpp/issues/905)
- [ ] Export loaded model to binfile (standalone in CLI with LoRA (`--export-lora` flag); interactively (?)) (https://github.com/ggerganov/llama.cpp/issues/904)
- [ ] Investigate extracting LoRA for arbitrary models (see PEFT issue)
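For reference, the operation behind the SIMD and interpolation items above is just W' = W + (alpha / r) * B A, with the scale factor doubling as the interpolation weight. Below is a minimal sketch of that merge in plain C++; the function name, shapes, and row-major layout are assumptions for illustration, not llama.cpp's internal API.

```cpp
#include <cstddef>
#include <vector>

// Merge a LoRA delta into a base weight matrix:
//   W' = W + s * (B x A), with s = alpha / r.
// W: n_out x n_in, A: r x n_in, B: n_out x r, all row-major.
void apply_lora(std::vector<float> &W,
                const std::vector<float> &A,
                const std::vector<float> &B,
                size_t n_out, size_t n_in, size_t r, float alpha) {
    const float s = alpha / static_cast<float>(r);
    for (size_t i = 0; i < n_out; ++i) {
        for (size_t k = 0; k < r; ++k) {
            const float b = s * B[i * r + k];
            // Contiguous inner loop over n_in: the hot spot that the
            // AVX/AVX2 work (issue #956) targets.
            for (size_t j = 0; j < n_in; ++j) {
                W[i * n_in + j] += b * A[k * n_in + j];
            }
        }
    }
}
```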
Really desperate to start using LoRA; however, I use GPTQ-4bit-32g.GGML. Will this be a problem?
So far, we've seen quality issues with 4-bit base models. That being said, it has produced reasonable output for me some of the time. It is still under investigation.
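One way to see why a 4-bit base can hurt: a merged LoRA delta is often smaller than one quantization step, so requantizing either rounds it away or inflates it to a full step. The toy round trip below uses a made-up scale and is not GGML's actual Q4 block format.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Round-trip a value through a signed 4-bit grid [-8, 7] at a fixed scale.
float q4_roundtrip(float x, float scale) {
    int q = std::clamp(static_cast<int>(std::lround(x / scale)), -8, 7);
    return q * scale;
}

int main() {
    const float scale = 0.1f;   // one quantization step
    const float w     = 0.32f;  // base weight
    const float delta = 0.03f;  // merged LoRA delta, under half a step
    std::printf("base requantized:   %.3f\n", q4_roundtrip(w, scale));
    std::printf("merged requantized: %.3f\n", q4_roundtrip(w + delta, scale));
    // Both print 0.300: the delta is rounded away entirely. A delta just
    // past the midpoint would instead be inflated to a full 0.1 step.
    return 0;
}
```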
Would this be a good place to request support for multiple LoRA adapters sharing the same base model? See here for inspiration: https://github.com/lm-sys/FastChat/pull/1905
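For context, the usual way to serve several adapters over one base (what the FastChat PR does at a high level) is to keep the base weights frozen and apply each adapter's low-rank delta at inference time, y = W x + s * B (A x), rather than merging it into W. A rough sketch, with every name here hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct LoraAdapter {
    size_t r;                 // rank
    float  scale;             // alpha / r
    std::vector<float> A;     // r x n_in, row-major
    std::vector<float> B;     // n_out x r, row-major
};

// One copy of the base weights can back many adapters, keyed per request.
std::unordered_map<std::string, LoraAdapter> adapters;

// y = W x, plus the adapter delta if one is selected.
void forward(const std::vector<float> &W, size_t n_out, size_t n_in,
             const std::vector<float> &x, std::vector<float> &y,
             const LoraAdapter *lora) {
    y.assign(n_out, 0.0f);
    for (size_t i = 0; i < n_out; ++i)
        for (size_t j = 0; j < n_in; ++j)
            y[i] += W[i * n_in + j] * x[j];

    if (!lora) return;                       // plain base model
    std::vector<float> t(lora->r, 0.0f);     // t = A x; cheap since r is small
    for (size_t k = 0; k < lora->r; ++k)
        for (size_t j = 0; j < n_in; ++j)
            t[k] += lora->A[k * n_in + j] * x[j];
    for (size_t i = 0; i < n_out; ++i)       // y += s * B t
        for (size_t k = 0; k < lora->r; ++k)
            y[i] += lora->scale * lora->B[i * lora->r + k] * t[k];
}
```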
> Improve LoRA loading time with MMAP on base model

This was done in https://github.com/ggerganov/llama.cpp/pull/2095, though I'm not sure this issue is the right one to track it.
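For anyone curious what mmap buys here: mapping the model file means tensor data is paged in on demand and shared through the page cache, and a private (copy-on-write) mapping lets a LoRA delta be written over the mapped weights without copying the whole model or touching the file. A bare-bones POSIX sketch with error handling trimmed, not llama.cpp's actual loader:

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s model.bin\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    // MAP_PRIVATE gives copy-on-write pages: reads come straight from the
    // page cache, and any LoRA delta written here never reaches the file.
    void *data = mmap(nullptr, (size_t) st.st_size,
                      PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);  // the mapping stays valid after the fd is closed

    std::printf("mapped %lld bytes at %p\n", (long long) st.st_size, data);
    munmap(data, (size_t) st.st_size);
    return 0;
}
```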
This issue was closed because it has been inactive for 14 days since being marked as stale.