
Tracking: LoRA

Open jon-chuang opened this issue 1 year ago • 2 comments

Here are some outstanding issues for LoRA:

  • [x] Base implementation (https://github.com/ggerganov/llama.cpp/pull/820)
  • [ ] Improve LoRA application time with SIMD (AVX, AVX2) (https://github.com/ggerganov/llama.cpp/issues/956)
  • [ ] Improve LoRA loading time with MMAP on base model
    • [ ] quantizing an MMAPed float16 base model that has had LoRA applied
  • [ ] Interpolation of weights (start with 1, look into multiple) (https://github.com/ggerganov/llama.cpp/issues/905)
  • [ ] Export loaded model to binfile (standalone in CLI with LoRA (--export-lora flag); interactively (?)) (https://github.com/ggerganov/llama.cpp/issues/904)
  • [ ] Investigate extracting LoRA for arbitrary models (see PEFT issue)
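For context on what "LoRA application" in the checklist above means, here is a hypothetical, dependency-free sketch of merging a low-rank update into a base weight matrix (W' = W + (alpha / r) * B @ A). The function names and the pure-Python matmul are illustrative only, not llama.cpp internals.

```python
# Sketch of a LoRA merge: W' = W + (alpha / r) * B @ A,
# where A is (r x d_in) and B is (d_out x r).
# Pure-Python matrices (lists of lists) keep the example self-contained.

def matmul(X, Y):
    # Multiply an (m x k) matrix by a (k x n) matrix.
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply_lora(W, A, B, alpha, r):
    # Merge the scaled low-rank delta into the base weights.
    delta = matmul(B, A)  # (d_out x d_in)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# Toy case: 2x2 identity base weights, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]            # r=1, d_in=2
B = [[2.0], [0.0]]          # d_out=2, r=1
merged = apply_lora(W, A, B, alpha=1.0, r=1)
```

The checklist items about quantized base models matter here because the delta is computed in float; merging into an already-quantized W loses precision, which is why a float16 mmap'd base plus post-merge quantization is listed separately.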

jon-chuang avatar Apr 14 '23 09:04 jon-chuang

Really desperate to start using LoRA, but I use GPTQ-4bit-32g GGML. Will this be a problem?

captainzero93 avatar Apr 15 '23 17:04 captainzero93

So far, we've seen quality issues with a 4-bit base model. That said, it has produced reasonable output for me some of the time. It's still under investigation.

jon-chuang avatar Apr 15 '23 17:04 jon-chuang

Would this be a good place to request support for multiple LoRA adapters sharing the same base model? See here for inspiration: https://github.com/lm-sys/FastChat/pull/1905
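The multi-adapter idea can be sketched as follows: keep one shared (e.g. mmap'd) base weight matrix and apply each adapter's low-rank update to the activations at runtime, y = W x + (alpha / r) * B (A x), so no per-adapter copy of the base weights is needed. This is a hypothetical illustration, not the llama.cpp or FastChat API.

```python
# Sketch: one shared base matrix W, adapters applied per call.
# An adapter is (A, B, alpha, r) with A (r x d_in), B (d_out x r).

def matvec(M, x):
    # (m x n) matrix times a length-n vector.
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def forward(W, x, adapter=None):
    y = matvec(W, x)
    if adapter is not None:
        A, B, alpha, r = adapter
        ax = matvec(A, x)            # project down to the r-dim space
        bax = matvec(B, ax)          # project back up to d_out
        scale = alpha / r
        y = [yi + scale * bi for yi, bi in zip(y, bax)]
    return y

W = [[1.0, 0.0], [0.0, 1.0]]                        # shared base weights
adapter1 = ([[1.0, 1.0]], [[2.0], [0.0]], 1.0, 1)   # rank-1 LoRA

x = [1.0, 1.0]
base_out = forward(W, x)              # base model only
lora_out = forward(W, x, adapter1)    # same base, adapter applied
```

Because the adapter is applied to activations rather than merged into W, several adapters can serve requests against one read-only base model, at the cost of the extra low-rank matvecs per layer.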

bmanturner avatar Aug 03 '23 22:08 bmanturner

> Improve LoRA loading time with MMAP on base model

This was done in https://github.com/ggerganov/llama.cpp/pull/2095.

Also, I'm not sure this issue is the right one to track it.

Green-Sky avatar Aug 04 '23 09:08 Green-Sky

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Apr 09 '24 01:04 github-actions[bot]