FastChat
Add LoraAdapter to model_adapter.py
Add a LoRA adapter so that a finetuned adapter no longer needs to be merged into the base model every time, but can instead be loaded directly.
Why are these changes needed?
Loading adapters directly allows many different finetuned versions to be served with low storage requirements and simpler infrastructure management.
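For illustration, here is a minimal sketch of what direct adapter loading can look like using the Hugging Face PEFT library; the model and adapter paths are hypothetical placeholders, and this is not necessarily the exact implementation added to model_adapter.py:

```python
# Minimal sketch of direct LoRA adapter loading via Hugging Face PEFT.
# Paths below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "lmsys/vicuna-7b-v1.1"
base_model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_path)

# Attach the finetuned LoRA weights without merging them into the base model;
# only the small adapter tensors need to be stored and loaded per version.
model = PeftModel.from_pretrained(base_model, "path/to/lora-adapter")
```

Because the adapter checkpoint contains only the low-rank delta weights, each finetuned version on disk is a few hundred megabytes instead of a full copy of the base model.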
Checks
- I tested it with Vicuna-7B v1.1 and an adapter finetuned using alpaca-lora.
- Possible improvement: allow passing arguments such as `--load-8bit` through to the base model.
- Possible improvement: support many adapters sharing a single base model on the same GPU for efficient VRAM usage, making it easy to compare many finetuned versions (see the sketch after this list).
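As a sketch of the multi-adapter idea: PEFT's multi-adapter API is one possible way to realize it, not necessarily what this PR implements, and the adapter names and paths below are hypothetical placeholders.

```python
# Minimal sketch: several LoRA adapters sharing one base model on the same GPU.
# Adapter names and paths are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.1", torch_dtype=torch.float16
)

# Register two finetuned versions on top of the same base weights.
model = PeftModel.from_pretrained(base_model, "path/to/adapter-a", adapter_name="a")
model.load_adapter("path/to/adapter-b", adapter_name="b")

# Switch adapters without reloading the multi-gigabyte base model,
# so comparing versions costs only the small adapter weights in VRAM.
model.set_adapter("a")
# ... generate with version "a" ...
model.set_adapter("b")
# ... generate with version "b" ...
```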
closed by https://github.com/lm-sys/FastChat/pull/1807