Improve Adapter/LoRA handling
⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
- [X] I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
There are a few use cases that aren't cleanly handled at the moment:
- Loading an existing adapter, then doing a full fine-tune (FFT) over the merged model. We would need to load and merge the adapter, then continue with a regular FFT.
- Loading an existing adapter and then training a new adapter over the merged model.
Currently both of these can be worked around by manually merging the adapter into the base model beforehand, but it would be nice to handle these cases directly.
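For reference, the manual workaround looks roughly like this with plain transformers + PEFT (a minimal sketch; the model name, adapter path, and output directory below are placeholders, not axolotl options):

```python
# Merge an existing adapter into its base model ahead of time, then point
# a regular FFT (or a new lora/qlora run) at the merged checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "./my-lora-adapter")

# Fold the LoRA weights into the base weights and drop the PEFT wrappers.
merged = model.merge_and_unload()

# Save the merged model; use this directory as the base model for the next run.
merged.save_pretrained("./merged-model")
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained("./merged-model")
```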
✔️ Solution
see above
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this feature has not been requested yet.
- [X] I have provided enough information for the maintainers to understand and evaluate this request.
Additionally, we should simplify LoRA/QLoRA handling with 8-bit/4-bit loading. Ideally we get rid of the `adapter: qlora` option, since QLoRA is simply a specific subset of LoRA where all linear layers are targeted and 4-bit quantization is used. I think we can simplify this to only `lora` and allow either 4-bit or 8-bit loading to be set. If a user selects `qlora`, we warn about the specific cases where QLoRA applies: 4-bit quantization and targeting all linear layers.
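As a rough sketch of what that normalization/warning could look like (a hypothetical helper, not existing axolotl code; the config key names are assumptions modeled on the current options):

```python
# Hypothetical normalization step for the proposed scheme: only `lora`
# survives as an adapter type, and `qlora` is treated as lora + 4-bit
# loading + all linear layers targeted, with a warning.
import logging

LOG = logging.getLogger(__name__)

def normalize_adapter_cfg(cfg: dict) -> dict:
    if cfg.get("adapter") == "qlora":
        LOG.warning(
            "`adapter: qlora` is just lora with 4-bit loading and all linear "
            "layers targeted; treating it as `adapter: lora`."
        )
        cfg["adapter"] = "lora"
        cfg.setdefault("load_in_4bit", True)
        cfg.setdefault("lora_target_linear", True)
    if cfg.get("load_in_4bit") and cfg.get("load_in_8bit"):
        raise ValueError("Choose either 4-bit or 8-bit loading, not both.")
    return cfg
```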
I'll give this a go - just so I understand the first part correctly:
- I should be able to do (full) finetuning with an existing adapter model as the `base_model` arg in the training config. In that case the base model of the adapter should be merged with the adapter (i.e. using `merge_and_unload`) and then the full finetune should be run on the merged model.
- Optionally, the full finetune can be swapped for a new lora/qlora training run, adding new lora layers/adapters to the merged model and then proceeding to train only the new adapter weights.
Is that right?
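For the second case I'm picturing something roughly like this with PEFT (just a sketch; the model name, adapter path, and LoRA hyperparameters are placeholders):

```python
# Merge the old adapter, then attach a fresh LoRA adapter to the merged
# model so that only the new adapter weights are trained.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel, LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "./old-lora-adapter").merge_and_unload()

new_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(merged, new_lora)
model.print_trainable_parameters()  # only the new adapter weights are trainable
```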
So in the first case, a user could add a lora model dir arg but leave the adapter setting empty. The flow should then simply merge and unload the adapter into the base model.
I'm not clear on what you are asking for the second case.