Improve Adapter/LoRA handling
⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
- [X] I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
There are a few use cases that aren't cleanly handled at the moment:
- Loading an existing adapter, then doing a full fine-tune (FFT) over the merged model. We would need to load and merge the adapter, then continue with a regular FFT.
- Loading an existing adapter and then training a new adapter over the merged model.
Currently both of these can be worked around by manually merging the adapter into the base model beforehand, but it would be nice to handle these cases directly.
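For reference, the manual workaround looks roughly like this with plain transformers + PEFT (a minimal sketch; the model name, adapter path, and output directory below are placeholders, not axolotl options):

```python
# Merge an existing adapter into its base model ahead of time, then point
# a regular FFT (or a new lora/qlora run) at the merged checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "./my-lora-adapter")

# Fold the LoRA weights into the base weights and drop the PEFT wrappers.
merged = model.merge_and_unload()

# Save the merged model; use this directory as the base model for the next run.
merged.save_pretrained("./merged-model")
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained("./merged-model")
```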
✔️ Solution
see above
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this feature has not been requested yet.
- [X] I have provided enough information for the maintainers to understand and evaluate this request.
Additionally, we should simplify LoRA/QLoRA handling with 8-bit/4-bit loading. Ideally we get rid of the `adapter: qlora` option, since QLoRA is simply a specific subset of LoRA where all linear layers are targeted and 4-bit quantization is used. I think we can simplify this to only `lora` and allow either 4-bit or 8-bit loading to be set. If a user selects `qlora`, we warn about the specific cases where QLoRA applies: 4-bit quantization and targeting all linear layers.
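As a rough sketch of what that normalization/warning could look like (a hypothetical helper, not existing axolotl code; the config key names are assumptions modeled on the current options):

```python
# Hypothetical normalization step for the proposed scheme: only `lora`
# survives as an adapter type, and `qlora` is treated as lora + 4-bit
# loading + all linear layers targeted, with a warning.
import logging

LOG = logging.getLogger(__name__)

def normalize_adapter_cfg(cfg: dict) -> dict:
    if cfg.get("adapter") == "qlora":
        LOG.warning(
            "`adapter: qlora` is just lora with 4-bit loading and all linear "
            "layers targeted; treating it as `adapter: lora`."
        )
        cfg["adapter"] = "lora"
        cfg.setdefault("load_in_4bit", True)
        cfg.setdefault("lora_target_linear", True)
    if cfg.get("load_in_4bit") and cfg.get("load_in_8bit"):
        raise ValueError("Choose either 4-bit or 8-bit loading, not both.")
    return cfg
```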
I'll give this a go - just so I understand the first part correctly:
- I should be able to do (full) finetuning with an existing adapter model as the `base_model` arg in the training config. In that case the base model of the adapter should be merged with the adapter (i.e. using `merge_and_unload`) and then the full finetune should be run on the merged model.
- Optionally, the full finetune can be swapped for a new lora/qlora training run, adding new lora layers/adapters to the merged model and then proceeding to train only the new adapter weights.
Is that right?
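For the second case I'm picturing something roughly like this with PEFT (just a sketch; the model name, adapter path, and LoRA hyperparameters are placeholders):

```python
# Merge the old adapter, then attach a fresh LoRA adapter to the merged
# model so that only the new adapter weights are trained.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel, LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "./old-lora-adapter").merge_and_unload()

new_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(merged, new_lora)
model.print_trainable_parameters()  # only the new adapter weights are trainable
```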
So in the first case, a user could add a lora model dir arg but leave the adapter setting empty. The flow should then simply merge and unload the adapter into the base model.
I'm not clear on what you are asking for the second case.