LongLoRA Support
⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
- [X] I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
Could you implement this new LoRA method? It looks promising, and it would be great to have LoRA models with 32k+ context.
✔️ Solution
https://github.com/dvlab-research/LongLoRA
http://arxiv.org/abs/2309.12307
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this feature has not been requested yet.
- [X] I have provided enough information for the maintainers to understand and evaluate this request.
It looks like they provide a patch for LLaMA in their repo.
Parts I've noticed:
- during merge: need to load the extra trainable weights for the "embed" and "norm" layers (see the merge sketch after this list):
  trainable_params = os.path.join(args.peft_model, "trainable_params.bin")
  if os.path.isfile(trainable_params):
      model.load_state_dict(torch.load(trainable_params, map_location=model.device), strict=False)
  https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/merge_lora_weights_and_save_hf_model.py#L98-L100
- the patch has an FA and a non-FA version: https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/llama_attn_replace.py#L336
- it also depends on rope_scaling being set on the base model (see the config sketch after this list): https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/fine-tune.py#L110-L113
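For the merge step, here is a minimal sketch of how the flow could restore those extra weights before merging the adapter. This is not axolotl's actual merge code; the model name and paths are placeholders, and `trainable_params.bin` is the file LongLoRA writes for the embed/norm weights trained alongside the adapter.

```python
import os

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholder model name and paths, for illustration only.
base_model_name = "meta-llama/Llama-2-7b-hf"
peft_model_dir = "path/to/longlora-adapter"

base = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16)

# Restore the full-rank embed/norm weights first; load non-strictly since the
# file only contains those layers.
trainable_params = os.path.join(peft_model_dir, "trainable_params.bin")
if os.path.isfile(trainable_params):
    base.load_state_dict(torch.load(trainable_params, map_location="cpu"), strict=False)

# Then merge the LoRA adapter as usual.
model = PeftModel.from_pretrained(base, peft_model_dir)
model = model.merge_and_unload()
model.save_pretrained("path/to/merged-model")
```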
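For the rope_scaling part, a sketch of what the linked fine-tune.py lines appear to do, i.e. set linear RoPE scaling on the base model config before loading, with the factor derived from target vs. original context. The values here are illustrative only.

```python
from transformers import AutoConfig, AutoModelForCausalLM

base_model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
config = AutoConfig.from_pretrained(base_model_name)

orig_ctx = config.max_position_embeddings  # 4096 for Llama-2
target_ctx = 32768                         # illustrative target context

# Extend the context with linear RoPE scaling if the target exceeds the
# base model's original context length.
if target_ctx > orig_ctx:
    config.rope_scaling = {"type": "linear", "factor": float(target_ctx) / orig_ctx}
    config.max_position_embeddings = target_ctx

model = AutoModelForCausalLM.from_pretrained(base_model_name, config=config)
```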
May want to keep track of https://github.com/huggingface/peft/issues/958 in case it is supported there.
Looking at the shift/unshift code, it doesn't seem to be packed-sequence-length aware, so that would need some modification (or we simply don't allow sample packing with this feature).
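To illustrate the concern, here is a rough sketch (not the repo's exact code) of the S²-Attn shift: half the heads are rolled left by half a group size so their attention groups straddle the original group boundaries. With sample packing, that roll would mix tokens from different packed samples unless the grouping is made aware of the packing boundaries.

```python
import torch

def shift_half_heads(qkv: torch.Tensor, group_size: int) -> torch.Tensor:
    """Rough sketch of the shifted sparse attention (S2-Attn) shift.

    qkv: (batch, num_heads, seq_len, head_dim). The second half of the heads is
    rolled left by group_size // 2 so that, once the sequence is split into
    groups of group_size tokens, those heads attend across the original group
    boundaries. The inverse roll ("unshift") is applied to the attention output.
    """
    _, num_heads, _, _ = qkv.shape
    out = qkv.clone()
    out[:, num_heads // 2:] = torch.roll(
        out[:, num_heads // 2:], shifts=-group_size // 2, dims=2
    )
    return out

# With sample packing, the roll moves the first group_size // 2 tokens of one
# packed sample into a group that still contains the tail of the previous
# sample, so the grouping would need to respect packing boundaries
# (e.g. cu_seqlens), or packing would have to be disabled for this feature.
```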
Is this something that is on the roadmap?