
LongLoRA support

Open generalsvr opened this issue 2 years ago • 5 comments

⚠️ Please check that this feature request hasn't been suggested before.

  • [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • [X] I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Could you implement this new LoRA method? It would be great to have 32k+ context LoRA models. It looks promising.

✔️ Solution

https://github.com/dvlab-research/LongLoRA

http://arxiv.org/abs/2309.12307

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this feature has not been requested yet.
  • [X] I have provided enough information for the maintainers to understand and evaluate this request.

generalsvr avatar Sep 22 '23 12:09 generalsvr

It seems they provided a patch for Llama in their repo.

Parts I've noticed:

  • During merge: the following needs to be loaded for the "embed,norm" layers (https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/merge_lora_weights_and_save_hf_model.py#L98-L100); a save-side sketch follows this list.

    trainable_params = os.path.join(args.peft_model, "trainable_params.bin")
    if os.path.isfile(trainable_params):
        model.load_state_dict(torch.load(trainable_params, map_location=model.device), strict=False)

  • The attention patch has both a flash-attention (FA) and a non-FA version: https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/llama_attn_replace.py#L336

  • The base model also depends on rope_scaling being set: https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/fine-tune.py#L110-L113
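As a rough illustration of the save side that would pair with the loading snippet above, here is a minimal sketch. It assumes a PEFT-wrapped Llama model and LongLoRA's convention of matching parameter names on the "embed" and "norm" substrings; the helper name, file layout, and defaults are illustrative, not taken from the LongLoRA repo.

    import os
    import torch

    def save_trainable_non_lora_params(model, output_dir, substrings=("embed", "norm")):
        # Collect the non-LoRA parameters that LongLoRA also trains (embeddings
        # and norm layers) so the merge step can later load them back with
        # load_state_dict(..., strict=False) as shown above.
        state = {
            name: param.detach().cpu()
            for name, param in model.named_parameters()
            if param.requires_grad and any(s in name for s in substrings)
        }
        os.makedirs(output_dir, exist_ok=True)
        torch.save(state, os.path.join(output_dir, "trainable_params.bin"))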

NanoCode012 avatar Sep 23 '23 03:09 NanoCode012

May want to keep track of https://github.com/huggingface/peft/issues/958 in case it is supported there.

winglian avatar Sep 25 '23 15:09 winglian

Looking at the shift/unshift code, it seems it's not aware of packed sequence lengths, so that would need some modification (or we simply don't allow packed sequences to work with this feature).
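For reference, a minimal sketch of the shift in question, assuming the S²-Attn scheme described in the paper; the shapes, group_size handling, and function name are illustrative, not the exact patch code.

    import torch

    def shift_half_heads(x: torch.Tensor, group_size: int) -> torch.Tensor:
        # x: (batch, seq_len, num_heads, head_dim). Roll the second half of the
        # heads by half a group so information can flow between neighboring
        # attention groups. torch.roll wraps tokens around the sequence
        # dimension, so with packed sequences the shifted tokens would cross
        # sample boundaries unless the shift is made packing-aware.
        num_heads = x.shape[2]
        shifted = x.clone()
        shifted[:, :, num_heads // 2 :] = torch.roll(
            shifted[:, :, num_heads // 2 :], shifts=-group_size // 2, dims=1
        )
        return shifted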

winglian avatar Sep 25 '23 15:09 winglian

Is this something that is on the roadmap?

DhruvaBansal00 avatar Feb 22 '24 19:02 DhruvaBansal00