Open-Assistant
Add LoRA and Prefix-Tuning as Modeling Options for Improved Memory Efficiency and (Potentially) Performance
This PR adds LoRA and prefix-tuning as modeling options (training and sampling code).
Both have shown strong performance and can outperform full fine-tuning. They can also protect against catastrophic forgetting, which is important for chatbots. Because they keep the whole language model frozen, the learned parameters can be distributed freely, independent of the base language model.
They also allow much more memory-efficient training, since no optimizer states are needed for the base model.
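A minimal sketch of what the LoRA path looks like with the PEFT library; the checkpoint path, rank and target modules below are illustrative placeholders, not the exact settings used in this PR:

```python
# Illustrative sketch: wrap a frozen base model with LoRA adapters via PEFT.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("path/to/llama-30b")  # placeholder path

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # adapter rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # LLaMA attention projections
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable; the base
                                    # model stays frozen, so no optimizer states are
                                    # kept for it
```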
Benefits of LoRA
- 30B can be run with only DeepSpeed (ZeRO) stage 2.
- 65B can be run without intensive CPU usage.
- Less overfitting; performance is maintained across datasets.
- Only the OS component (the adapter weights, a file of a few MB) needs to be shared/pushed to the Hub; see the sketch after this list.
- See Andrej Karpathy's (OpenAI) comment.
- See the purported Google leak.
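As an illustration of the sharing point above: PEFT's `save_pretrained`/`push_to_hub` write only the adapter weights and config, so the shared artifact stays in the MB range. The repo ids below are placeholders, and `model`/`base_model` are the objects from the earlier sketch:

```python
# model is the PEFT-wrapped model; base_model is the frozen base model.
model.save_pretrained("llama-30b-oasst-lora")        # writes only the adapter weights + adapter_config.json
model.push_to_hub("your-org/llama-30b-oasst-lora")   # placeholder repo id

# Anyone who already has the frozen base model can re-attach the adapter:
from peft import PeftModel
loaded = PeftModel.from_pretrained(base_model, "your-org/llama-30b-oasst-lora")
```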
Implementation Details:
- Explicitly set `input_ids` as a keyword argument in the sampling code to ensure proper functionality with PEFT `generate` (see the sketch after this list).
- Manually enable gradients for the input to leverage gradient checkpointing effectively, as frozen embeddings would otherwise prevent gradients from attaching (also shown in the sketch after this list).
- Include saving code for the special tokens from `pytorch_model.bin`. Although these tokens are randomly initialized, they must be stored and saved as an external module, since the PEFT parameters learn to utilize them. Making them trainable is an option, but it is unlikely to make a significant difference.
- Integrate prefix-tuning, a powerful technique that involves stacking keys and values, by incorporating custom LLaMA modeling code. (Training speed is lower than LoRA in my initial tests and performance is similar or worse.)
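A condensed sketch of the first two points above, i.e. re-enabling input gradients for gradient checkpointing and calling `generate` with `input_ids` as a keyword argument. The prompt, decoding parameters and tokenizer path are illustrative, and `base_model`/`model` are the objects from the first sketch:

```python
from transformers import AutoTokenizer

# 1) Gradient checkpointing with a frozen base model: the embedding output has
#    requires_grad=False, so checkpointed segments would receive no gradients.
#    A forward hook on the input embeddings re-enables them (newer transformers
#    versions also expose model.enable_input_require_grads(), which does the same).
def make_inputs_require_grad(module, inputs, output):
    output.requires_grad_(True)

base_model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)
base_model.gradient_checkpointing_enable()

# 2) Sampling: pass input_ids explicitly as a keyword so PEFT's generate wrapper
#    forwards it to the base model instead of dropping a positional argument.
tokenizer = AutoTokenizer.from_pretrained("path/to/llama-30b")  # placeholder path
input_ids = tokenizer("<|prompter|>Hello!<|endoftext|><|assistant|>", return_tensors="pt").input_ids

outputs = model.generate(
    input_ids=input_ids,     # keyword, not positional
    max_new_tokens=256,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```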
:x: pre-commit failed.
Please run `pre-commit run --all-files` locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md.
@jordiclive would this PR make it possible to load a PEFT model for inference in the chat?
@smeyerhot This code is currently just for model training and evaluation, but it should be trivial to load it for inference since it uses the same HF `generate` method.
@andreaskoepf I am going to run a 30B LoRA model just on the SFT datasets and will post the sampling report.