add open-llama model with ckpt
This PR adds a new model called Open-Llama, which is based on Llama's implementation in Transformers. In Open-Llama, memory-efficient attention has been added, resulting in a 30% improvement in training efficiency. Additionally, hidden dropout and attention dropout have been added for better generalization during training.
We have also added two optional features: stable embedding from BLOOM and shared input-output embeddings from PaLM, which have been tested and found to improve training stability and performance.
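As a rough illustration only (not the exact code in this PR), the two optional features can be wired up roughly as in the sketch below; the module and flag names here (`use_stable_embedding`, `shared_input_output_embedding`) are illustrative assumptions:

```python
import torch.nn as nn


class ToyOpenLlamaEmbedding(nn.Module):
    """Sketch of the two optional features described above (names are illustrative)."""

    def __init__(self, vocab_size, hidden_size, use_stable_embedding=True, shared_input_output_embedding=True):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        # Stable embedding (from BLOOM): a LayerNorm applied right after the token embedding,
        # which helps stabilize training, especially in fp16/bf16.
        self.embed_layer_norm = nn.LayerNorm(hidden_size) if use_stable_embedding else None
        # Shared input-output vectors (from PaLM): the LM head reuses the embedding matrix
        # instead of learning a separate output projection.
        self.shared_input_output_embedding = shared_input_output_embedding
        if not shared_input_output_embedding:
            self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def embed(self, input_ids):
        hidden_states = self.embed_tokens(input_ids)
        if self.embed_layer_norm is not None:
            hidden_states = self.embed_layer_norm(hidden_states)
        return hidden_states

    def logits(self, hidden_states):
        if self.shared_input_output_embedding:
            # Weight tying: project back onto the vocabulary with the embedding matrix.
            return hidden_states @ self.embed_tokens.weight.t()
        return self.lm_head(hidden_states)
```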
The following code snippet shows the implementation of memory-efficient attention:
```python
try:
    from xformers import ops as xops
except ImportError:
    xops = None
    print("xformers is not installed correctly.")

if self.config.use_memorry_efficient_attention and xops is not None and self.training:
    attn_weights = None
    query_states = query_states.transpose(1, 2)
    key_states = key_states.transpose(1, 2)
    value_states = value_states.transpose(1, 2)
    attn_output = xops.memory_efficient_attention(
        query_states, key_states, value_states, attn_bias=xops.LowerTriangularMask(), p=self.dropout_prob
    )
```
At the same time, for maximum compatibility, we have made xformers an optional dependency, so the original attention implementation can still be used for training and inference when it is not installed.
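For reference, the fallback branch (when xformers is unavailable, or at inference time) can follow the standard causal attention pattern, roughly as sketched below; the shapes and mask handling here are illustrative, not the exact PR code:

```python
import math
import torch
import torch.nn.functional as F


def vanilla_causal_attention(query_states, key_states, value_states, dropout_prob=0.0, training=False):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    head_dim = query_states.size(-1)
    attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(head_dim)

    # Causal mask: each position may only attend to itself and earlier positions.
    seq_len = query_states.size(-2)
    causal_mask = torch.triu(
        torch.full((seq_len, seq_len), float("-inf"), device=query_states.device), diagonal=1
    )
    attn_weights = attn_weights + causal_mask

    attn_weights = torch.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
    attn_weights = F.dropout(attn_weights, p=dropout_prob, training=training)
    return torch.matmul(attn_weights, value_states)
```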
We implemented pre-training of the Llama model based on transformers + accelerate, incorporating the modifications described above: Open-Llama.
The pre-trained model has already been open-sourced on s-JoL/Open-Llama-V1.
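For anyone who wants to try the checkpoint, loading it should work like any other causal LM in Transformers once this PR is merged; this is a hedged sketch that assumes the `s-JoL/Open-Llama-V1` Hub repo and the auto-class mappings added by this PR are available:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the Hub repo s-JoL/Open-Llama-V1 is reachable and a Transformers
# version containing this PR's Open-Llama code is installed.
tokenizer = AutoTokenizer.from_pretrained("s-JoL/Open-Llama-V1")
model = AutoModelForCausalLM.from_pretrained("s-JoL/Open-Llama-V1")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```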
ref: https://github.com/huggingface/transformers/pull/22386
cc: @sgugger
The documentation is not available anymore as the PR was closed or merged.
cc @ArthurZucker and @younesbelkada
Please help me review this pull request. @ArthurZucker @younesbelkada
Hey! Thanks, will review now.
Thanks a lot for your contribution!
Hello, I have a question: why can't the Open-Llama model be found when searching the Transformers documentation? Is there something I forgot to add?
Hi @s-JoL, thanks for flagging this.
There was an issue in the doc rendering (resolved with 1, 2) that led to some pages not being retrievable in search. It should be working now!
@s-JoL I noticed that the links pertaining to Open-LLaMA are currently leading to 404 errors. Could you please provide some information on what might have happened?
@s-JoL Hi, I can't find an Open-LLaMA checkpoint, and I noticed you deleted your original repo. What happened? How can I try Open-LLaMA?
@heya5 The original author has taken the project down, possibly due to some controversies surrounding it; see https://github.com/chenfeng357/open-Chinese-ChatLLaMA/issues/1