
add open-llama model with ckpt

s-JoL opened this pull request 1 year ago • 4 comments

This PR adds a new model called Open-Llama, which is based on Llama's implementation in Transformers. In Open-Llama, memory-efficient attention has been added, resulting in a 30% improvement in training efficiency. Additionally, hidden dropout and attention dropout have been added for better generalization during training.

We have also added two optional features: stable embedding from Bloom and shared input-output embeddings from PaLM, both of which have been tested and found to improve training stability and performance.
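
For illustration, here is a minimal, self-contained PyTorch sketch of those two options; the class and argument names are invented for the example rather than taken from the PR. Stable embedding applies a LayerNorm directly after the token embedding (as in Bloom), and the shared input-output option ties the output projection weights to the input embedding matrix (as in PaLM).

import torch
import torch.nn as nn

class ToyOpenLlamaSketch(nn.Module):
    def __init__(self, vocab_size, hidden_size, use_stable_embedding=True, shared_input_output_embedding=True):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        # Stable embedding (from Bloom): LayerNorm right after the token embedding.
        self.embed_layer_norm = nn.LayerNorm(hidden_size) if use_stable_embedding else nn.Identity()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        if shared_input_output_embedding:
            # Shared input-output embeddings (from PaLM): tie the output projection
            # to the input embedding matrix.
            self.lm_head.weight = self.embed_tokens.weight

    def forward(self, input_ids):
        hidden_states = self.embed_layer_norm(self.embed_tokens(input_ids))
        # ... the transformer decoder layers would run here ...
        return self.lm_head(hidden_states)

model = ToyOpenLlamaSketch(vocab_size=1000, hidden_size=64)
logits = model(torch.randint(0, 1000, (1, 8)))  # shape: (1, 8, 1000)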

The following code snippet shows the implementation of memory-efficient attention:

try:
    from xformers import ops as xops
except ImportError:
    # xformers is optional; without it, the standard attention implementation is used.
    xops = None
    print("xformers is not installed correctly.")

if self.config.use_memorry_efficient_attention and xops is not None and self.training:
    # Memory-efficient attention is only used during training; inference keeps the original path.
    attn_weights = None
    # xformers expects (batch, seq_len, num_heads, head_dim), so swap the head and sequence dims.
    query_states = query_states.transpose(1, 2)
    key_states = key_states.transpose(1, 2)
    value_states = value_states.transpose(1, 2)
    # LowerTriangularMask applies the causal mask; p is the attention dropout probability.
    attn_output = xops.memory_efficient_attention(
        query_states, key_states, value_states, attn_bias=xops.LowerTriangularMask(), p=self.dropout_prob
    )

At the same time, for maximum compatibility, we have made xformers an optional dependency so that the original implementation can still be used for training and inference if it is not installed.
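
For reference, here is a rough, self-contained sketch of what that original fallback path computes (standard causal scaled-dot-product attention), together with the attention and hidden dropout mentioned above. This is an illustration, not the PR's exact code; the function and argument names are invented for the example.

import math
import torch
import torch.nn.functional as F

def fallback_causal_attention(query_states, key_states, value_states,
                              attn_dropout=0.0, hidden_dropout=0.0, training=True):
    # Shapes: (batch, num_heads, seq_len, head_dim).
    seq_len, head_dim = query_states.size(-2), query_states.size(-1)
    attn_weights = torch.matmul(query_states, key_states.transpose(-2, -1)) / math.sqrt(head_dim)
    # Causal mask: each position may only attend to itself and earlier positions.
    causal_mask = torch.full((seq_len, seq_len), float("-inf")).triu(diagonal=1)
    attn_weights = F.softmax(attn_weights + causal_mask, dim=-1)
    attn_weights = F.dropout(attn_weights, p=attn_dropout, training=training)   # attention dropout
    attn_output = torch.matmul(attn_weights, value_states)
    return F.dropout(attn_output, p=hidden_dropout, training=training)          # hidden dropout

q = k = v = torch.randn(1, 8, 16, 64)
out = fallback_causal_attention(q, k, v, attn_dropout=0.1, hidden_dropout=0.1)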

We implemented pre-training of the Llama model based on transformers + accelerate, incorporating the modifications described above; see the Open-Llama repository for the training code.

The pre-trained model has already been open-sourced as s-JoL/Open-Llama-V1 on the Hugging Face Hub.
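
Assuming the checkpoint follows the standard Hub layout, it could be loaded with the usual transformers Auto classes, for example (a sketch only; as discussed further down in this thread, the repository may no longer be available):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name taken from the comment above; it may have since been removed from the Hub.
model_name = "s-JoL/Open-Llama-V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))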

ref: https://github.com/huggingface/transformers/pull/22386

cc: @sgugger

s-JoL · Apr 16 '23

The documentation is not available anymore as the PR was closed or merged.

cc @ArthurZucker and @younesbelkada

sgugger · Apr 21 '23

Could you please review this pull request? @ArthurZucker @younesbelkada

s-JoL · Apr 25 '23

Hey! Thanks, will review now.

ArthurZucker · Apr 25 '23

Thanks a lot for your contribution!

sgugger · Apr 28 '23


Hello, I have a question: why can't the Open-Llama model be found when searching the transformers documentation? Is there something I forgot to add?


s-JoL · May 11 '23

Hi @s-JoL, thanks for notifying.

There was an issue in the doc rendering (resolved with 1, 2) leading to some pages not being retrievable in search. Should be working now!

amyeroberts · May 11 '23

@s-JoL I noticed that the links pertaining to Open-LLaMA are currently leading to 404 errors. Could you please provide some information on what might have happened?

PenutChen · May 22 '23

Hi @s-JoL, I can't find an Open-LLaMA checkpoint, and I noticed you deleted your original repo. What happened? How can I try Open-LLaMA?

heya5 · May 24 '23

@heya5 The original author has closed the original project, possibly due to some controversies surrounding it. https://github.com/chenfeng357/open-Chinese-ChatLLaMA/issues/1

PenutChen · Jun 13 '23