add GPTSAN model (reopen)
Model description
The previous PR was automatically closed as a result of a sync and pull, so I am reopening it here.
GPTSAN is a Japanese language model that uses Switch Transformer. It has the same structure as the model introduced as the Prefix LM in the T5 paper, and supports both text generation and masked language modeling.
To add this model to transformers, I did the following:
- Ported GPTSAN to PyTorch.
- Converted the model weights.
- Created a model card on the HuggingFace Hub.
- Ported the generation code.

The model card has already been uploaded: https://huggingface.co/Tanrei/GPTSAN-japanese/
The tokenizer uses GPT-NeoX-Japanese; only new vocabulary files were uploaded to the model card. Minor differences are absorbed within the generation algorithm in the model's source code.
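For anyone who wants to try it, here is a minimal usage sketch. The class names follow this PR's proposal (GPTSANJapaneseForConditionalGeneration, plus the existing GPT-NeoX-Japanese tokenizer it builds on); the names that finally land in transformers may differ.

```python
# Minimal usage sketch. Class names are as proposed in this PR and may
# change before the merge.
import torch
from transformers import GPTNeoXJapaneseTokenizer, GPTSANJapaneseForConditionalGeneration

tokenizer = GPTNeoXJapaneseTokenizer.from_pretrained("Tanrei/GPTSAN-japanese")
model = GPTSANJapaneseForConditionalGeneration.from_pretrained("Tanrei/GPTSAN-japanese")

# Text generation: the model continues the prompt like a decoder-only LM.
inputs = tokenizer("織田信長は、", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```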
The GPTSAN repository is: https://github.com/tanreinama/GPTSAN
The discussion of the HuggingFace integration is: https://github.com/tanreinama/GPTSAN/issues/2
Thanks to: @ArthurZucker and @younesbelkada
The documentation is not available anymore as the PR was closed or merged.
Oh... I still get an error that I don't understand. Do you know what is wrong? I pulled and merged from the latest main.
I will sync and pull main again.
Do you want a review?
@ArthurZucker Yes, it's OK to review now.
@ArthurZucker Can you review it now, or will it be later?
Reviewing now
Still going through it: I have a few questions.
Feel free to ask!
thanks.
I separated GPTSANJapaneseModel and GPTSANJapaneseForConditionalGeneration. Regarding the return value of GPTSANJapaneseForConditionalGeneration, using Seq2SeqMoEOutput like switch_transformers does not work, since this is not an encoder-decoder model.
```python
return Seq2SeqMoEOutput(
    loss=loss,
    logits=lm_logits,
    encoder_z_loss=z_loss,
    encoder_aux_loss=aux_loss,
    past_key_values=outputs.past_key_values,
    encoder_last_hidden_state=outputs.last_hidden_state,
    encoder_hidden_states=outputs.hidden_states,
    encoder_attentions=outputs.attentions,
    encoder_router_logits=outputs.router_probs,
)
```
The unit test then fails, saying "there is no attentions in the output".
Using CausalLMOutputWithPast works.
```python
return CausalLMOutputWithPast(
    loss=loss,
    logits=lm_logits,
    past_key_values=outputs.past_key_values,
    hidden_states=outputs.hidden_states,
    attentions=outputs.attentions,
)
```
But CausalLMOutputWithPast doesn't have z_loss or the other Switch Transformer outputs, and I can't seem to find a good fit in modeling_outputs.py. Is it OK to go without the Switch Transformer outputs?
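In case it helps the discussion, one possible workaround is a custom ModelOutput dataclass that keeps the causal-LM fields but adds the Switch Transformer losses, following the pattern used in modeling_outputs.py. The class name and extra fields below are only a sketch, not something that exists in the library.

```python
# Hypothetical sketch: a ModelOutput with causal-LM fields plus the
# Switch Transformer extras that CausalLMOutputWithPast lacks.
from dataclasses import dataclass
from typing import Optional, Tuple

import torch
from transformers.utils import ModelOutput


@dataclass
class MoECausalLMOutputWithPast(ModelOutput):
    loss: Optional[torch.FloatTensor] = None
    logits: torch.FloatTensor = None
    past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None
    # Switch Transformer extras (illustrative field names):
    z_loss: Optional[torch.FloatTensor] = None
    aux_loss: Optional[torch.FloatTensor] = None
    router_logits: Optional[Tuple[torch.FloatTensor]] = None
```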
Ready for review.
Due to the time difference, I will continue tomorrow.
Absolutely no problem!
You can review it now.
@tanreinama The code looks much cleaner now! Let's see the next review from @ArthurZucker, but we wanted to thank you for your great efforts! I really like this model and would like to communicate about it on Twitter. Can you share your social media handle with us? Thanks!
Oh... I don't use social media. I don't have a Twitter or Instagram account (yeah, I'm a weirdo). I only have Facebook: https://www.facebook.com/toshiyuki.sakamoto.75/
I found a few typos in the comments, so I fixed them.
Reviewing again now
Ok, it's reviewable.
@ArthurZucker @sgugger I fixed the points raised in the review comments. It's ready once the checks pass.
Congratulations! This was a big model addition, and the codebase is very clean now! We will try to share this new model on Twitter and see if we can reach our Japanese community!
good timing
@ArthurZucker @sgugger ok. I fixed it.
Congrats again on this work, and thanks for being a valuable contributor!
Wow! I'm very happy! And thanks to the HuggingFace team; I couldn't have done it without your amazing and persistent support. It was my first experience contributing to such a large repository, so I learned a lot. And I'm so excited. It's already night in Japan, but I might not be able to sleep!