add GPTSAN model (reopen)
Model description
The previous PR was automatically closed as a result of a sync and pull, so I am reopening it here.
GPTSAN is a Japanese language model that uses Switch Transformer. It has the same structure as the model introduced as the Prefix LM in the T5 paper, and supports both text generation and masked language modeling.
To add this model to transformers, I did the following:
- Ported GPTSAN to PyTorch.
- Converted the model weights.
- Created a model card on the HuggingFace Hub.
- Ported the generation code.

The model card has already been uploaded: https://huggingface.co/Tanrei/GPTSAN-japanese/
The tokenizer uses GPT-NeoX-Japanese; only new vocabulary files were uploaded to the model card. Minor differences are absorbed within the generation algorithm in the model's source code.
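For anyone who wants to try it, here is a minimal usage sketch. The class names follow this PR's proposal (GPTSANJapaneseForConditionalGeneration, plus the existing GPT-NeoX-Japanese tokenizer it builds on); the names that finally land in transformers may differ.

```python
# Minimal usage sketch. Class names are as proposed in this PR and may
# change before the merge.
import torch
from transformers import GPTNeoXJapaneseTokenizer, GPTSANJapaneseForConditionalGeneration

tokenizer = GPTNeoXJapaneseTokenizer.from_pretrained("Tanrei/GPTSAN-japanese")
model = GPTSANJapaneseForConditionalGeneration.from_pretrained("Tanrei/GPTSAN-japanese")

# Text generation: the model continues the prompt like a decoder-only LM.
inputs = tokenizer("織田信長は、", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```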
The GPTSAN repository is: https://github.com/tanreinama/GPTSAN
The discussion of the HuggingFace integration is: https://github.com/tanreinama/GPTSAN/issues/2
Thanks to: @ArthurZucker and @younesbelkada
The documentation is not available anymore as the PR was closed or merged.
Oh... I still get an error that I don't understand. Do you know what is wrong? I pulled and merged from the latest main.
I will sync and pull main again.
Do you want a review?
@ArthurZucker Yes, it's OK to review now.
@ArthurZucker Can you review it now, or will it be later?
Reviewing now
Still going through it: I have a few questions.
Feel free to ask!
thanks.
I separated GPTSANJapaneseModel and GPTSANJapaneseForConditionalGeneration. Regarding the return value of GPTSANJapaneseForConditionalGeneration, using Seq2SeqMoEOutput like switch_transformers does not work, since this is not an encoder-decoder model.
```python
return Seq2SeqMoEOutput(
    loss=loss,
    logits=lm_logits,
    encoder_z_loss=z_loss,
    encoder_aux_loss=aux_loss,
    past_key_values=outputs.past_key_values,
    encoder_last_hidden_state=outputs.last_hidden_state,
    encoder_hidden_states=outputs.hidden_states,
    encoder_attentions=outputs.attentions,
    encoder_router_logits=outputs.router_probs,
)
```
The unit test then fails, saying "there is no attentions in the output".
Using CausalLMOutputWithPast works.
```python
return CausalLMOutputWithPast(
    loss=loss,
    logits=lm_logits,
    past_key_values=outputs.past_key_values,
    hidden_states=outputs.hidden_states,
    attentions=outputs.attentions,
)
```
But CausalLMOutputWithPast doesn't have z_loss or the other Switch Transformer outputs, and I can't seem to find a good fit in modeling_outputs.py. Is it OK to go without the Switch Transformer outputs?
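In case it helps the discussion, one possible workaround is a custom ModelOutput dataclass that keeps the causal-LM fields but adds the Switch Transformer losses, following the pattern used in modeling_outputs.py. The class name and extra fields below are only a sketch, not something that exists in the library.

```python
# Hypothetical sketch: a ModelOutput with causal-LM fields plus the
# Switch Transformer extras that CausalLMOutputWithPast lacks.
from dataclasses import dataclass
from typing import Optional, Tuple

import torch
from transformers.utils import ModelOutput


@dataclass
class MoECausalLMOutputWithPast(ModelOutput):
    loss: Optional[torch.FloatTensor] = None
    logits: torch.FloatTensor = None
    past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None
    # Switch Transformer extras (illustrative field names):
    z_loss: Optional[torch.FloatTensor] = None
    aux_loss: Optional[torch.FloatTensor] = None
    router_logits: Optional[Tuple[torch.FloatTensor]] = None
```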
Ready for review.
Due to the time difference, I will continue tomorrow.
Absolutely no problem!
You can review it now.
@tanreinama The code looks much cleaner now! Let's see the next review from @ArthurZucker, but we wanted to thank you for your great efforts! I really like this model and would like to communicate about it on Twitter. Can you share your social media handle with us? Thanks!
Oh... I don't use social media. I don't have a Twitter or Instagram account (yeah, I'm a weirdo). I only have Facebook: https://www.facebook.com/toshiyuki.sakamoto.75/
I found a few typos in the comments, so I fixed them.
Reviewing again now
Ok, it's reviewable.
@ArthurZucker @sgugger I fixed the points raised in the review comments. It's ready once the checks pass.
Congratulations! This was a big model addition, and the codebase is very clean now! We will try to share this new model on Twitter and see if we can reach our Japanese community!
good timing
@ArthurZucker @sgugger ok. I fixed it.
Congrats again on this work, and thanks for being a valuable contributor!
Wow! I'm very happy! And thanks to the HuggingFace team; I couldn't have done it without your amazing and persistent support. It was my first experience contributing to such a large repository, so I learned a lot. And I'm so excited. It's already night in Japan, but I might not be able to sleep!