
add GPTSAN model (reopen)

Open · tanreinama opened this issue 2 years ago • 5 comments

Model description

The previous PR was automatically closed as a result of a sync and pull, so it is being reopened here.

GPTSAN is a Japanese language model using Switch Transformer. It has the same structure as the model introduced as Prefix LM in the T5 paper, and works for both text generation and masked language modeling.
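As context for reviewers, the Prefix-LM attention pattern mentioned above can be illustrated with a small sketch (the function name and pure-Python representation are mine, not from the PR): prefix tokens attend to each other bidirectionally like an encoder, while the remaining tokens attend causally like a decoder, which is what lets one model handle both MLM-style and generation-style tasks.

```python
def prefix_lm_mask(prefix_len, total_len):
    """Illustrative Prefix-LM attention mask (True = may attend).

    Positions inside the prefix attend to every other prefix position
    (bidirectional); positions after the prefix attend only to earlier
    positions (causal).
    """
    return [
        [j <= i or (i < prefix_len and j < prefix_len)
         for j in range(total_len)]
        for i in range(total_len)
    ]

# With a 3-token prefix in a 5-token sequence: prefix token 0 can see
# prefix token 2, but generation token 3 cannot see future token 4.
mask = prefix_lm_mask(3, 5)
```

In the actual model this pattern would be realized as an attention mask tensor rather than nested lists; the sketch only shows the shape of the constraint.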

To add this model to transformers, I did the following: ported GPTSAN to PyTorch, converted the model weights, created a model card on the HuggingFace Hub, and ported the generation code. The model card has already been uploaded. (https://huggingface.co/Tanrei/GPTSAN-japanese/)

The tokenizer is based on GPT-NeoX-Japanese, and only new vocabulary files are uploaded to the model card. Minor differences are absorbed within the generation algorithm in the model's source code.

The GPTSAN repository is: https://github.com/tanreinama/GPTSAN

The discussion of HuggingFace integration is: https://github.com/tanreinama/GPTSAN/issues/2

Thanks to: @ArthurZucker and @younesbelkada

tanreinama avatar Jan 25 '23 01:01 tanreinama

The documentation is not available anymore as the PR was closed or merged.

Oh... I still get an error that I don't understand. Do you know what is wrong? I pulled and merged from the latest main.

tanreinama avatar Jan 25 '23 13:01 tanreinama

I will sync and pull main again.

tanreinama avatar Jan 26 '23 00:01 tanreinama

Do you want a review?

ArthurZucker avatar Jan 26 '23 14:01 ArthurZucker

@ArthurZucker yes, it's ready.

tanreinama avatar Jan 27 '23 00:01 tanreinama

@ArthurZucker can you review it, or will it take a while?

tanreinama avatar Feb 01 '23 00:02 tanreinama

Reviewing now πŸ˜‰

ArthurZucker avatar Feb 01 '23 09:02 ArthurZucker

Still working on it: I have a few questions.

tanreinama avatar Feb 05 '23 11:02 tanreinama

Feel free to ask!

ArthurZucker avatar Feb 06 '23 06:02 ArthurZucker

Thanks.

I separated GPTSANJapaneseModel and GPTSANJapaneseForConditionalGeneration. Regarding the return value of GPTSANJapaneseForConditionalGeneration, using Seq2SeqMoEOutput as switch_transformers does doesn't work, since this is not an encoder-decoder model.

return Seq2SeqMoEOutput(
    loss=loss,
    logits=lm_logits,
    encoder_z_loss=z_loss,
    encoder_aux_loss=aux_loss,
    past_key_values=outputs.past_key_values,
    encoder_last_hidden_state=outputs.last_hidden_state,
    encoder_hidden_states=outputs.hidden_states,
    encoder_attentions=outputs.attentions,
    encoder_router_logits=outputs.router_probs,
)

The snippet above fails the unit tests with "there is no attentions in the output".

Using CausalLMOutputWithPast works.

return CausalLMOutputWithPast(
    loss=loss,
    logits=lm_logits,
    past_key_values=outputs.past_key_values,
    hidden_states=outputs.hidden_states,
    attentions=outputs.attentions,
)

But CausalLMOutputWithPast doesn't have z_loss or the other Switch Transformer outputs. I can't seem to find a good fit in modeling_outputs.py. Is it OK to drop the Switch Transformer outputs?
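For what it's worth, one option when nothing in modeling_outputs.py fits is a dedicated output class for a decoder-only MoE model. The sketch below shows the shape such a class could take; the class name and exact field list are my assumption, not existing transformers API, and a real implementation would subclass transformers' ModelOutput rather than use a plain dataclass.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# Hypothetical sketch: a causal-LM output that keeps the MoE-specific
# losses CausalLMOutputWithPast lacks. In the real codebase this would
# subclass transformers.utils.ModelOutput so dict-style access works.
@dataclass
class MoECausalLMOutputWithPast:
    loss: Optional[Any] = None
    logits: Optional[Any] = None
    past_key_values: Optional[Tuple] = None
    hidden_states: Optional[Tuple] = None
    attentions: Optional[Tuple] = None
    z_loss: Optional[Any] = None        # router z-loss (Switch Transformer)
    aux_loss: Optional[Any] = None      # load-balancing auxiliary loss
    router_logits: Optional[Tuple] = None
```

This keeps the decoder-only field names the tests expect (`attentions`, `hidden_states`) while still surfacing the router losses for training.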

tanreinama avatar Feb 06 '23 09:02 tanreinama

Ready for review.

tanreinama avatar Feb 08 '23 06:02 tanreinama

Due to the time difference, I'll continue tomorrow.

tanreinama avatar Feb 08 '23 11:02 tanreinama

Absolutely no problem! πŸ˜‰

ArthurZucker avatar Feb 08 '23 12:02 ArthurZucker

You can review it now.

tanreinama avatar Feb 09 '23 08:02 tanreinama

@tanreinama the code looks much cleaner now πŸ”₯ Let's see the next review from @ArthurZucker, but we wanted to thank you for your great efforts! I really like this model and would like to communicate about it on Twitter — can you share your social media handle with us? Thanks!

younesbelkada avatar Feb 09 '23 10:02 younesbelkada

Oh... I don't do social media. I don't have a Twitter or Instagram account (yeah, I'm a weirdo). I only have Facebook. https://www.facebook.com/toshiyuki.sakamoto.75/

tanreinama avatar Feb 09 '23 10:02 tanreinama

I found a few typos in the comments, so I fixed them.

tanreinama avatar Feb 10 '23 05:02 tanreinama

Reviewing again now

ArthurZucker avatar Feb 10 '23 09:02 ArthurZucker

Ok, it's reviewable.

tanreinama avatar Feb 16 '23 13:02 tanreinama

@ArthurZucker @sgugger I fixed the point in the comment. It's ready if checks are passed.

tanreinama avatar Feb 17 '23 06:02 tanreinama

Congratulations! πŸš€ This was a big model addition and the codebase is very clean now! We'll try to share this new model on Twitter and see if we can reach our Japanese community!

ArthurZucker avatar Feb 17 '23 09:02 ArthurZucker

Good timing.

tanreinama avatar Feb 18 '23 06:02 tanreinama

@ArthurZucker @sgugger ok. I fixed it.

tanreinama avatar Feb 20 '23 10:02 tanreinama

Congrats again on this work! and thanks for being a valuable contributor! πŸ˜‰ πŸš€

ArthurZucker avatar Feb 20 '23 10:02 ArthurZucker

Wow! I'm very happy! And thanks to the HuggingFace team — I couldn't have done it without your amazing and persistent support. It was my first experience contributing to such a large repository, so I learned a lot. And I'm so excited. It's already night in Japan, but I might not be able to sleep 😘

tanreinama avatar Feb 20 '23 10:02 tanreinama