Wagtail

56 comments by Wagtail

Have you tried normalizing your input text, e.g. with `input.capitalize()`? The SentencePiece tokenizer chunks rare words into many small pieces, especially if they appear in uppercase but are normally not uppercase.
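
As a minimal sketch (the `t5-small` checkpoint and the example word are placeholders, chosen only because T5 uses a SentencePiece tokenizer), you can compare how the raw and normalized variants get split:

```python
from transformers import AutoTokenizer

# Assumes any SentencePiece-based checkpoint; t5-small is used purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("t5-small")

word = "HYPERPARAMETER"  # hypothetical rare, all-uppercase word
print(tokenizer.tokenize(word))               # typically many small pieces
print(tokenizer.tokenize(word.capitalize()))  # usually far fewer pieces
```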

I am currently [researching language modeling](https://gitlab.com/Bachstelze/instructionbert).

@Leolty It is possible that the model generates multiple words if it was pretrained with longer masked spans, as in the [UL2 mixture of denoisers](https://ai.googleblog.com/2022/10/ul2-20b-open-source-unified-language.html). Sometimes the T5 models already...
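
As a hedged illustration (the checkpoint and prompt below are placeholders, not the model discussed here), a plain T5 checkpoint already fills a single sentinel token with a multi-word span:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Illustrative only: t5-base and the prompt are stand-ins for the actual setup.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

inputs = tokenizer("The Eiffel Tower <extra_id_0> in Paris.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)

# The span predicted for <extra_id_0> can contain several words, not just one token.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```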

What is the status? The logs of the checks have expired.

> If you have less than the default number of GPUs (8)

Who has a default number of 8 GPUs?

@conceptofmind Sorry, I got confused by this figure from [UL2](https://ai.googleblog.com/2022/10/ul2-20b-open-source-unified-language.html) and concluded that they switched completely to encoder-decoder models: ![image](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiozftwuxITX87OmCkAwkBouHRkjmpZHlfHCZYxRdp6_E5rLigiia3l1JlxvSnhih67iQ_CI1lQmtfffvuXNLGhuO5rFsrifmT1rk5wfLTCKcYK-6ngoendoOUzqUP1SENoQs9WvB-nsu7QDgha57NZXVMU6OpxOrbu9Mh4qKzsE3t6a0BGhlyMYhSLkw/w400-h346/image1.png) Description: In both decoder-only and encoder-decoder setups, UL2 strikes a...

@conceptofmind Thank you for your interest and contribution! To my knowledge, there is no research showing that a decoder-only modification performs better than an encoder-decoder architecture. The...

What is your base model? Flan-T5? Is there documentation? [GPT4ALL](https://github.com/nomic-ai/gpt4all) released weights and data for code instructions.

> For a generation problem, it is usually better to use GPT2 as the decoder, over BERT.

Why should this be the case, if you have enough data to train...
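
For reference, a minimal sketch of the kind of setup this question assumes: warm-starting an encoder-decoder with BERT on both sides via Hugging Face's `EncoderDecoderModel` (the checkpoint names are placeholders, not necessarily what this thread uses):

```python
from transformers import BertTokenizer, EncoderDecoderModel

# Sketch of a BERT2BERT warm start; "bert-base-uncased" is only an example checkpoint.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

# The decoder needs explicit start and pad token ids before training or generation.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```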

> > > For a generation problem, it is usually better to use GPT2 as the decoder, over BERT.
> >
> > Why should this be the...