Arthur

Results: 795 comments by Arthur

Hey @Aisuko, could you provide a **minimal** reproducer? That would help us! Also note that the `generation parameters` issues can probably be safely ignored. The missing keys are, however...

@humanely do you have the exact same issue? If not, please open a separate issue. 1. The checkpoint you have did not save `['lm_head.weight', 'model.decoder.embed_tokens.weight']`. Now, if you use `tie_word_embeddings`...
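To illustrate why those two keys can be missing harmlessly: when embeddings are tied, the LM head and the input embedding share a single weight object, so a checkpoint only needs to store it once. A minimal sketch in plain Python (the class and variable names here are hypothetical, not the `transformers` internals):

```python
class Embedding:
    """Stand-in for an input-embedding module holding a weight matrix."""
    def __init__(self, weight):
        self.weight = weight

class LMHead:
    """Stand-in for the output projection (LM head)."""
    def __init__(self, weight):
        self.weight = weight

# One (vocab_size x hidden_dim) weight matrix, stored once.
shared = [[0.1, 0.2], [0.3, 0.4]]
embed_tokens = Embedding(shared)
lm_head = LMHead(shared)

# "Tied": both modules reference the identical object, so loading the
# embedding weight from a checkpoint also populates the LM head.
print(lm_head.weight is embed_tokens.weight)  # True

# Any update through one module is visible through the other.
embed_tokens.weight[0][0] = 9.9
print(lm_head.weight[0][0])  # 9.9
```

With tying enabled, a "missing key" warning for the head (or the embedding) at load time is usually benign, since the shared weight is restored through its counterpart.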

@vanguardapps it is usually very safe to use the Trainer, and bugs there are quite rare. I don't know which version of transformers you are using, but...

Awesome, that is already good isolation. cc @pacman100, @muellerzr and @SunMarc: when @vanguardapps shares the reproducer, please have a look! 🤗

Hello @LinWeizheDragon, could you update the README to include links to the pretrained checkpoints, the original codebase, etc.? I would recommend first starting with a [code on the...

This was already answered: basically, eager attention still attends to padding tokens (because the output of the softmax is never exactly zero), but with exact implementations / kernels, you have...
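A small numeric sketch of that point (my own illustration, not the library's code): in an eager implementation, padding positions are masked by adding a large negative value to their score rather than being skipped, and the softmax of a finite score is always strictly positive, so the padding token still receives a tiny attention weight.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Attention scores for three positions; the last one is padding,
# masked with a large (but finite) negative additive bias.
MASK_VALUE = -30.0  # hypothetical mask value for illustration
scores = [2.0, 1.0, MASK_VALUE]

weights = softmax(scores)

# The padding weight is tiny but strictly greater than zero, so the
# padding token's value vector still leaks into the output.
print(weights[2] > 0.0)   # True
print(weights[2] < 1e-10)  # True, e.g. on the order of exp(-32)
```

Fused/exact kernels (e.g. Flash Attention) can instead skip masked positions entirely, which is one source of small numerical differences between implementations.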