Patrick von Platen
> I can follow up on the rest of the feedback this weekend / early next week: most of it looks manageable.
>
> One comment on `DimensionInfo`: I use...
Thanks for making the change! Test failures seem unrelated :-) Merging!
Here is the PR to correct the naming: https://github.com/huggingface/transformers/pull/18896/files
@sanchit-gandhi could you maybe take a look here?
Hey @ydshieh, I'm a bit underwater at the moment - I'll put the issue on my TODO list, but I can't promise to find time to look into it very...
Hmm, that's very interesting. A couple of pointers that might help: 1. `bart-large` always forces the second token to be the BOS token during generation (see https://huggingface.co/facebook/bart-large/blob/main/config.json#L27), whereas led-large...
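A minimal sketch of the forcing behavior mentioned in point 1 (the `force_bos` helper is hypothetical, mimicking what `transformers` does through a logits processor when `forced_bos_token_id` is set in the config):

```python
import math

def force_bos(scores, step, bos_token_id):
    # Hypothetical helper: at the first decoding step after the decoder
    # start token, mask every vocabulary entry except BOS, so BOS is
    # always generated as the second token.
    if step == 1:
        scores = [-math.inf] * len(scores)
        scores[bos_token_id] = 0.0
    return scores

# At step 1, only token 0 (BOS) remains selectable:
print(force_bos([0.1, 0.2, 0.3], step=1, bos_token_id=0))  # [0.0, -inf, -inf]
# At later steps, the scores pass through untouched:
print(force_bos([0.1, 0.2, 0.3], step=2, bos_token_id=0))  # [0.1, 0.2, 0.3]
```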
One last comment: note that just because `" "` always predicts the same token regardless of the encoder outputs doesn't mean training is necessarily broken. During training, all `decoder_input_ids`...
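To illustrate why training can still be fine: during training the `decoder_input_ids` are the labels shifted right (teacher forcing), so each target token is predicted from the gold prefix rather than from the model's own previous prediction. A rough sketch of that shifting, with pad id 1 and decoder start id 2 chosen only for illustration:

```python
def shift_tokens_right(labels, pad_token_id, decoder_start_token_id):
    # Rough sketch of how seq2seq models derive decoder_input_ids from
    # the labels: prepend the decoder start token, drop the last label,
    # and replace the ignore index (-100) with padding.
    shifted = [decoder_start_token_id] + labels[:-1]
    return [pad_token_id if tok == -100 else tok for tok in shifted]

labels = [42, 7, 99, -100]
print(shift_tokens_right(labels, pad_token_id=1, decoder_start_token_id=2))
# [2, 42, 7, 99]
```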
Super interesting discussion here! Thanks for writing this all down @gante and @KMFODA :-) The PR looks nice to me in general - thanks a lot for opening it @KMFODA!...
Re-opening as it doesn't seem like it's been solved. Maybe @sgugger could help here?
In general, T5 just doesn't work well with `fp16` since it was trained in bfloat16, and it seems like the model relies on quite large values which are not supported by...
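For context on the value ranges: `float16` caps out at 65504, while `bfloat16` keeps `float32`'s exponent range (up to roughly 3.4e38), so activations that are harmless in bf16 can overflow in fp16. A quick stdlib check of the fp16 limit using `struct`'s half-precision format:

```python
import struct

FP16_MAX = 65504.0  # largest finite float16 value

# The fp16 maximum round-trips through a half-precision pack/unpack...
packed = struct.pack("<e", FP16_MAX)
print(struct.unpack("<e", packed)[0])  # 65504.0

# ...but a value like 1e5 -- tiny by bf16/fp32 standards -- does not
# fit into float16 at all:
try:
    struct.pack("<e", 1e5)
except OverflowError:
    print("1e5 overflows float16")
```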