Patrick von Platen
> I can follow up on the rest of the feedback this weekend / early next week: most of it looks manageable.
>
> One comment on `DimensionInfo`: I use...
Thanks for making the change! Test failures seem unrelated :-) Merging!
Here is the PR to correct the naming: https://github.com/huggingface/transformers/pull/18896/files
@sanchit-gandhi could you maybe take a look here?
Hey @ydshieh, I'm a bit underwater at the moment - I'll put the issue on my TODO list, but I can't promise to find time to look into it very...
Hmm, that's very interesting. A couple of pointers that might help: 1. `bart-large` always forces the second token to be the BOS token during generation (see https://huggingface.co/facebook/bart-large/blob/main/config.json#L27), whereas led-large...
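A minimal sketch of the forcing behavior mentioned in point 1 (the `force_bos` helper is hypothetical, mimicking what `transformers` does through a logits processor when `forced_bos_token_id` is set in the config):

```python
import math

def force_bos(scores, step, bos_token_id):
    # Hypothetical helper: at the first decoding step after the decoder
    # start token, mask every vocabulary entry except BOS, so BOS is
    # always generated as the second token.
    if step == 1:
        scores = [-math.inf] * len(scores)
        scores[bos_token_id] = 0.0
    return scores

# At step 1, only token 0 (BOS) remains selectable:
print(force_bos([0.1, 0.2, 0.3], step=1, bos_token_id=0))  # [0.0, -inf, -inf]
# At later steps, the scores pass through untouched:
print(force_bos([0.1, 0.2, 0.3], step=2, bos_token_id=0))  # [0.1, 0.2, 0.3]
```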
One last comment: note that just because `" "` always predicts the same token regardless of the encoder outputs doesn't mean training is necessarily broken. During training, all `decoder_input_ids`...
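To illustrate why training can still be fine: during training the `decoder_input_ids` are the labels shifted right (teacher forcing), so each target token is predicted from the gold prefix rather than from the model's own previous prediction. A rough sketch of that shifting, with pad id 1 and decoder start id 2 chosen only for illustration:

```python
def shift_tokens_right(labels, pad_token_id, decoder_start_token_id):
    # Rough sketch of how seq2seq models derive decoder_input_ids from
    # the labels: prepend the decoder start token, drop the last label,
    # and replace the ignore index (-100) with padding.
    shifted = [decoder_start_token_id] + labels[:-1]
    return [pad_token_id if tok == -100 else tok for tok in shifted]

labels = [42, 7, 99, -100]
print(shift_tokens_right(labels, pad_token_id=1, decoder_start_token_id=2))
# [2, 42, 7, 99]
```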
Super interesting discussion here! Thanks for writing this all down @gante and @KMFODA :-) The PR looks nice to me in general - thanks a lot for opening it @KMFODA!...
Re-opening as it doesn't seem like it's been solved. Maybe @sgugger could help here?
In general, T5 just doesn't work well with `fp16` since it was trained in bfloat16, and it seems like the model relies on quite large values which are not supported by...
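For context on the value ranges: `float16` caps out at 65504, while `bfloat16` keeps `float32`'s exponent range (up to roughly 3.4e38), so activations that are harmless in bf16 can overflow in fp16. A quick stdlib check of the fp16 limit using `struct`'s half-precision format:

```python
import struct

FP16_MAX = 65504.0  # largest finite float16 value

# The fp16 maximum round-trips through a half-precision pack/unpack...
packed = struct.pack("<e", FP16_MAX)
print(struct.unpack("<e", packed)[0])  # 65504.0

# ...but a value like 1e5 -- tiny by bf16/fp32 standards -- does not
# fit into float16 at all:
try:
    struct.pack("<e", 1e5)
except OverflowError:
    print("1e5 overflows float16")
```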