Arthur

Results: 795 comments of Arthur

@rafaelpadilla if you have time to look into this would be awesome!

We had to change the bias to `torch.bool`, mainly because `torch.where` operations are no longer supported with `uint8` in the most recent versions of PyTorch. > Use of uint8...
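A minimal sketch of the change described above, assuming a GPT-2-style causal bias buffer: recent PyTorch versions reject a `uint8` condition in `torch.where`, so the mask is cast to `torch.bool` first.

```python
import torch

# Old-style uint8 causal bias buffer (1 = keep, 0 = mask out).
bias = torch.tril(torch.ones(4, 4, dtype=torch.uint8))
scores = torch.zeros(4, 4)  # stand-in attention scores

# Cast once to bool; torch.where now requires a boolean condition.
mask = bias.to(torch.bool)
masked = torch.where(mask, scores, torch.full_like(scores, float("-inf")))
```

With the `uint8` tensor passed directly as the condition, recent PyTorch raises an error instead of the old deprecation warning.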

Inviting you to read #10956 which has very detailed explanation and a potential solution for you 😉

Also let's make sure that the tests are all green!

Hey! As mentioned in my last comments, the CI tests all need to be green 😉 Running `make fixup` should fix most of it

I remember testing a few models and hitting some issues with this token-addition update; I'll have to check once the PR is ready

Hey! Wow, that's interesting. A two-part answer: 1. Very cool. We can use `torch.testing.assert_allclose` to check the max differences, and indeed I have the following outputs: ```python In [73]:...
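A small sketch of the comparison suggested above, using illustrative tensors. Note that `torch.testing.assert_close` is the current, non-deprecated replacement for `assert_allclose`:

```python
import torch

# Two nearly-identical outputs standing in for two model runs.
a = torch.tensor([1.0, 2.0, 3.0])
b = a + 1e-6

# Inspect the max absolute difference, then assert closeness within tolerance.
max_diff = (a - b).abs().max().item()
torch.testing.assert_close(a, b, rtol=1e-4, atol=1e-4)  # raises if outside tolerance
```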

Hey, indeed the original sentencepiece model does not have a padding token. You can probably pad using the `eos_token`, as is done for `GPT2`; need to check what is...
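The GPT-2-style pattern mentioned above can be sketched as follows. With `transformers` you would simply write `tokenizer.pad_token = tokenizer.eos_token`; a minimal stand-in class is used here so the snippet is self-contained:

```python
# Sketch: a tokenizer with no pad token reuses eos as pad (GPT-2 pattern).
class TinyTok:
    """Minimal stand-in for a tokenizer lacking a pad token (illustrative only)."""
    def __init__(self):
        self.eos_token, self.eos_token_id = "</s>", 2
        self.pad_token, self.pad_token_id = None, None

tok = TinyTok()
# Reuse eos as pad; pad_token_id now mirrors eos_token_id.
tok.pad_token, tok.pad_token_id = tok.eos_token, tok.eos_token_id
```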

Exactly! This was fixed in #22402 so keeping it closed!

Hey @christoukmaji , this kind of question should be asked on the [forum](https://discuss.huggingface.co/). The first method will set `pad_token_id` to `2` while the other will give a different index.
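A sketch of the difference described above, using a minimal stand-in vocabulary (the real methods in question are not shown in this excerpt): reusing an existing special token keeps its id, while registering a brand-new token appends a fresh index at the end of the vocabulary.

```python
# Illustrative vocabulary; ids chosen to match the "pad_token_id == 2" case above.
vocab = {"<s>": 0, "<unk>": 1, "</s>": 2}

# Reusing an existing token (e.g. eos) as pad keeps its existing id.
pad_id_reuse = vocab["</s>"]

# Adding a brand-new [PAD] token assigns the next free index instead.
vocab["[PAD]"] = len(vocab)
pad_id_new = vocab["[PAD]"]
```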