Gompyn

8 comments by Gompyn

Once the API is implemented, there may be further issues to resolve.

> Really want AppImage for ARM Linux tho

But upstream AppImage doesn't support ARM Linux... Maybe ask them to add support for it?

This is probably caused by the following lines, which are still not fixed in HEAD: https://github.com/huggingface/transformers/blob/f58248b8240571bbbb0918ddd36cc3fdf061df11/src/transformers/tokenization_utils.py#L532-L537

This bug strips away `\n` around my special token, making my model believe that there is no newline in my text.
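The effect can be reproduced with a simplified, hypothetical model of the behaviour described above. This is not the transformers code: `split_on_special` and its `strip_by_default` flag are made up for illustration. The text is split around a special token, and by default the neighbouring pieces have their whitespace (including `\n`) stripped, so joining the pieces no longer gives back the input.

```python
import re


def split_on_special(text: str, special: str, strip_by_default: bool = True) -> list[str]:
    """Split `text` around `special`, keeping the token as its own piece.

    Hypothetical model: when strip_by_default is True, whitespace
    (including "\n") is removed from the pieces adjacent to the special
    token, mimicking the behaviour reported in the comment above.
    """
    # The capturing group keeps the special token in the result list.
    pieces = [p for p in re.split(f"({re.escape(special)})", text) if p]
    for i, piece in enumerate(pieces):
        if piece != special or not strip_by_default:
            continue
        if i > 0:
            pieces[i - 1] = pieces[i - 1].rstrip()  # drop whitespace before the token
        if i + 1 < len(pieces):
            pieces[i + 1] = pieces[i + 1].lstrip()  # drop whitespace after the token
    # Pieces that were pure whitespace are now empty; discard them.
    return [p for p in pieces if p]


text = "def f():\n<|sep|>\n    return 1"
lossy = split_on_special(text, "<|sep|>")
kept = split_on_special(text, "<|sep|>", strip_by_default=False)
print("".join(lossy) == text)  # False: the "\n" around <|sep|> is gone
print("".join(kept) == text)   # True
```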

@ArthurZucker I think `decode(encode(text)) == text` should be true by default, because some use cases (e.g. code generation) require the text's formatting to be preserved exactly. "Automatic formatting" should not be done...
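The following minimal sketch makes the round-trip property concrete by checking `decode(encode(text)) == text` against two toy tokenizers. Both are hypothetical stand-ins, not any library's API. One splits on whitespace and rejoins with single spaces, so indentation and newlines are lost. The other encodes UTF-8 bytes, so the round trip is exact.

```python
from typing import Callable


def whitespace_encode(text: str) -> list[str]:
    # Toy tokenizer: discards the exact whitespace between words.
    return text.split()


def whitespace_decode(tokens: list[str]) -> str:
    return " ".join(tokens)


def byte_encode(text: str) -> list[int]:
    # Toy byte-level tokenizer: every character survives as UTF-8 bytes.
    return list(text.encode("utf-8"))


def byte_decode(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8")


def round_trips(encode: Callable, decode: Callable, text: str) -> bool:
    """Return True if decoding the encoding reproduces `text` exactly."""
    return decode(encode(text)) == text


code = "for x in xs:\n    print(x)\n"
print(round_trips(whitespace_encode, whitespace_decode, code))  # False
print(round_trips(byte_encode, byte_decode, code))              # True
```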

> > I think `decode(encode(text)) == text` should be true by default
>
> This is untrue for pretty much all tokenizers, since tokenization is a destructive operation. At the...