ru-dalle

Tokenizer decoding bug

Open · neverix opened this issue 2 years ago • 1 comment

It seems like the tokenizer drops the first letter of the text when that letter is uppercase; chaining encode_text and decode_text shows this. What could be the source of this bug? Or is this the intended behavior?

neverix, Dec 12 '21 11:12

@neverix, thanks a lot for all the features you've contributed :)

The yttm (YouTokenToMe) tokenizer was trained on lowercase text only, so it doesn't expect uppercase input. For correct encoding/decoding, call text.lower() on the text before passing it to encode_text.

shonenkov, Dec 12 '21 16:12
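
For anyone landing here later, a minimal sketch of the behavior discussed above. This does not use the real yttm or ru-dalle code: ToyLowercaseTokenizer is a hypothetical character-level stand-in whose vocabulary contains only lowercase letters and the space, and it assumes (as one plausible mechanism, not confirmed in this thread) that unknown characters are mapped to an unknown-token id that is skipped on decode. Only the method names encode_text and decode_text are borrowed from the issue; their signatures here are simplified.

```python
import string


class ToyLowercaseTokenizer:
    """Hypothetical stand-in for a tokenizer trained on lowercase text only.

    NOT the real yttm / ru-dalle API; it only illustrates the round trip.
    """

    UNK_ID = 0  # assumed id for characters outside the vocabulary

    def __init__(self):
        # Vocabulary: lowercase ASCII letters plus the space character.
        alphabet = string.ascii_lowercase + " "
        self.char_to_id = {ch: i + 1 for i, ch in enumerate(alphabet)}
        self.id_to_char = {i: ch for ch, i in self.char_to_id.items()}

    def encode_text(self, text):
        # Characters missing from the vocabulary become UNK_ID.
        return [self.char_to_id.get(ch, self.UNK_ID) for ch in text]

    def decode_text(self, ids):
        # Unknown ids are skipped, so their characters vanish from the output.
        return "".join(self.id_to_char[i] for i in ids if i != self.UNK_ID)


def round_trip(tokenizer, text):
    """Chain encode_text and decode_text, as described in the issue."""
    return tokenizer.decode_text(tokenizer.encode_text(text))


tokenizer = ToyLowercaseTokenizer()
raw = "Hello world"

lossy = round_trip(tokenizer, raw)          # uppercase "H" is out of vocabulary
clean = round_trip(tokenizer, raw.lower())  # lowercased first, as suggested above

print(repr(lossy))  # 'ello world'
print(repr(clean))  # 'hello world'
```

In the toy version the uppercase first letter is lost on the round trip, and lowercasing the input beforehand preserves it. With the real tokenizer the same workaround would be to lowercase the prompt before encoding.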