ru-dalle Tokenizer decoding bug

Tokenizer decoding bug

Open neverix opened this issue 2 years ago • 1 comments

It seems like the tokenizer ignores the first letter when it's uppercase, chaining encode_text + decode_text shows that. What could be the source of this bug? Is this the intended behavior?

Dec 12 '21 11:12 neverix

@neverix, thank you a lot for all your provided features :)

yttm tokenizer was trained using lowercase text only, so it doesn't expect uppercase text. For correct encoding/decoding should use text.lower()

Dec 12 '21 16:12 shonenkov

ru-dalle ru-dalle copied to clipboard

Tokenizer decoding bug

ru-dalle
ru-dalle copied to clipboard