ru-dalle
ru-dalle copied to clipboard
Tokenizer decoding bug
It seems like the tokenizer ignores the first letter when it's uppercase, chaining encode_text
+ decode_text
shows that. What could be the source of this bug? Is this the intended behavior?
@neverix, thank you a lot for all your provided features :)
yttm tokenizer was trained using lowercase text only, so it doesn't expect uppercase text. For correct encoding/decoding should use text.lower()