starcoder
Tokenizer not invertible
When I encode " ..." with the tokenizer, it comes out as " ..." (one space is missing). But shouldn't tokenizers always be 100% invertible?
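To make the report easier to reproduce, here is a minimal sketch of a round-trip check. It only assumes that the tokenizer exposes an encode function (string to ids) and a decode function (ids to string). The helper `roundtrip_mismatch` and the two toy tokenizers are hypothetical names for illustration and are not part of starcoder or any library. The lossy toy collapses runs of spaces, which is just one way a space could go missing, not a diagnosis of what actually happens here.

```python
import re
from typing import Callable, List, Optional, Tuple


def roundtrip_mismatch(
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    text: str,
) -> Optional[Tuple[str, str]]:
    """Return (text, decoded) if decode(encode(text)) != text, else None."""
    decoded = decode(encode(text))
    return None if decoded == text else (text, decoded)


# Hypothetical toy tokenizer: one id per character, fully invertible.
def char_encode(text: str) -> List[int]:
    return [ord(c) for c in text]


def char_decode(ids: List[int]) -> str:
    return "".join(chr(i) for i in ids)


# Hypothetical lossy tokenizer: collapses runs of spaces before encoding,
# so the original spacing cannot be recovered when decoding.
def lossy_encode(text: str) -> List[int]:
    return char_encode(re.sub(r" {2,}", " ", text))


sample = "  ..."  # two leading spaces, chosen only for this illustration

print(roundtrip_mismatch(char_encode, char_decode, sample))   # invertible case
print(roundtrip_mismatch(lossy_encode, char_decode, sample))  # lossy case
```

With a real tokenizer, the same helper should work by passing its own encode and decode callables, for example `tokenizer.encode` and `tokenizer.decode` from Hugging Face `transformers`. That is an assumption about the setup used here, and decode options such as special-token handling or whitespace clean-up may also affect the result.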