
Tokenizer not invertible

Open t-montes opened this issue 1 year ago • 0 comments

When I pass " ..." through the tokenizer, it comes back as " ..." (one space is missing). Shouldn't tokenizers always be 100% invertible?
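A minimal sketch of how the round trip described above could be checked, in case it helps narrow the report down. The helper `find_roundtrip_failures` and the toy `toy_encode` / `toy_decode` functions are hypothetical names made up for illustration; they are not part of this repository or of any tokenizer library. The toy tokenizer splits on whitespace, so it deliberately loses leading and repeated spaces.

```python
from typing import Callable, Iterable, List, Tuple


def find_roundtrip_failures(
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    samples: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (original, decoded) pairs where decode(encode(text)) != text."""
    failures = []
    for text in samples:
        decoded = decode(encode(text))
        if decoded != text:
            failures.append((text, decoded))
    return failures


# Hypothetical toy tokenizer, for illustration only.
# It splits on whitespace, so leading and repeated spaces cannot be recovered.
_vocab: List[str] = []


def toy_encode(text: str) -> List[int]:
    ids = []
    for word in text.split():
        if word not in _vocab:
            _vocab.append(word)
        ids.append(_vocab.index(word))
    return ids


def toy_decode(ids: List[int]) -> str:
    return " ".join(_vocab[i] for i in ids)


if __name__ == "__main__":
    samples = ["a b", "a  b", " ..."]
    for original, decoded in find_roundtrip_failures(toy_encode, toy_decode, samples):
        # repr() makes the whitespace difference visible
        print(f"{original!r} -> {decoded!r}")
```

To test a real tokenizer, its own encode and decode callables can be passed in place of the toy ones. Printing with `repr()` also helps here, since web pages tend to collapse consecutive spaces in the rendered text.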

t-montes, Oct 14 '24 18:10