Shantanu

840 comments by Shantanu

There isn't enough information here to reproduce the error. Could you provide more details?

What version of tiktoken did you use? How did you install it? etc.

tiktoken 0.8 will have a better error message here

This is a great question. I have some nice internal documentation explaining what problem this is solving; I'll see if I can make a version of it that doesn't include...

It's a little bit of both. The first step we do in tokenising is regex splitting, where we split on things like whitespace. The ensuing chunks are then fed to...
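To illustrate the regex-splitting step described above, here is a minimal sketch. The pattern is a deliberately simplified stand-in chosen for illustration, not the pattern tiktoken actually uses, and `split_into_chunks` is a hypothetical helper name. What happens to the chunks afterwards is not shown.

```python
import re

# Simplified, illustrative pattern (an assumption, not tiktoken's real one):
#   " ?\w+"        a word, optionally preceded by a single space
#   " ?[^\w\s]+"   a run of punctuation, optionally preceded by a single space
#   "\s+"          any remaining run of whitespace
SPLIT_PATTERN = re.compile(r" ?\w+| ?[^\w\s]+|\s+")


def split_into_chunks(text: str) -> list[str]:
    """Split text into chunks before any further tokenisation step."""
    return SPLIT_PATTERN.findall(text)


chunks = split_into_chunks("Hello, world!")
print(chunks)
```

Because the three alternatives together match every character, joining the chunks gives back the original string, so the split itself loses nothing.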

Thanks, that's a reasonable use case. Every token is at least one byte, so for now I'd recommend running a quick length check against your input first. At typical current...
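A rough sketch of that length check, assuming the input is UTF-8 text: since every token covers at least one byte, the encoded byte length is an upper bound on the token count. The function names are hypothetical and not part of tiktoken.

```python
def token_count_upper_bound(text: str) -> int:
    """Upper bound on the token count: each token is at least one byte."""
    return len(text.encode("utf-8"))


def definitely_fits(text: str, max_tokens: int) -> bool:
    """True if the text cannot exceed max_tokens, without tokenising it.

    A False result is inconclusive: the text may still fit, so fall back
    to real tokenisation in that case.
    """
    return token_count_upper_bound(text) <= max_tokens


print(definitely_fits("hello", 5))
```

The check only ever rules inputs in, never out, so it works as a cheap pre-filter in front of the tokeniser rather than a replacement for it.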

Thanks for the suggestion! I'm not currently planning on implementing this, but it is likely that at some point we will. If other people encountering this also have this feature...

Just published a new version of tiktoken that includes a mapping from model to tokeniser. Anything not using r50k is liable to be incorrect (sometimes subtly, sometimes majorly, sometimes majorly...

Thanks, those are really nice results! 1. Last time I checked, regex splitting was the majority of the time — I'd be interested in benchmarking the splitting part if easy....

On tiktoken 0.8 this raises a more normal Python exception (KeyError)