
Add get_encoding_name_for_model to tiktoken

Open noseworthy opened this issue 10 months ago • 2 comments

The tiktoken-js library includes a very helpful function, getEncodingNameForModel(). In the Rust-based tiktoken package, the equivalent logic exists only inside the implementation of encoding_for_model() and is not exposed on its own.

This function is very useful when implementing an encoding cache keyed on the model. Mapping model -> encoding name and then caching by encoding name conserves resources, since many models reuse the same encoding.
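A minimal sketch of that caching pattern, in TypeScript. `EncodingCache`, `forModel`, and the two callback types are hypothetical names, not part of tiktoken. The cache receives the model -> encoding-name resolver as a parameter; with the wasm package one would presumably pass the new `get_encoding_name_for_model` as the resolver and `get_encoding` as the factory, but treat that wiring and the exact signatures as assumptions.

```typescript
// Hypothetical helper (not part of tiktoken): keeps one encoder per
// encoding name, so models that share an encoding share one instance.
type ResolveEncodingName = (model: string) => string;
type CreateEncoder<E> = (encodingName: string) => E;

class EncodingCache<E> {
  private readonly byEncoding = new Map<string, E>();
  private readonly resolveName: ResolveEncodingName;
  private readonly create: CreateEncoder<E>;

  constructor(resolveName: ResolveEncodingName, create: CreateEncoder<E>) {
    this.resolveName = resolveName;
    this.create = create;
  }

  // Resolve model -> encoding name, then reuse or build the encoder.
  // Unknown models surface as whatever error the resolver throws.
  forModel(model: string): E {
    const name = this.resolveName(model);
    let encoder = this.byEncoding.get(name);
    if (encoder === undefined) {
      encoder = this.create(name);
      this.byEncoding.set(name, encoder);
    }
    return encoder;
  }

  // Number of distinct encoders built so far (one per encoding name).
  get size(): number {
    return this.byEncoding.size;
  }
}
```

Keying the map on the encoding name rather than the model name is the point: `gpt-4` and `gpt-3.5-turbo` both use `cl100k_base`, so they would share a single encoder. Encoders from the wasm build hold WebAssembly memory, so a long-lived cache would also need to call `free()` on them when it is discarded.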

I've exposed a new get_encoding_name_for_model() function that behaves similarly to the one in the tiktoken-js package, and used it inside of encoding_for_model().

Finally, I've added a test to ensure that this function can be called from TypeScript code and that it throws an exception for invalid model names.

Fixes: dqbd/tiktoken#123

noseworthy, Feb 16 '25

Hey, @dqbd and @jens-f 👋

I think this should fix #123. It's a feature we've been looking for as well, so I figured I'd take a crack at it.

I'm not a Rust developer, so please excuse any obvious blunders on my part. I'd love to know what you think. It'd be awesome to have this functionality exposed!

Thanks in advance for your consideration of the PR 🙏

noseworthy, Feb 17 '25

Hey @dqbd, any thoughts on this PR? I think I've fixed all the conflicts. It'd be great if we could expose this API in the wasm version of the library! Thanks!

noseworthy, May 06 '25

I'd like to see this get merged as well!

SirBernardPhilip, Jul 29 '25

Thank you for the PR! Will merge and release!

dqbd, Aug 09 '25