Add get_encoding_name_for_model to tiktoken
The tiktoken-js library includes a very helpful function,
getEncodingNameForModel(). In the Rust-based tiktoken package,
the equivalent logic is buried in the implementation of
encoding_for_model() and is not exposed on its own.
This function is very useful when implementing an encoding cache keyed on the model in use. Because so many models reuse the same encoding, mapping model -> encoding name and then caching by encoding name conserves resources.
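For illustration, here is a minimal sketch of the kind of cache I have in mind. Everything in it is a stand-in: `EncodingCache`, `lookupEncodingName`, and the three-entry `MODEL_TO_ENCODING` table are hypothetical names written for this example, and the loader just returns a plain object instead of a real encoder.

```typescript
// Generic cache keyed by encoding name rather than by model name.
class EncodingCache<E> {
  private readonly cache = new Map<string, E>();
  private readonly nameForModel: (model: string) => string;
  private readonly load: (encodingName: string) => E;

  constructor(
    nameForModel: (model: string) => string,
    load: (encodingName: string) => E,
  ) {
    this.nameForModel = nameForModel;
    this.load = load;
  }

  // Resolve the model to its encoding name, then reuse a cached instance if one exists.
  forModel(model: string): E {
    const name = this.nameForModel(model);
    const cached = this.cache.get(name);
    if (cached !== undefined) return cached;
    const created = this.load(name);
    this.cache.set(name, created);
    return created;
  }

  get size(): number {
    return this.cache.size;
  }
}

// Illustrative subset of the model -> encoding mapping.
// Assumption: stands in for the library's get_encoding_name_for_model().
const MODEL_TO_ENCODING: Record<string, string> = {
  "gpt-4": "cl100k_base",
  "gpt-3.5-turbo": "cl100k_base",
  "text-davinci-003": "p50k_base",
};

function lookupEncodingName(model: string): string {
  const name = MODEL_TO_ENCODING[model];
  if (name === undefined) throw new Error(`Unknown model: ${model}`);
  return name;
}

// Stand-in loader that counts how often an encoding is constructed.
let loads = 0;
const cache = new EncodingCache(lookupEncodingName, (name) => {
  loads += 1;
  return { name };
});

cache.forModel("gpt-4");
cache.forModel("gpt-3.5-turbo"); // same encoding name, served from the cache
cache.forModel("text-davinci-003");
```

With this PR applied, the idea would be to pass get_encoding_name_for_model() as the first argument and an encoder factory such as get_encoding() as the second, so that models sharing an encoding share one encoder instance.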
I've exposed a new get_encoding_name_for_model() function
that behaves similarly to the one in the tiktoken-js package, and used
it inside of encoding_for_model().
Finally, I've also added a test to ensure that this function can be called from TypeScript code and that it throws an exception when given an invalid model name.
Fixes: dqbd/tiktoken#123
Hey, @dqbd and @jens-f 👋
I think this should fix #123. It's a feature we've been looking for as well, so I figured I'd take a crack at it.
I'm not a Rust developer, so please excuse any obvious blunders on my part. I'd love to know what you think. It'd be awesome to have this functionality exposed!
Thanks in advance for your consideration of the PR 🙏
Hey @dqbd, any thoughts on this PR? I think I've fixed all the conflicts. It'd be great if we could expose this API in the WASM version of the library! Thanks!
I'd like to see this get merged as well!
Thank you for the PR! Will merge and release!