Wrong tokenizer used for OpenAI embeddings
I was looking through the OpenAI code and noticed that the wrong tokenizer is used for newer models like text-embedding-ada-002, which use the cl100k_base encoding implemented by tiktoken.
There is a list of encodings here for their public models.
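To make the mismatch concrete, here is a rough sketch of choosing the encoding by model name instead of assuming one tokenizer for everything. The helper and table names (`encodingForModel`, `MODEL_TO_ENCODING`) are hypothetical and not part of this repo. The table is only a partial copy of the model-to-encoding mapping that tiktoken publishes, so treat it as an illustration rather than the full list.

```typescript
// Hypothetical helper for illustration; none of these names exist in the codebase.
type EncodingName = "r50k_base" | "p50k_base" | "cl100k_base";

// Partial model-to-encoding table, following the mapping published by tiktoken.
// Assumption: only a handful of models are listed here.
const MODEL_TO_ENCODING = new Map<string, EncodingName>([
  ["text-embedding-ada-002", "cl100k_base"], // newer embedding model
  ["text-davinci-003", "p50k_base"],
  ["text-davinci-002", "p50k_base"],
  ["davinci", "r50k_base"], // older GPT-3 models use the GPT-2 style vocabulary
]);

// Look up the encoding for a model and fail loudly on unknown models,
// rather than silently falling back to a tokenizer that may be wrong.
function encodingForModel(model: string): EncodingName {
  const encoding = MODEL_TO_ENCODING.get(model);
  if (encoding === undefined) {
    throw new Error(`No known encoding for model: ${model}`);
  }
  return encoding;
}

console.log(encodingForModel("text-embedding-ada-002")); // cl100k_base
```

Throwing on an unknown model is a design choice in this sketch: a silent default is how the wrong tokenizer ends up being used in the first place.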
I'm currently looking at making a wasm build of tiktoken, though I think a pure js approach would also work fine.
This might work -> https://www.npmjs.com/package/@dqbd/tiktoken @darknoon
Let me know