gpt-tokenizer
Improving documentation about using the api vs importing functions directly
From the documentation it seems possible to import functions directly from gpt-tokenizer, like:
import { encodeChat, isWithinTokenLimit } from "gpt-tokenizer/esm/main"
But running that code throws an exception inside GptEncoding, because this (the instance reference) is undefined at call time.
In cl100k_base.js:
const api = GptEncoding.getEncodingApi('cl100k_base', () => convertTokenBytePairEncodingFromTuples(encoder));
const { decode, decodeAsyncGenerator, decodeGenerator, encode, encodeGenerator, isWithinTokenLimit, encodeChat, encodeChatGenerator, } = api;
export { decode, decodeAsyncGenerator, decodeGenerator, encode, encodeChat, encodeChatGenerator, encodeGenerator, isWithinTokenLimit, };
export default api;
All the destructured functions lose the reference to their instance (this). I don't really know why for certain, but it looks like the standard JavaScript behavior where a method read off an object becomes an unbound function unless it was bound beforehand. Calling them without going through api (the instance) throws an exception that is hard to debug.
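To illustrate, here is a minimal stand-in reproduction of the general pitfall (a hypothetical Counter class, not the actual GptEncoding): destructuring a method detaches it from its instance, so this is undefined inside the call.

```typescript
// Minimal sketch of the `this`-loss pitfall, independent of gpt-tokenizer.
class Counter {
  private count = 0;
  increment(): number {
    // Relies on `this`; breaks when the method is called detached.
    return ++this.count;
  }
}

const counter = new Counter();
counter.increment(); // works: `this` is the counter instance

// Destructuring reads the method off the instance as a plain function.
const { increment } = counter;
try {
  increment(); // class bodies are strict mode, so `this` is undefined here
} catch (e) {
  console.log("detached call failed:", (e as Error).constructor.name);
  // prints "detached call failed: TypeError"
}
```

This matches the symptom above: the named exports are destructured off the api object, so they arrive unbound.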
The following code works:
import api from "gpt-tokenizer/esm/main"
import { ChatMessage } from "gpt-tokenizer/esm/GptEncoding"

// historyChat: ChatMessage[] and tokenLimit: number come from your own code
api.modelName = 'YourModelName' // e.g. 'gpt-4'
const chatTokens = api.encodeChat(historyChat, "gpt-4")
const currentTokenCount = chatTokens.length
const withinTokenLimit = api.isWithinTokenLimit(historyChat, tokenLimit)
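If standalone functions are still wanted, a general workaround is to bind them to the instance first. This is a sketch using a hypothetical stand-in class rather than the real GptEncoding, to show only the binding technique:

```typescript
// Stand-in class (hypothetical, not the real GptEncoding).
class Encoding {
  constructor(public modelName: string) {}
  describe(): string {
    return `encoding for ${this.modelName}`; // relies on `this`
  }
}

const api = new Encoding("cl100k_base");

// Detached reference: calling this would throw, `this` is undefined.
const { describe: detached } = api;

// Bound copy: safe to pass around and call standalone.
const describe = api.describe.bind(api);
console.log(describe()); // "encoding for cl100k_base"
```

The library itself could achieve the same by binding its methods in the constructor before exporting the destructured names, but that is a suggestion, not something the current code does.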
Should we rework the documentation so it is clear that you should use the GptEncoding instance instead of the isolated functions? Do you want me to open a PR?
Btw, I'm using Vite, and I'm not sure whether this could also be a bundling issue.