
Improving documentation about using the api vs importing functions directly

Open nahumzs opened this issue 11 months ago • 2 comments

From the documentation it seems possible to import functions directly from gpt-tokenizer, as in:

import { encodeChat, isWithinTokenLimit } from "gpt-tokenizer/esm/main"

But doing that and running the code throws an exception inside GptEncoding, because this is undefined when the destructured function executes.
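
For example, with the import above, a call like the following throws (a minimal sketch; the text and limit are just placeholder values):

import { isWithinTokenLimit } from "gpt-tokenizer/esm/main"

// Throws: the destructured export is detached from its GptEncoding
// instance, so this is undefined when the function runs.
isWithinTokenLimit("some text", 100)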


In cl100k_base.js:

const api = GptEncoding.getEncodingApi('cl100k_base', () => convertTokenBytePairEncodingFromTuples(encoder));
const { decode, decodeAsyncGenerator, decodeGenerator, encode, encodeGenerator, isWithinTokenLimit, encodeChat, encodeChatGenerator, } = api;
export { decode, decodeAsyncGenerator, decodeGenerator, encode, encodeChat, encodeChatGenerator, encodeGenerator, isWithinTokenLimit, };

export default api;

None of the destructured functions keeps the correct reference to its scope. I don't know the exact cause, but it looks like the usual JavaScript pitfall: destructuring a method off an object detaches it from that object, so this no longer points to the GptEncoding instance. Either way, calling the functions without going through api (the reference) throws an exception that is hard to debug.
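
To illustrate the pitfall outside of gpt-tokenizer, here is a minimal sketch (the Counter class is purely hypothetical, not library code):

class Counter {
  count = 0

  increment() {
    return ++this.count // relies on this being the instance
  }
}

const counter = new Counter()
const { increment } = counter // detaches the method from its instance

counter.increment() // works: this === counter
increment() // TypeError: this is undefined, so this.count fails

If getEncodingApi bound the methods to the instance up front (for example with .bind(api), or by defining them as arrow-function properties), the destructured named exports would presumably work as documented.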

The following code works:

import api from "gpt-tokenizer/esm/main"
import type { ChatMessage } from "gpt-tokenizer/esm/GptEncoding" // type-only import

const historyChat: ChatMessage[] = [{ role: "user", content: "Hello" }] // example data
const tokenLimit = 4096 // example limit

api.modelName = "gpt-4" // or whichever model you use
const chatTokens = api.encodeChat(historyChat, "gpt-4")
const currentTokenCount = chatTokens.length
const withinTokenLimit = api.isWithinTokenLimit(historyChat, tokenLimit)
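
As an alternative workaround, re-binding a destructured function to the default export should also restore the this reference (an untested sketch):

import api, { encodeChat } from "gpt-tokenizer/esm/main"

const boundEncodeChat = encodeChat.bind(api) // reattach this to the instance
const chatTokens = boundEncodeChat(historyChat, "gpt-4")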

Should we rework the documentation to make it clear that you should use the GptEncoding instance instead of the isolated functions? Would you like me to open a PR?

By the way, I'm using Vite, so I'm not sure whether this could also be a bundling issue.

nahumzs · Jul 02 '23 05:07