Replace gpt-3-encoder with gpt-tokenizer or similar
Currently gpt-3-encoder is used:
- https://github.com/latitudegames/GPT-3-Encoder
  - Javascript BPE Encoder Decoder for GPT-2 / GPT-3
But it might make more sense to use a library that also supports GPT-4, for example (see the migration sketch after this list):
- https://github.com/niieani/gpt-tokenizer
  - JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT-2 / GPT-3 / GPT-4. Port of OpenAI's tiktoken with additional features.
  - gpt-tokenizer is a highly optimized Token Byte Pair Encoder/Decoder for all of OpenAI's models (including those used by GPT-2, GPT-3, GPT-3.5 and GPT-4). It's written in TypeScript, and is fully compatible with all modern JavaScript environments.
  - As of 2023, it is the most feature-complete, open-source GPT tokenizer on NPM.
  - No global cache (no accidental memory leaks, as with the original GPT-3-Encoder implementation)
  - Historical note: This package started off as a fork of latitudegames/GPT-3-Encoder, but version 2.0 was rewritten from scratch.
- https://gpt-tokenizer.dev/
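The two libraries expose essentially the same encode/decode surface, so the swap should mostly be an import change. A minimal sketch (not taken from humanify's code; note that gpt-tokenizer's default entry point targets the newer cl100k_base encoding used by GPT-3.5/GPT-4, so token counts won't exactly match gpt-3-encoder's):

```ts
// Before: the current dependency (GPT-2 / GPT-3 encoding only)
import { encode as legacyEncode } from "gpt-3-encoder";
// After: gpt-tokenizer; its default export targets the newer cl100k_base
// encoding (GPT-3.5 / GPT-4), so counts differ slightly from gpt-3-encoder
import { encode, decode, isWithinTokenLimit } from "gpt-tokenizer";

const text = "function greet() { return 'Hello, world!'; }";

// Both libraries expose encode(): string -> number[] and decode(): number[] -> string
console.log(legacyEncode(text).length);     // token count under gpt-3-encoder
console.log(encode(text).length);           // token count under gpt-tokenizer
console.log(decode(encode(text)) === text); // round-trips back to the input

// Extra convenience in gpt-tokenizer: returns false if `text` exceeds the
// limit, otherwise the token count (it can stop early instead of encoding everything)
console.log(isWithinTokenLimit(text, 1000));
```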
Currently gpt-3-encoder is referenced in a few places (a hypothetical sketch of the updated split logic follows this list):
- https://github.com/jehna/humanify/blob/main/package.json#L21
- https://github.com/jehna/humanify/blob/main/src/openai/split-file.ts#L1
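Assuming split-file.ts uses encode() to measure chunk sizes when splitting code for the OpenAI API (I haven't verified the exact implementation), the change there should boil down to the import line; package.json just needs the dependency swapped (npm uninstall gpt-3-encoder, npm install gpt-tokenizer). A rough, hypothetical sketch — splitIntoChunks and its signature are made up for illustration, not humanify's actual function:

```ts
import { encode } from "gpt-tokenizer"; // was: import { encode } from "gpt-3-encoder";

// Hypothetical helper: split source code into line-aligned chunks that each
// stay under maxTokens (a single oversized line still becomes its own chunk).
export function splitIntoChunks(code: string, maxTokens: number): string[] {
  const chunks: string[] = [];
  let current = "";

  for (const line of code.split("\n")) {
    const candidate = current === "" ? line : `${current}\n${line}`;
    if (current !== "" && encode(candidate).length > maxTokens) {
      chunks.push(current);
      current = line;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```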
See Also
- https://github.com/jehna/humanify/issues/4