
Replace gpt-3-encoder with gpt-tokenizer or similar

Open · 0xdevalias opened this issue 2 years ago · 0 comments

Currently gpt-3-encoder is used:

  • https://github.com/latitudegames/GPT-3-Encoder
    • Javascript BPE Encoder Decoder for GPT-2 / GPT-3

But it might make more sense to use a library that also supports GPT-4, for example:

  • https://github.com/niieani/gpt-tokenizer
    • JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT-2 / GPT-3 / GPT-4. Port of OpenAI's tiktoken with additional features.

    • gpt-tokenizer is a highly optimized Token Byte Pair Encoder/Decoder for all OpenAI's models (including those used by GPT-2, GPT-3, GPT-3.5 and GPT-4). It's written in TypeScript, and is fully compatible with all modern JavaScript environments.

    • As of 2023, it is the most feature-complete, open-source GPT tokenizer on NPM.

    • No global cache (no accidental memory leaks, as with the original GPT-3-Encoder implementation)

    • Historical note: This package started off as a fork of latitudegames/GPT-3-Encoder, but version 2.0 was rewritten from scratch.

  • https://gpt-tokenizer.dev/

Currently gpt-3-encoder is referenced in a few places:

  • https://github.com/jehna/humanify/blob/main/package.json#L21
  • https://github.com/jehna/humanify/blob/main/src/openai/split-file.ts#L1

See Also

  • https://github.com/jehna/humanify/issues/4

0xdevalias · Nov 13 '23 04:11