BetterChatGPT icon indicating copy to clipboard operation
BetterChatGPT copied to clipboard

feature: add tokenizer

Open llegomark opened this issue 1 year ago • 5 comments

Add @dqbd/tiktoken to count prompt tokens

Benchmark: https://github.com/transitive-bullshit/compare-tokenizers

llegomark avatar Mar 05 '23 05:03 llegomark

This is cool. If we can count the tokens being generated by the user, we can use it to allow a certain amount of tokens to be consumed by the users in a given time frame. And then, we can integrate authentication using Supabase or any other authentication providers.

saliksik avatar Mar 06 '23 05:03 saliksik

@dqbd/tiktoken is not a pure js library and therefore not suitable for frontend

ayaka14732 avatar Mar 06 '23 13:03 ayaka14732

@ayaka14732 I just notice this right now, OpenAI API responded with: { "error": { "message": "Rate limit reached for default-gpt-3.5-turbo in organization org-************************ on requests per min. Limit: 20 / min. Current: 40 / min. Contact [email protected] if you continue to have issues.", "type": "requests", "param": null, "code": null } }.

Reference: https://platform.openai.com/docs/guides/rate-limits/overview

saliksik avatar Mar 06 '23 14:03 saliksik

@dqbd/tiktoken is not a pure js library and therefore not suitable for frontend

How about the gpt3-tokenizer?

saliksik avatar Mar 06 '23 14:03 saliksik

Just to comment - the current tokenizer in the repo is a wrong one - ChatGPT (gpt-3.5-turbo) doesn't use the default GPT2/3 tokenizer (the one that gpt3-tokenizer package implements), it uses a new one called cl100k_base that's only available in tiktoken for now.

ghost avatar Mar 09 '23 13:03 ghost