readme-ai
TikToken Special Character Conflict
When running readme-ai on a basic JS application, I get this error:
ERROR [1:logger] [2024-01-29 15:51:28,449] Error in token encoding: Encountered text corresponding to disallowed special token '<|endoftext|>'.
If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<|endoftext|>', ...}`.
If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<|endoftext|>'})`.
To disable this check for all special tokens, pass `disallowed_special=()`.
Related to: https://github.com/langchain-ai/langchain/issues/923
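
A possible workaround, as a minimal sketch: the error message itself suggests passing `disallowed_special=()` so that `<|endoftext|>` in the scanned source is encoded as ordinary text. I have not checked where readme-ai calls tiktoken, so `count_tokens` and `demo` below are hypothetical helper names, and `cl100k_base` is an assumed encoding name, not necessarily the one readme-ai uses.

```python
def count_tokens(encoder, text: str) -> int:
    """Count tokens, treating special-token strings as plain text.

    `encoder` is any object with a tiktoken-style `encode` method.
    """
    # disallowed_special=() switches off the special-token check,
    # which is the last option listed in the error message.
    return len(encoder.encode(text, disallowed_special=()))


def demo() -> None:
    # Requires `pip install tiktoken`; the encoding name is an assumption.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    sample = "const marker = '<|endoftext|>';"
    print(count_tokens(enc, sample))
```

With the default `disallowed_special="all"`, the same `encode` call raises a `ValueError` on that sample, which matches the log above.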