TikToken Special Character Conflict

Open jamesvillarrubia opened this issue 1 year ago • 0 comments

When running in a basic JS application, I get this error:

ERROR    [1:logger] [2024-01-29 15:51:28,449] Error in token encoding: Encountered text corresponding to disallowed special token '<|endoftext|>'.
If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<|endoftext|>', ...}`.
If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<|endoftext|>'})`.
To disable this check for all special tokens, pass `disallowed_special=()`.

Related to: https://github.com/langchain-ai/langchain/issues/923
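A possible workaround, following the last line of the error message: pass `disallowed_special=()` to `encode` so that `<|endoftext|>` appearing in a scanned file is tokenized as ordinary text. This is only a sketch. The helper names `encode_as_plain_text` and `count_tokens` are hypothetical, the `cl100k_base` encoding is an assumption, and I have not checked where readme-ai actually calls tiktoken.

```python
def encode_as_plain_text(enc, text):
    """Encode text, treating special-token strings as ordinary text.

    `enc` is expected to be a tiktoken Encoding (or anything with a
    compatible `encode` method). Hypothetical helper, not part of readme-ai.
    """
    # disallowed_special=() disables the check that raises the error above,
    # so '<|endoftext|>' in the input is encoded as normal text.
    return enc.encode(text, disallowed_special=())


def count_tokens(text, encoding_name="cl100k_base"):
    """Count tokens without failing on special-token strings in the input.

    The default encoding name is an assumption; use whichever encoding
    matches the model being targeted.
    """
    # Imported lazily so the helper above stays usable without tiktoken.
    import tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    return len(encode_as_plain_text(enc, text))
```

With this, something like `count_tokens("const s = '<|endoftext|>';")` should return a count instead of raising. The trade-off is that the string is never mapped to the real special token, which seems to be the desired behaviour when only counting tokens in source files.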

jamesvillarrubia · Jan 29 '24 15:01