GPT-J Will Not Accept Certain Tokens in Prompt
GPT-J fails to tokenize certain characters when they appear in a prompt - so far I have only been able to reproduce this with the "!" character, but I haven't performed an exhaustive search.
llm: ./target/release/llm gptj infer -m ~/.ggml-models/gpt4all-j-v1.3-groovy.bin -p "!"
✓ Loaded 285 tensors (3.8 GB) after 1980ms
[2023-05-11T14:36:15Z ERROR llm] Failed to tokenize initial prompt.
Our current tokenizer is built around scores. Perhaps we should fall back to a simpler tokenizer for models where it's known that no scores are present for the tokens?
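For illustration, a score-free fallback could be as simple as greedy longest-match over the vocabulary, which would at least cover single-character prompts like the failing "!" case. The sketch below is hypothetical (the `vocab` map and function name aren't the crate's actual API), just to show the idea:

```rust
use std::collections::HashMap;

/// Hypothetical score-free fallback: greedy longest-match against the
/// model's vocabulary, so byte-level tokens like "!" always resolve.
fn tokenize_greedy(vocab: &HashMap<Vec<u8>, u32>, prompt: &str) -> Option<Vec<u32>> {
    let bytes = prompt.as_bytes();
    let mut tokens = Vec::new();
    let mut pos = 0;

    while pos < bytes.len() {
        let mut matched = None;
        // Try the longest remaining slice first, shrinking until a vocab entry matches.
        for end in (pos + 1..=bytes.len()).rev() {
            if let Some(&id) = vocab.get(&bytes[pos..end]) {
                matched = Some((id, end));
                break;
            }
        }
        match matched {
            Some((id, end)) => {
                tokens.push(id);
                pos = end;
            }
            // No vocabulary entry covers this byte, so report failure
            // instead of silently dropping input.
            None => return None,
        }
    }
    Some(tokens)
}
```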
Couldn't we use Hugging Face's tokenizers library? Then we would have parity with nearly every implementation out there 🤔
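As a rough sketch of what that could look like with the `tokenizers` crate (the tokenizer.json path and wiring are assumptions, not what llm currently ships):

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a tokenizer.json exported from the Hugging Face Hub;
    // the path here is illustrative, not bundled with the GGML model file.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;

    // Encoding goes through the library's own merges/rules,
    // so characters like "!" don't depend on score-based logic.
    let encoding = tokenizer.encode("!", false)?;
    println!("token ids: {:?}", encoding.get_ids());

    Ok(())
}
```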
Yeah maybe, see #35
@RedBoxing - can you see if this is fixed on your RWKV branch?
no issues at all !