jan icon indicating copy to clipboard operation
jan copied to clipboard

bug: Nitro extension and Nitro does not handle chat template properly

Open hiro-v opened this issue 6 months ago • 1 comments

Describe the bug

  • There is a bug reported for using Mistral Instruct 7B Q4 model that has text strikethrough: https://discord.com/channels/1107178041848909847/1192366847446753330/1192371090123665419
  • After careful investigation, me and @louis-jan found out that the current way of handling chat template in https://github.com/janhq/jan/blob/main/extensions/inference-nitro-extension/src/module.ts#L214 does not work well in many cases.
  • We support user to import local model as well, we can't let them DPO one by one.

Steps to reproduce Steps to reproduce the behavior:

  1. Go to Hub -> Download Mistral Instruct 7B Q4
  2. Create new thread -> add system prompt as instruction. e.g: you are a very helpful assistant`
  3. Chat 1st time, no error. But for the second time, there is an error
  4. You can inspect the request, you can see that there is an <s> in the response which causes the strikethrough. This is because of the chat_template and the stop_word has not been defined/ used properly

See nitro implementation: https://github.com/janhq/nitro/blob/main/controllers/llamaCPP.cc - I think this should be fixed.

For the reference:

  • See https://github.com/ggerganov/llama.cpp/blob/master/examples/server/server.cpp that handles chat template
  • See https://github.com/abetlen/llama-cpp-python/blob/75d0527fd782a792af8612e55b0a3f2dad469ae9/llama_cpp/llama_chat_format.py which is very good in handling these cases (and tested)

Expected behavior

  • No error on the strikethrough problem
  • Proper code change in Nitro code and Jan - Nitro extension to handle this case.
  • Possibly docs to describe the change and which support/ which not

Screenshots If applicable, add screenshots to help explain your issue.

Environment details

  • Any, as this is Typescript logic

Additional context Nope

hiro-v avatar Jan 05 '24 16:01 hiro-v