bug: Nitro extension and Nitro do not handle chat templates properly
Describe the bug
- There is a bug report about the Mistral Instruct 7B Q4 model producing responses with strikethrough text: https://discord.com/channels/1107178041848909847/1192366847446753330/1192371090123665419
- After careful investigation, @louis-jan and I found that the current way of handling the chat template in https://github.com/janhq/jan/blob/main/extensions/inference-nitro-extension/src/module.ts#L214 does not work well in many cases.
- We also support users importing their own local models, so we can't fix the template for each model one by one.
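To illustrate the direction I have in mind, here is a minimal sketch (the names `ModelTemplate` and `applyTemplate` are illustrative, not the actual Jan API): each model would declare its own prompt template and stop words, and the extension would substitute placeholders instead of hard-coding one format.

```typescript
// Illustrative sketch only -- these types/names are hypothetical,
// not the real inference-nitro-extension API.

interface ModelTemplate {
  promptTemplate: string; // e.g. "[INST] {system_message}\n{prompt} [/INST]"
  stopWords: string[];    // e.g. ["</s>"], so EOS tokens never leak into output
}

// Substitute the placeholders with the actual system prompt and user message.
function applyTemplate(
  tpl: ModelTemplate,
  systemMessage: string,
  prompt: string
): string {
  return tpl.promptTemplate
    .replace("{system_message}", systemMessage)
    .replace("{prompt}", prompt);
}

// Mistral-Instruct-style template as an example.
const mistral: ModelTemplate = {
  promptTemplate: "[INST] {system_message}\n{prompt} [/INST]",
  stopWords: ["</s>"],
};

const built = applyTemplate(
  mistral,
  "you are a very helpful assistant",
  "Hello!"
);
// built === "[INST] you are a very helpful assistant\nHello! [/INST]"
```

With something like this, an imported local model could ship its own template and stop words instead of relying on one hard-coded format.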
Steps to reproduce the behavior:
- Go to Hub -> download Mistral Instruct 7B Q4
- Create a new thread -> add a system prompt as an instruction, e.g. `you are a very helpful assistant`
- Chat once: no error. On the second message, an error occurs.
- Inspect the request and you can see an `<s>` token in the response, which causes the strikethrough. This happens because the `chat_template` and the stop word have not been defined/used properly.
- See the Nitro implementation: https://github.com/janhq/nitro/blob/main/controllers/llamaCPP.cc
- I think this should be fixed.
For reference:
- See https://github.com/ggerganov/llama.cpp/blob/master/examples/server/server.cpp, which handles the chat template
- See https://github.com/abetlen/llama-cpp-python/blob/75d0527fd782a792af8612e55b0a3f2dad469ae9/llama_cpp/llama_chat_format.py, which handles these cases very well (and is tested)
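Independently of where templates come from, the `<s>` leak in the reproduction above suggests the response side also needs cleanup. A minimal sketch (the function name `stripStopWords` is hypothetical, not existing Jan or Nitro code): strip special tokens and stop words from the model output before it reaches the UI, so raw tags like `<s>` never render.

```typescript
// Hypothetical helper -- not existing Jan/Nitro code. Removes every
// occurrence of each stop word / special token from the model output
// so tokens like "<s>" or "</s>" never reach the chat UI.
function stripStopWords(text: string, stopWords: string[]): string {
  let out = text;
  for (const w of stopWords) {
    // split/join removes ALL occurrences, unlike String.replace with a string,
    // which only removes the first match.
    out = out.split(w).join("");
  }
  return out.trim();
}

stripStopWords("<s> Hello there </s>", ["<s>", "</s>"]);
// → "Hello there"
```

Ideally Nitro itself would treat these as stop sequences during generation, but filtering in the extension would at least keep the artifacts out of rendered messages.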
Expected behavior
- No more errors from the strikethrough problem
- Proper code changes in Nitro and in the Jan Nitro extension to handle this case
- Possibly docs describing the change and which templates are supported and which are not
Environment details
- Any, as this is TypeScript logic
Additional context
None