bug: Nitro extension and Nitro do not handle chat templates properly
Describe the bug
- There is a bug report about the Mistral Instruct 7B Q4 model producing responses with strikethrough text: https://discord.com/channels/1107178041848909847/1192366847446753330/1192371090123665419
- After careful investigation, @louis-jan and I found that the current way of handling the chat template in https://github.com/janhq/jan/blob/main/extensions/inference-nitro-extension/src/module.ts#L214 does not work well in many cases.
- We also support users importing their own local models, so we can't fix the template for each model one by one.
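To illustrate the direction I have in mind, here is a minimal sketch (the names `ModelTemplate` and `applyTemplate` are illustrative, not the actual Jan API): each model would declare its own prompt template and stop words, and the extension would substitute placeholders instead of hard-coding one format.

```typescript
// Illustrative sketch only -- these types/names are hypothetical,
// not the real inference-nitro-extension API.

interface ModelTemplate {
  promptTemplate: string; // e.g. "[INST] {system_message}\n{prompt} [/INST]"
  stopWords: string[];    // e.g. ["</s>"], so EOS tokens never leak into output
}

// Substitute the placeholders with the actual system prompt and user message.
function applyTemplate(
  tpl: ModelTemplate,
  systemMessage: string,
  prompt: string
): string {
  return tpl.promptTemplate
    .replace("{system_message}", systemMessage)
    .replace("{prompt}", prompt);
}

// Mistral-Instruct-style template as an example.
const mistral: ModelTemplate = {
  promptTemplate: "[INST] {system_message}\n{prompt} [/INST]",
  stopWords: ["</s>"],
};

const built = applyTemplate(
  mistral,
  "you are a very helpful assistant",
  "Hello!"
);
// built === "[INST] you are a very helpful assistant\nHello! [/INST]"
```

With something like this, an imported local model could ship its own template and stop words instead of relying on one hard-coded format.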
Steps to reproduce the behavior:
- Go to Hub -> download Mistral Instruct 7B Q4
- Create a new thread -> add a system prompt as an instruction, e.g. `you are a very helpful assistant`
- Chat once: no error. On the second message, an error occurs.
- Inspect the request and you can see an `<s>` token in the response, which causes the strikethrough. This happens because the `chat_template` and the stop word have not been defined/used properly.
- See the Nitro implementation: https://github.com/janhq/nitro/blob/main/controllers/llamaCPP.cc
- I think this should be fixed.
For reference:
- See https://github.com/ggerganov/llama.cpp/blob/master/examples/server/server.cpp, which handles the chat template
- See https://github.com/abetlen/llama-cpp-python/blob/75d0527fd782a792af8612e55b0a3f2dad469ae9/llama_cpp/llama_chat_format.py, which handles these cases very well (and is tested)
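Independently of where templates come from, the `<s>` leak in the reproduction above suggests the response side also needs cleanup. A minimal sketch (the function name `stripStopWords` is hypothetical, not existing Jan or Nitro code): strip special tokens and stop words from the model output before it reaches the UI, so raw tags like `<s>` never render.

```typescript
// Hypothetical helper -- not existing Jan/Nitro code. Removes every
// occurrence of each stop word / special token from the model output
// so tokens like "<s>" or "</s>" never reach the chat UI.
function stripStopWords(text: string, stopWords: string[]): string {
  let out = text;
  for (const w of stopWords) {
    // split/join removes ALL occurrences, unlike String.replace with a string,
    // which only removes the first match.
    out = out.split(w).join("");
  }
  return out.trim();
}

stripStopWords("<s> Hello there </s>", ["<s>", "</s>"]);
// → "Hello there"
```

Ideally Nitro itself would treat these as stop sequences during generation, but filtering in the extension would at least keep the artifacts out of rendered messages.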
Expected behavior
- No more errors from the strikethrough problem
- Proper code changes in Nitro and in the Jan Nitro extension to handle this case
- Possibly docs describing the change and which templates are supported and which are not
Environment details
- Any, as this is TypeScript logic
Additional context
None