bug: doc retrieval and embedding broken
Describe the bug
Since installing v0.4.9-343 I can no longer attach documents in the chat if it's not the first message in a thread. Attaching a document also fails when the thread has assistant instructions set.
Steps to reproduce
- Open a new thread
- Say Hi
- Now attach a doc and ask for a summary
- "Apologies something's amiss!" /or/
- Open a new thread
- Use the assistant instructions field, e.g. "you're allowed to be slightly sarcastic"
- In the first message attach a doc and ask for a summary
- "Apologies something's amiss!"
Expected behaviour
I expect the doc to be embedded and a response to be generated using it as context.
Environment details
- Operating System: Win 11 Pro N x64
- Jan Version: 0.4.9-343
- Processor: Ryzen 7 7700X
- RAM: 32GB
- GPU: AMD RX 6800XT 16GB
Logs
2024-03-26T12:24:05.206Z [NITRO]::Debug: 20240326 12:17:14.343000 UTC 16136 INFO Here is the result:0 - llamaCPP.cc:420
20240326 12:24:05.205000 UTC 4816 INFO Clean cache threshold reached! - llamaCPP.cc:192
20240326 12:24:05.205000 UTC 4816 INFO Cache cleaned - llamaCPP.cc:194
20240326 12:24:05.205000 UTC 4816 ERROR Unhandled exception in /inferences/llamacpp/chat_completion, what(): Type is not convertible to string - HttpAppFrameworkImpl.cc:124
2024-03-26T12:24:06.212Z [NITRO]::Debug: 20240326 12:24:06.210000 UTC 4816 INFO sent the non stream, waiting for respone - llamaCPP.cc:416
[1711455846] [D:\a\nitro\nitro\controllers\llamaCPP.h: 882][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 5]
2024-03-26T12:24:06.212Z [NITRO]::Debug: [1711455846] [D:\a\nitro\nitro\controllers\llamaCPP.h: 1722][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)
2024-03-26T12:24:06.617Z [NITRO]::Debug: [1711455846] [D:\a\nitro\nitro\controllers\llamaCPP.h: 475][llama_client_slot::print_timings]
[1711455846] [D:\a\nitro\nitro\controllers\llamaCPP.h: 480][llama_client_slot::print_timings] print_timings: prompt eval time = 332.33 ms / 65 tokens ( 5.11 ms per token, 195.59 tokens per second)
[1711455846] [D:\a\nitro\nitro\controllers\llamaCPP.h: 485][llama_client_slot::print_timings] print_timings: eval time = 72.45 ms / 6 runs ( 12.07 ms per token, 82.82 tokens per second)
[1711455846] [D:\a\nitro\nitro\controllers\llamaCPP.h: 487][llama_client_slot::print_timings] print_timings: total time = 404.77 ms
[1711455846] [D:\a\nitro\nitro\controllers\llamaCPP.h: 1585][llama_server_context::update_slots] slot 0 released (72 tokens in cache)
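For anyone triaging: "Type is not convertible to string" is the exact message jsoncpp throws when asString() is called on a JSON value that is not string-convertible, and the HttpAppFrameworkImpl.cc frame points to Drogon, which parses request bodies with jsoncpp. A minimal sketch of how that exception arises; the `content` field name is purely illustrative and not taken from Nitro's source:

```cpp
#include <json/json.h>
#include <iostream>
#include <string>

int main() {
  Json::Value body;
  // An array arriving where the handler expects a plain string.
  body["content"] = Json::Value(Json::arrayValue);
  try {
    std::string s = body["content"].asString();  // throws Json::LogicError
    std::cout << s << '\n';
  } catch (const Json::LogicError& e) {
    std::cerr << e.what() << '\n';  // prints: Type is not convertible to string
  }
  return 0;
}
```

If attaching a document mid-thread (or alongside assistant instructions) changes some field in the request payload from a plain string to an array or object, that would line up with the unhandled exception above. This is a guess from the error text, not a confirmed root cause.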
Additional context
Changing the context length back to the default 4096 (from 32k or 16k) does not fix it. Tested using Mistral 7B Instruct v0.2 Q5_K_M.
Adding context mid-chat using the API endpoint (from Anything LLM) works.
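Since the API route works, here is a rough sketch of that workaround: POSTing directly to the chat completion endpoint seen in the logs, with the document text inlined into the user message, which is effectively what an external client like Anything LLM does. The port (3928, Nitro's default) and the exact request fields are assumptions based on the OpenAI-compatible shape, not confirmed from this thread:

```cpp
#include <curl/curl.h>
#include <iostream>
#include <string>

// Append each chunk of the response body into a std::string.
static size_t write_cb(char* ptr, size_t size, size_t nmemb, void* userdata) {
  static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
  return size * nmemb;
}

int main() {
  // Document text pasted straight into the user message, bypassing
  // Jan's own retrieval/embedding pipeline.
  const std::string body = R"({
    "messages": [
      {"role": "system", "content": "You're allowed to be slightly sarcastic."},
      {"role": "user", "content": "Summarise this document: <doc text here>"}
    ],
    "stream": false
  })";

  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL* curl = curl_easy_init();
  if (!curl) return 1;

  std::string response;
  curl_slist* headers = curl_slist_append(nullptr, "Content-Type: application/json");
  curl_easy_setopt(curl, CURLOPT_URL,
                   "http://127.0.0.1:3928/inferences/llamacpp/chat_completion");
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
  curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

  if (curl_easy_perform(curl) == CURLE_OK)
    std::cout << response << '\n';  // JSON completion using the doc as context
  else
    std::cerr << "request failed\n";

  curl_slist_free_all(headers);
  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return 0;
}
```

Build with -lcurl; the same request can of course be issued from any HTTP client.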
Thank you. The issue is reproducible on both Windows and macOS. We will resolve it ASAP. @louis-jan
Hi @Propheticus, can you try again using our latest nightly, Jan v0.4.9-345? 🙏
Thanks @Van-QA. Appears to have been fixed 👍. Still some quirks, but I will report those separately.