Martin Evans
Depending on the exact version of llama.cpp you used to convert it, you may be using a slightly incompatible file format. LLamaSharp is always a bit behind mainline llama.cpp. Just...
Did trying another model work?
> Sometimes I'm not really sure which binaries to choose for which release version of llamasharp in this list.

To answer this specific question, you want to look at [this...
I'm not sure if our antiprompt detection will properly handle special tokens like that. I know there's special-case handling for EOS in some places. That could be a good...
> I think `x == EOS` should be replaced everywhere with a `llama_token_is_eog(x)`

This has actually been done in PR #712 (along with updating the binaries).
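For anyone landing here from search, the change amounts to replacing the direct EOS comparison with llama.cpp's end-of-generation check, which also covers tokens like llama3's `<|eot_id|>`. A minimal sketch of the idea; the `NativeApi.llama_token_is_eog` binding name and the `EogCheck` wrapper are assumptions here, not the actual PR #712 diff:

```csharp
using LLama.Native;

static class EogCheck
{
    // Sketch only: assumes a NativeApi.llama_token_is_eog binding mirroring
    // llama.cpp's llama_token_is_eog(model, token); check your LLamaSharp
    // version for the exact binding name.
    public static bool IsStopToken(SafeLlamaModelHandle model, LLamaToken token)
    {
        // The old check was effectively `token == <the model's single EOS id>`,
        // which misses llama3's <|eot_id|> and other end-of-turn tokens.
        // llama_token_is_eog returns true for *any* end-of-generation token.
        return NativeApi.llama_token_is_eog(model, token);
    }
}
```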
llama3 has been supported for a while, so I'll close this issue now.
The binaries in the latest release (0.11.1) are a little too old. The ones in the master branch were compiled after that PR was merged, so in theory they should...
> I tried copying the binaries from the master

That won't work, I'm afraid. The llama.cpp API is unstable, so every time the binaries are updated there are various internal...
I don't have any ideas at the moment. I know Mamba is a bit of an unusual architecture, just because I've seen various comments inside llama.cpp about how certain APIs...
You'll get a progressive slowdown if you are using a stateless executor and submitting a larger and larger chat history each time. The stateful executors internally store the chat history and...
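To make the difference concrete, here's a minimal sketch based on the LLamaSharp API around this release; `"model.gguf"` is a placeholder path, and exact constructor shapes may vary slightly between versions:

```csharp
using System;
using LLama;
using LLama.Common;

var parameters = new ModelParams("model.gguf") { ContextSize = 4096 };
using var weights = LLamaWeights.LoadFromFile(parameters);

// Stateless: you must pass the whole transcript on every call, and the
// executor re-processes all of it each time, so turns get slower and slower.
var stateless = new StatelessExecutor(weights, parameters);

// Stateful: the executor keeps the conversation in its context (KV cache),
// so each call only has to process the newly submitted text.
using var context = weights.CreateContext(parameters);
var interactive = new InteractiveExecutor(context);

var inferParams = new InferenceParams { AntiPrompts = new[] { "User:" } };
await foreach (var text in interactive.InferAsync("User: Hello\nAssistant:", inferParams))
    Console.Write(text);
```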