Martin Evans

252 comments by Martin Evans

Depending on the exact version of llama.cpp you used to convert it, you may be using a slightly incompatible file format. LLamaSharp is always a bit behind mainline llama.cpp. Just...

Did trying another model work?

> Sometimes I'm not really sure which binaries to choose for which release version of llamasharp in this list.

To answer this specific question, you want to look at [this...

I'm not sure if our antiprompt detection will properly handle special tokens like that. I know there's special case handling for EOS in some places. That could be a good...
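To illustrate why special tokens are a problem here: antiprompt (stop string) detection is typically done on decoded text, so a control token that detokenizes to an empty string never shows up in the match. This is a minimal hypothetical sketch of the general technique, not LLamaSharp's actual implementation; all names are made up:

```python
# Hypothetical sketch of string-based antiprompt (stop string) detection.
# Control tokens such as an end-of-generation token often detokenize to
# an empty string, so a purely text-based check like this never sees
# them; they need separate token-id handling (e.g. an is-EOG check).

def hits_antiprompt(decoded_tail: str, antiprompts: list[str]) -> bool:
    """Return True if the decoded output ends with any stop string."""
    return any(decoded_tail.endswith(a) for a in antiprompts)

# A plain-text stop string is caught:
print(hits_antiprompt("...\nUser:", ["User:"]))        # a text match works

# But a special token that decodes to "" is invisible to this check:
eog_decoded = ""  # many tokenizers render control tokens as empty text
print(hits_antiprompt("some output" + eog_decoded, ["<|eot_id|>"]))
```

This is why an EOS/EOG comparison on the token id itself is handled as a special case rather than going through the text-matching path.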

> I think x == EOS should be replaced everywhere with a llama_token_is_eog(x)

This has actually been done in PR #712 (along with updating the binaries).

llama3 has been supported for a while, so I'll close this issue now.

The binaries in the latest release (0.11.1) are a little too old. The ones in the master branch were compiled after that PR was merged, so in theory they should...

> I tried copying the binaries from the master

That won't work I'm afraid. The llama.cpp API is unstable, so every time the binaries are updated there are various internal...

I don't have any ideas at the moment. I know Mamba is a bit of an unusual architecture just because I've seen various comments inside llama.cpp about how certain APIs...

You'll get a progressive slowdown if you are using a stateless executor, and submitting larger and larger chat history each time. The stateful executors internally store the chat history and...
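The cost difference can be shown with a toy model of the two approaches (hypothetical function names, not LLamaSharp code): a stateless executor must re-evaluate the entire accumulated history every turn, so total work grows quadratically with the number of turns, while a stateful executor keeps its context and only evaluates the new tokens each turn:

```python
# Toy cost model: tokens evaluated over a multi-turn chat.
# Assumes each turn adds `n` new tokens and ignores KV-cache limits.

def stateless_cost(turn_lengths: list[int]) -> int:
    """Stateless: the whole history is resubmitted and re-evaluated
    on every turn, so per-turn cost grows with history length."""
    history = 0
    total = 0
    for n in turn_lengths:
        history += n      # history keeps growing...
        total += history  # ...and all of it is evaluated again
    return total

def stateful_cost(turn_lengths: list[int]) -> int:
    """Stateful: only the new tokens are evaluated each turn."""
    return sum(turn_lengths)

turns = [50] * 10  # ten turns of 50 tokens each
print(stateless_cost(turns))  # 50 + 100 + ... + 500 = 2750
print(stateful_cost(turns))   # 500
```

With ten 50-token turns the stateless path evaluates 2750 tokens against 500 for the stateful one, and the gap widens every turn, which is exactly the progressive slowdown described above.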