Martin Evans

252 comments by Martin Evans

The BatchedExecutor doesn't have anything like that at the moment; it's up to you to sample a token, reprompt the conversation with that token, and detokenize it into text using...
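Roughly, that loop looks like the sketch below. It's based on the BatchedExecutor examples in the repo, not a definitive implementation: the exact signatures (in particular the sampling pipeline's `Sample` overload and `Conversation.Sample()`) have changed between LLamaSharp versions, so treat the method names as assumptions.

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;
using LLama.Native;
using LLama.Sampling;

// Sketch only: based on the BatchedExecutor examples; exact signatures may
// differ between LLamaSharp versions.
var parameters = new ModelParams(@"path/to/model.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(model, parameters);

// Start a conversation and prompt it with the initial text.
using var conversation = executor.Create();
conversation.Prompt(executor.Context.Tokenize("The quick brown fox"));

var sampler = new DefaultSamplingPipeline();
var decoder = new StreamingTokenDecoder(executor.Context);

for (var i = 0; i < 32; i++)
{
    // Run one step of inference over all pending conversations.
    await executor.Infer();

    // Sample a token from this conversation's logits...
    var token = sampler.Sample(executor.Context.NativeHandle, conversation.Sample(), Array.Empty<LLamaToken>());

    // ...feed it back in so the next Infer() continues from it...
    conversation.Prompt(token);

    // ...and accumulate it so it can be detokenized back into text.
    decoder.Add(token);
}

Console.WriteLine(decoder.Read());
```

The StreamingTokenDecoder is used for the final step because individual tokens don't always map cleanly onto whole characters; it buffers tokens and only emits text once it has enough bytes to decode.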

Unfortunately I think that's probably because the inference process itself is not completely deterministic.

Yeah you're right, I was wrong there. I'll have a look into this.

Ah interesting, I'll wait and see if someone upstream knows what the issue is. Thanks for looking into that.

Looks like it's expected, according to ggerganov.

I'll close this issue now, since I think the questions have been answered and there hasn't been any activity for a while.

The OpenCL backend has been deprecated by llama.cpp and will be dropped from LLamaSharp in the next release, so I'm going to close this issue.

I've been planning to look into this for a while, since it's required for the BatchedExecutor to support llava. My plan has been to create a new batch class (`LLamaBatchEmbeddings`)...

#770 added a new `LLamaBatchEmbeddings` class, which can be used to drive inference with embeddings instead of tokens. It can be used with any context (it's not tied to the `BatchedExecutor`)...
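For illustration, usage looks roughly like the sketch below. This only shows the general shape of embedding-driven decoding: the constructor argument, the `Add` signature, and the decode call are assumptions modelled on the token-based `LLamaBatch` API rather than copied from the merged #770 code, and `LoadImageEmbeddings` is a hypothetical helper standing in for whatever produces the embedding vectors (e.g. a vision encoder).

```csharp
using LLama;
using LLama.Common;
using LLama.Native;

// Sketch only: member names below are assumptions modelled on the token-based
// LLamaBatch API and may not match the merged #770 code exactly.
var parameters = new ModelParams(@"path/to/model.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// Hypothetical pre-computed embeddings (e.g. from a vision encoder),
// one vector per position, each of length model.EmbeddingSize.
List<float[]> imageEmbeddings = LoadImageEmbeddings();

// Build a batch of embeddings instead of tokens.
var batch = new LLamaBatchEmbeddings(model.EmbeddingSize);
for (var pos = 0; pos < imageEmbeddings.Count; pos++)
{
    // Request logits only for the final item in the batch.
    var isLast = pos == imageEmbeddings.Count - 1;
    batch.Add(imageEmbeddings[pos], pos, (LLamaSeqId)0, isLast);
}

// Decode with a plain context; nothing here requires the BatchedExecutor.
var result = await context.DecodeAsync(batch, CancellationToken.None);
```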

It appears that the issue here is that publish tries to place all of the various binaries (e.g. `avx\libllama.so`, `avx2\libllama.so`, `avx512\libllama.so` etc.) into the root folder. Since they all have the...