Martin Evans

252 comments by Martin Evans

The BatchedExecutor doesn't have anything like that at the moment; it's up to you to sample a token, reprompt the conversation with that token, and detokenize it into text using...
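Roughly, that loop looks like the sketch below. It's based on the BatchedExecutor examples in the repo, not a definitive implementation: the exact signatures (in particular the sampling pipeline's `Sample` overload and `Conversation.Sample()`) have changed between LLamaSharp versions, so treat the method names as assumptions.

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;
using LLama.Native;
using LLama.Sampling;

// Sketch only: based on the BatchedExecutor examples; exact signatures may
// differ between LLamaSharp versions.
var parameters = new ModelParams(@"path/to/model.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(model, parameters);

// Start a conversation and prompt it with the initial text.
using var conversation = executor.Create();
conversation.Prompt(executor.Context.Tokenize("The quick brown fox"));

var sampler = new DefaultSamplingPipeline();
var decoder = new StreamingTokenDecoder(executor.Context);

for (var i = 0; i < 32; i++)
{
    // Run one step of inference over all pending conversations.
    await executor.Infer();

    // Sample a token from this conversation's logits...
    var token = sampler.Sample(executor.Context.NativeHandle, conversation.Sample(), Array.Empty<LLamaToken>());

    // ...feed it back in so the next Infer() continues from it...
    conversation.Prompt(token);

    // ...and accumulate it so it can be detokenized back into text.
    decoder.Add(token);
}

Console.WriteLine(decoder.Read());
```

The StreamingTokenDecoder is used for the final step because individual tokens don't always map cleanly onto whole characters; it buffers tokens and only emits text once it has enough bytes to decode.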

Unfortunately I think that's probably because the inference process itself is not completely deterministic.

Yeah you're right, I was wrong there. I'll have a look into this.

Ah interesting, I'll wait and see if someone upstream knows what the issue is. Thanks for looking into that.

Looks like it's expected, according to ggerganov.

I'll close this issue now, since I think the questions have been answered and there hasn't been any activity for a while.

The OpenCL backend has been deprecated by llama.cpp and will be dropped from LLamaSharp in the next release, so I'm going to close this issue.

I've been planning to look into this for a while, since it's required for the BatchedExecutor to support llava. My plan has been to create a new batch class (`LLamaBatchEmbeddings`)...

#770 added a new `LLamaBatchEmbeddings` class, which can be used to drive inference with embeddings instead of tokens. It can be used with any context (it's not tied to the `BatchedExecutor`)...
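For illustration, usage looks roughly like the sketch below. This only shows the general shape of embedding-driven decoding: the constructor argument, the `Add` signature, and the decode call are assumptions modelled on the token-based `LLamaBatch` API rather than copied from the merged #770 code, and `LoadImageEmbeddings` is a hypothetical helper standing in for whatever produces the embedding vectors (e.g. a vision encoder).

```csharp
using LLama;
using LLama.Common;
using LLama.Native;

// Sketch only: member names below are assumptions modelled on the token-based
// LLamaBatch API and may not match the merged #770 code exactly.
var parameters = new ModelParams(@"path/to/model.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// Hypothetical pre-computed embeddings (e.g. from a vision encoder),
// one vector per position, each of length model.EmbeddingSize.
List<float[]> imageEmbeddings = LoadImageEmbeddings();

// Build a batch of embeddings instead of tokens.
var batch = new LLamaBatchEmbeddings(model.EmbeddingSize);
for (var pos = 0; pos < imageEmbeddings.Count; pos++)
{
    // Request logits only for the final item in the batch.
    var isLast = pos == imageEmbeddings.Count - 1;
    batch.Add(imageEmbeddings[pos], pos, (LLamaSeqId)0, isLast);
}

// Decode with a plain context; nothing here requires the BatchedExecutor.
var result = await context.DecodeAsync(batch, CancellationToken.None);
```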

It appears that the issue here is that publish tries to place all of the various binaries (e.g. `avx\libllama.so`, `avx2\libllama.so`, `avx512\libllama.so` etc.) into the root folder. Since they all have the...