Martin Evans
> kv cache management

The `BatchedExecutor` exposes all of the kv cache operations, per "Conversation". So e.g. you can shift off tokens, or rewind state etc. That should be a...
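For illustration, a minimal sketch of what per-Conversation kv cache manipulation might look like. The `Rewind` and `ShiftLeft` calls here are assumptions about the `Conversation` API and may differ between versions, so check the current `BatchedExecutor` examples for the exact signatures:

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;

var parameters = new ModelParams("model.gguf");              // hypothetical model path
using var model = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(model, parameters);

using var conversation = executor.Create();
conversation.Prompt("Hello");

// Assumed API: undo the last 8 tokens of this conversation's state,
// e.g. to regenerate from an earlier point.
conversation.Rewind(8);

// Assumed API: drop 16 tokens from the start of this conversation's
// kv cache to free space for further generation.
conversation.ShiftLeft(16);
```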
See #662; that update should include these things.
I don't know much about CUDA, but yes I think that would fix it (Onkitova tested it out in https://github.com/SciSharp/LLamaSharp/pull/371). Last time we discussed this ([ref](https://github.com/SciSharp/LLamaSharp/issues/350#issuecomment-1879916928)) I think we decided...
LLamaSharp intends to be thread-safe, but that's a bit tricky due to some thread-safety issues in llama.cpp itself. At the moment it's set up so there's a global lock...
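To illustrate the kind of guard being described (this is just a sketch of the global-lock pattern, not the actual LLamaSharp internals):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class GlobalInferenceLock
{
    // One lock shared by every context, because llama.cpp itself is not
    // safe to call concurrently from multiple threads.
    private static readonly SemaphoreSlim Lock = new(1, 1);

    public static async Task<T> RunAsync<T>(Func<T> nativeCall)
    {
        await Lock.WaitAsync();
        try
        {
            // Only one thread at a time reaches the native llama.cpp call.
            return nativeCall();
        }
        finally
        {
            Lock.Release();
        }
    }
}
```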
LLamaSharp has the `BatchedExecutor`, which is an entirely new executor I've been working on. You can spawn multiple "Conversations", which can all be prompted, and then inference runs for all...
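As a rough sketch (assuming an already-constructed `BatchedExecutor` named `executor`, and that `Infer()` is awaited), prompting several conversations and running them in a single inference call looks like:

```csharp
// Sketch: assumes `executor` is an existing BatchedExecutor.
using var conversationA = executor.Create();
using var conversationB = executor.Create();

conversationA.Prompt("Tell me a joke.");
conversationB.Prompt("Summarise this paragraph...");

// One call runs inference for every conversation with pending tokens,
// batched together into a single pass over the model.
await executor.Infer();
```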
The `BatchedExecutor` is actually already available in the previous release (although of course there will be improvements in the next release!).
I'd suggest cloning the master branch and working with that; `BatchedExecutor` is very new and I think the things you're asking about have been changed (and hopefully improved!). For example...
`BatchedExecutor` itself is not currently designed to be used in parallel (although it might be modified to allow that in the future). The parallelism is built into it - when...
Try the `BatchedExecutor` demos in LLamaSharp to get a feel for the speed. The `Fork` example starts with one conversation and keeps forking it again and again so it ends...
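A hedged sketch of the forking pattern (the `Fork` method name is taken from that example; check the demo source for exact usage):

```csharp
// Sketch: assumes `executor` is an existing BatchedExecutor.
using var root = executor.Create();
root.Prompt("Once upon a time");
await executor.Infer();

// Forking shares the already-evaluated kv cache, so both branches can
// continue from the same prefix without re-evaluating the prompt.
using var branchA = root.Fork();
using var branchB = root.Fork();
```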
The basic flow for the batched executor is:

1. Create one or more conversations:

```csharp
using var conversation = executor.Create();
conversation.Prompt("Hello AshD");
```

2. Call `Infer()` to run the model...
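Putting the steps together, a generation loop might look roughly like the following. The single-token `Prompt` overload and `SampleNextToken` are assumptions (the sampling API has changed between releases), so treat them as placeholders:

```csharp
// Sketch: assumes `executor` is an existing BatchedExecutor.
using var conversation = executor.Create();
conversation.Prompt("Hello AshD");

for (var i = 0; i < 64; i++)
{
    // Run the model for every conversation with pending tokens.
    await executor.Infer();

    // `SampleNextToken` is a hypothetical helper standing in for the real
    // sampling API, which differs between LLamaSharp versions.
    var token = SampleNextToken(conversation);

    // Feed the sampled token back in so the next Infer() continues generation.
    conversation.Prompt(token);
}
```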