Martin Evans
Thanks for confirming that. I'll do some more digging into this to see if I can turn up anything more.
I tried running the BoolQ dataset again, but this time asking each question in N parallel sequences. As far as I can tell this always produces the same answer across...
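For reference, the shape of that test was roughly this (a self-contained sketch with hypothetical helper names, not the actual harness):

```csharp
using System;
using System.Linq;

class ParallelAnswerCheck
{
    // Hypothetical stand-in for the real prompt + greedy-decode code.
    static string AskInFreshSequence(string question) => /* prompt + greedy decode */ "yes";

    static void Main()
    {
        const int N = 8;
        string question = "does an apple fall when dropped?";

        // Ask the same question in N independent sequences.
        string[] answers = Enumerable.Range(0, N)
                                     .Select(_ => AskInFreshSequence(question))
                                     .ToArray();

        // With greedy sampling all N answers should be identical.
        Console.WriteLine(answers.All(a => a == answers[0])
            ? "all sequences agree"
            : "sequences diverged");
    }
}
```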
As far as I know this issue is still relevant!
> The general rule is that all contributions need working tests, to ensure some level of quality for the code.

We discussed this in Discord and concluded there's not a...
I don't understand the purpose of this change. The idea of `Conversation` is to be a wrapper around the concept of a sequence and all the things you can do with...
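To illustrate what I mean by that, here's a simplified sketch of the idea (hypothetical field and method names, not the real class):

```csharp
using System.Collections.Generic;

// Simplified sketch (not the real class): a Conversation owns a single
// sequence id and exposes the things you can do to that sequence, so
// sequence/KV-cache bookkeeping never leaks out to the caller.
class Conversation
{
    private readonly int _sequenceId; // the wrapped llama.cpp sequence
    private int _endPos;              // current end position in the sequence

    public Conversation(int sequenceId) => _sequenceId = sequenceId;

    // Append prompt tokens to this sequence.
    public void Prompt(IReadOnlyList<int> tokens)
    {
        // ...queue (token, _sequenceId, position) triples into the batch...
        _endPos += tokens.Count;
    }
}
```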
> You could certainly do everything in Conversation

I'm not suggesting adding anything extra to `Conversation` - what I meant by that question is what can be built _with_ (i.e....
Before answering your questions, here's the way I'm thinking about things:

### Low Level

Just the functions and data structures that llama.cpp offers. Sometimes with a small layer of safety,...
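As a concrete example of that "small layer of safety" (a sketch; `llama_free` is a real llama.cpp function, but the wrapper type and library name here are illustrative):

```csharp
using System;
using System.Runtime.InteropServices;

// Illustration of the "small layer of safety" idea: the native
// llama.cpp context pointer is wrapped in a SafeHandle so it is
// freed exactly once, even if the caller forgets to dispose it.
internal sealed class NativeContextHandle : SafeHandle
{
    [DllImport("llama")]
    private static extern void llama_free(IntPtr ctx);

    public NativeContextHandle() : base(IntPtr.Zero, ownsHandle: true) { }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        llama_free(handle); // single native call, guarded by the runtime
        return true;
    }
}
```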
> Text decoder (optional) p.s. not a streaming one

This has come up a few times in recent discussions but I don't think it is **possible** to have a non-streaming...
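The reason, concretely: tokens detokenize to raw bytes, and a single multi-byte UTF-8 character can be split across two tokens, so the decoder has to carry state between tokens. A minimal sketch using .NET's stateful `Decoder` (the token-to-bytes step is assumed, not the actual LLamaSharp API):

```csharp
using System.Text;

class StreamingDetokenizer
{
    // Stateful UTF-8 decoder: it buffers incomplete byte sequences
    // across calls, which is exactly why detokenization must stream.
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder();

    // `tokenBytes` is assumed to be the raw bytes of one token.
    public string Add(byte[] tokenBytes)
    {
        var chars = new char[Encoding.UTF8.GetMaxCharCount(tokenBytes.Length)];
        int written = _decoder.GetChars(tokenBytes, 0, tokenBytes.Length, chars, 0, flush: false);
        // `written` can be 0 when a token ends mid-character; the pending
        // bytes stay buffered until the next token completes them.
        return new string(chars, 0, written);
    }
}
```

An emoji, for example, is four UTF-8 bytes and frequently spans two tokens, so a "non-streaming" decoder would have to either corrupt the character boundary or secretly buffer state anyway.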
> There's two batches in this workflow

If I'm understanding this example correctly, I don't think two batches are necessary. When `DecodeAsync` is called the `LLamaBatch` is copied into buffers...
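A sketch of the pattern I'm describing (hypothetical stand-in types, not the real `LLamaBatch`/executor): once the pending tokens are copied into the executor's own buffer, the caller's batch can be cleared and refilled immediately, so a second batch object isn't needed.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the real types, just to show the pattern.
class Batch
{
    public List<int> Tokens { get; } = new();
    public void Clear() => Tokens.Clear();
}

class Executor
{
    private readonly List<int> _buffer = new();

    public void Decode(Batch batch)
    {
        // The batch contents are copied into internal buffers here,
        // so the caller is free to reuse `batch` straight away.
        _buffer.Clear();
        _buffer.AddRange(batch.Tokens);
        batch.Clear();
        // ...kick off the actual (async) decode against _buffer...
    }
}
```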
That's probably something that would be built into an app _on top of_ LLamaSharp, rather than directly into LLamaSharp itself.