Marin Bonacci

3 comments by Marin Bonacci

I've been playing around with the BatchedExecutor example and noticed that when I set the sampling temperature to 0 (or use GreedySamplingPipeline), the conversations diverge. I've tried a few short initial prompts and...

That doesn't sound right... As I understand it, when sampling with a greedy sampler (which always picks the most probable token), the inference process should always return the same result for the same prompt. I...
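
To illustrate the point: greedy decoding is just an argmax over the logits, so identical logits always produce the same token. A minimal sketch in plain C++ (not LLamaSharp or llama.cpp code; `greedy_pick` is just an illustrative helper, not the library's sampler):

```cpp
// Plain C++ sketch: greedy decoding is just an argmax over the logits.
#include <cstddef>
#include <iostream>
#include <vector>

// Pick the index of the highest logit (ties go to the lowest index).
static std::size_t greedy_pick(const std::vector<float>& logits) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}

int main() {
    const std::vector<float> logits = {0.1f, 2.5f, -1.0f, 1.7f, 0.3f};
    // Picking twice from the same logits always gives the same token id (1),
    // so any divergence must come from the logits differing between runs,
    // not from randomness in the sampler.
    std::cout << greedy_pick(logits) << " " << greedy_pick(logits) << "\n";
    return 0;
}
```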

I did a test with the batched example from llama.cpp (modified to use greedy sampling) and it shows the same divergence, so I created an issue in llama.cpp (https://github.com/ggerganov/llama.cpp/issues/6583) for this.
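
For reference, the modification was along these lines, assuming the old (pre-refactor) llama.cpp sampling API that the batched example used at the time; variable names such as `ctx`, `model` and `i_batch[i]` are as in that example, and this is a sketch of the change rather than an exact patch:

```cpp
// Inside the per-sequence sampling loop of examples/batched/batched.cpp,
// using the old (pre-2024-refactor) sampling API:
const float * logits  = llama_get_logits_ith(ctx, i_batch[i]);
const int     n_vocab = llama_n_vocab(model);

std::vector<llama_token_data> candidates;
candidates.reserve(n_vocab);
for (llama_token token_id = 0; token_id < n_vocab; token_id++) {
    candidates.push_back({ token_id, logits[token_id], 0.0f });
}
llama_token_data_array candidates_p = { candidates.data(), candidates.size(), false };

// The stock example applied top-k / top-p / temperature and then sampled:
//   llama_sample_top_k(ctx, &candidates_p, top_k, 1);
//   llama_sample_top_p(ctx, &candidates_p, top_p, 1);
//   llama_sample_temp (ctx, &candidates_p, temp);
//   const llama_token new_token_id = llama_sample_token(ctx, &candidates_p);
// Greedy modification: always take the single most probable token instead.
const llama_token new_token_id = llama_sample_token_greedy(ctx, &candidates_p);
```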