Marin Bonacci
I've been playing around with the BatchedExecutor example and noticed that when I set the sampling temperature to 0 (or use GreedySamplingPipeline), the conversations diverge. I've tried a few short initial prompts and...
That doesn't sound right... As I understand it, when sampling with a greedy sampler (which always picks the most probable token), the inference process should always return the same result for the same prompt. I...
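To illustrate the point, here's a minimal, self-contained sketch (plain C++, not the actual llama.cpp or LLamaSharp sampler code; `greedy_pick` is just a hypothetical helper): greedy sampling is a pure argmax over the logits, so identical logits always yield the same token. If the conversations diverge under greedy sampling, the difference must come from the logits the model produces for each sequence in the batch, not from the sampler itself.

```cpp
// Conceptual sketch only (not llama.cpp's API): greedy sampling as argmax.
#include <cstddef>
#include <iostream>
#include <vector>

// Greedy pick: return the index of the largest logit.
static std::size_t greedy_pick(const std::vector<float>& logits) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}

int main() {
    const std::vector<float> logits = {0.1f, 2.5f, -1.0f, 2.4f};

    // Running the sampler twice on the same logits gives the same token id,
    // so greedy decoding is deterministic as long as the logits are identical.
    std::cout << greedy_pick(logits) << "\n"; // prints 1
    std::cout << greedy_pick(logits) << "\n"; // prints 1
}
```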
I did a test with the batched example from llama.cpp (modified to use greedy sampling) and the results are the same. So I created an issue in llama.cpp (https://github.com/ggerganov/llama.cpp/issues/6583) for this.