
Fix timings of completion, add timing of sampling back

Open lucasavila00 opened this issue 9 months ago • 2 comments

Now that we're sampling entirely on the CPU, we should not merge the sampling timings into the completion timings.

This will likely show an improvement on mistralrs-bench's tg test.

Note that llama-bench selects a random token (https://github.com/ggerganov/llama.cpp/blob/master/examples/llama-bench/llama-bench.cpp#L1161), so the tg test is currently not a truly fair comparison.

Basically, we need to revert https://github.com/EricLBuehler/mistral.rs/pull/151
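
To illustrate the separation being asked for, here is a minimal sketch that times the forward pass and the CPU-side sampling on separate clocks. Everything in it is hypothetical and for illustration only: `StepTimings`, `timed_generate`, and the `forward`/`sample` closures are not mistral.rs APIs, and the sketch assumes `forward` returns only once the logits are available on the host.

```rust
use std::time::{Duration, Instant};

/// Hypothetical accumulator: completion and sampling are tracked separately
/// so that CPU-side sampling cost is not folded into completion time.
#[derive(Debug, Default, Clone, Copy)]
pub struct StepTimings {
    pub completion: Duration,
    pub sampling: Duration,
    pub tokens: u32,
}

impl StepTimings {
    /// Tokens per second counting only the forward pass (no sampling).
    pub fn completion_tokens_per_sec(&self) -> f64 {
        let secs = self.completion.as_secs_f64();
        if secs == 0.0 {
            0.0
        } else {
            self.tokens as f64 / secs
        }
    }
}

/// Runs `steps` decode steps. `forward` stands in for the model call and is
/// assumed to include any device sync; `sample` stands in for the sampler.
pub fn timed_generate<L>(
    steps: u32,
    mut forward: impl FnMut(u32) -> L,
    mut sample: impl FnMut(&L) -> u32,
) -> (Vec<u32>, StepTimings) {
    let mut timings = StepTimings::default();
    let mut tokens = Vec::with_capacity(steps as usize);
    let mut last = 0u32;
    for _ in 0..steps {
        let t0 = Instant::now();
        let logits = forward(last);
        // The sampling clock starts only after the forward pass has returned.
        let t1 = Instant::now();
        last = sample(&logits);
        let t2 = Instant::now();
        timings.completion += t1 - t0;
        timings.sampling += t2 - t1;
        timings.tokens += 1;
        tokens.push(last);
    }
    (tokens, timings)
}
```

With this shape, a benchmark could report `completion_tokens_per_sec()` for the tg test and list the sampling time separately. Passing a `sample` closure that simply picks a random token would bring the measurement closer to what llama-bench does.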

lucasavila00 avatar Apr 28 '24 16:04 lucasavila00

I'm not sure whether it's better to redesign the code or to mutate some variable here, which is the point where we know the correct timings:

image

lucasavila00 avatar Apr 28 '24 16:04 lucasavila00

I think we should start the sampling timer there, since that excludes the sync point.

EricLBuehler avatar Apr 28 '24 18:04 EricLBuehler

Done in a previous PR, which didn't close this issue.

EricLBuehler avatar May 03 '24 14:05 EricLBuehler