semantic-router
:balance_scale: Surprising performance: `gemma2:8b` 1.40 times faster than `gemma2:2b` on `ollama` :llama:
:grey_question: Context
As mentioned earlier:
- https://github.com/aurelio-labs/semantic-router/pull/346
I gave semantic-router a try and got really impressive results, see 🔀 Semantic Router w. ollama/gemma2 : real life 10ms hotline challenge 🤯 .
... but gemma2:2b was recently released, so I switched to this model, hoping that it would:
- be faster
- be just as good

... but surprisingly, it was just as good, yet slower.
:point_right: The goal of this issue is to understand why... and what could be done to make semantic-router run in even less than 10 ms.
:balance_scale: Data
Consider the following runs. Below are the timings; both models produce output of the same quality:
Cell N° | gemma2:2b | gemma2:8b |
---|---|---|
13 | 44.2 ms | 33 ms |
14 | 16 ms | 12.7 ms |
15 | 15.9 ms | 12 ms |
16 | 17 ms | 12 ms |
17 | 15.9 ms | 11.8 ms |
18 | 21.4 ms | 11.4 ms |
19 | 16.3 ms | 12.6 ms |
20 | 15.6 ms | 11.6 ms |
:information_source: In every test, the 8b model is faster than the 2b model... which is surprising:
:bar_chart: Benchmark conclusion
The average speed-up factor of gemma2:8b compared to gemma2:2b is approximately 1.40. This means that, on average, gemma2:8b is 1.40 times faster than gemma2:2b.
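The 1.40 figure can be re-derived from the table above. Here is a quick sanity check (assuming the factor is simply the mean of the per-cell 2b/8b latency ratios):

```python
# Sanity check: recompute the average speed-up factor from the benchmark table.
# Per-cell latencies in ms, copied from the table above (cells 13 through 20).
times_2b = [44.2, 16, 15.9, 17, 15.9, 21.4, 16.3, 15.6]
times_8b = [33, 12.7, 12, 12, 11.8, 11.4, 12.6, 11.6]

# Speed-up per cell = 2b latency / 8b latency, then averaged across cells.
ratios = [t2 / t8 for t2, t8 in zip(times_2b, times_8b)]
avg_speedup = sum(ratios) / len(ratios)
print(f"average speed-up: {avg_speedup:.2f}")  # → average speed-up: 1.40
```

Note that the first run (cell 13) is noticeably slower for both models, likely warm-up, so the per-cell ratio averaging keeps it from dominating the result.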
:point_right: Do you get the same performance?