
:balance_scale: Surprising performance: `gemma2:8b` 1.40 times faster than `gemma2:2b` on `ollama` :llama:

Open adriens opened this issue 6 months ago • 1 comments

:grey_question: Context

As mentioned earlier:

  • https://github.com/aurelio-labs/semantic-router/pull/346

I gave semantic-router a try and got really impressive results; see 🔀 Semantic Router w. ollama/gemma2 : real life 10ms hotline challenge 🤯.

... but gemma2:2b was recently released, so I switched to this model, hoping that it would:

  • Be faster
  • Be as good

... but surprisingly, it was just as good, yet slower.

:point_right: The goal of this issue is to understand why... and what could be done to make semantic router run even faster than 10ms.

:balance_scale: Data

Below are timings from the runs with each model, both with the same output quality:

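| Cell N° | gemma2:2b | gemma2:8b |
|---------|-----------|-----------|
| 13      | 44.2 ms   | 33 ms     |
| 14      | 16 ms     | 12.7 ms   |
| 15      | 15.9 ms   | 12 ms     |
| 16      | 17 ms     | 12 ms     |
| 17      | 15.9 ms   | 11.8 ms   |
| 18      | 21.4 ms   | 11.4 ms   |
| 19      | 16.3 ms   | 12.6 ms   |
| 20      | 15.6 ms   | 11.6 ms   |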

:information_source: On each test, the 8b is faster than the 2b, which is surprising.

:bar_chart: Benchmark conclusion

Averaged over the eight cells above, the per-cell speed-up of gemma2:8b over gemma2:2b is approximately 1.40. In other words, gemma2:8b is about 1.40 times faster than gemma2:2b on average.
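
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the figure from the table. It assumes the 1.40 value is the mean of the per-cell ratios (2b time divided by 8b time), which is consistent with the numbers. The variable names are mine, not from the notebooks.

```python
from statistics import mean

# Timings in milliseconds, copied from the table above: cell -> (gemma2:2b, gemma2:8b).
timings_ms = {
    13: (44.2, 33.0),
    14: (16.0, 12.7),
    15: (15.9, 12.0),
    16: (17.0, 12.0),
    17: (15.9, 11.8),
    18: (21.4, 11.4),
    19: (16.3, 12.6),
    20: (15.6, 11.6),
}

# Per-cell speed-up: how many times faster the 8b run was than the 2b run.
ratios = {cell: t_2b / t_8b for cell, (t_2b, t_8b) in timings_ms.items()}

# Assumption: the reported figure is the arithmetic mean of the per-cell ratios.
avg_speedup = mean(ratios.values())

for cell, ratio in ratios.items():
    print(f"cell {cell}: x{ratio:.2f}")
print(f"average speed-up: x{avg_speedup:.2f}")
```

Note that cell 18 has a noticeably larger ratio than the others, so a median or a ratio of total times would give a slightly lower figure than the mean.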

:point_right: Do you get the same performance?
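
To make results easier to compare across machines, here is a rough timing harness sketch. It is not the code from the notebooks: `time_calls` and `fake_route` are hypothetical names, and `fake_route` is only a stand-in to be replaced with the actual routing call for each model. Discarding a warm-up call before measuring is my assumption about how to get stable numbers.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[str], object], query: str,
               runs: int = 8, warmup: int = 1) -> List[float]:
    """Return per-call latencies in milliseconds, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn(query)  # untimed: lets caches and model loading settle first
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def fake_route(query: str) -> str:
    # Hypothetical stand-in: replace with the real routing call under test.
    return "hotline" if "help" in query else "other"


latencies = time_calls(fake_route, "I need help with my order")
print(f"median: {statistics.median(latencies):.3f} ms over {len(latencies)} runs")
print(f"min: {min(latencies):.3f} ms, max: {max(latencies):.3f} ms")
```

Reporting the median alongside min and max should make a single slow call stand out instead of skewing the comparison.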

adriens · Aug 03 '24 03:08