semantic-router
:balance_scale: Surprising performance: `gemma2:8b` 1.40 times faster than `gemma2:2b` on `ollama` :llama:
:grey_question: Context
As mentioned earlier:
- https://github.com/aurelio-labs/semantic-router/pull/346
I gave semantic-router a try and got really impressive results, see 🔀 Semantic Router w. ollama/gemma2 : real life 10ms hotline challenge 🤯 .
... but gemma2:2b was recently released, so I switched to this model, hoping that it would:
- be faster
- be just as good

... but surprisingly, it was just as good, yet slower.
:point_right: The goal of this issue is to understand why... and what could be done to make semantic-router run in even less than 10 ms.
:balance_scale: Data
Consider the following runs. Below are the timings; both models produce output of the same quality:
Cell N° | gemma2:2b | gemma2:8b |
---|---|---|
13 | 44.2 ms | 33 ms |
14 | 16 ms | 12.7 ms |
15 | 15.9 ms | 12 ms |
16 | 17 ms | 12 ms |
17 | 15.9 ms | 11.8 ms |
18 | 21.4 ms | 11.4 ms |
19 | 16.3 ms | 12.6 ms |
20 | 15.6 ms | 11.6 ms |
:information_source: In every test, the 8b model is faster than the 2b model... which is surprising:
:bar_chart: Benchmark conclusion
The average speed-up factor of gemma2:8b compared to gemma2:2b is approximately 1.40. This means that, on average, gemma2:8b is 1.40 times faster than gemma2:2b.
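The 1.40 figure can be re-derived from the table above. Here is a quick sanity check (assuming the factor is simply the mean of the per-cell 2b/8b latency ratios):

```python
# Sanity check: recompute the average speed-up factor from the benchmark table.
# Per-cell latencies in ms, copied from the table above (cells 13 through 20).
times_2b = [44.2, 16, 15.9, 17, 15.9, 21.4, 16.3, 15.6]
times_8b = [33, 12.7, 12, 12, 11.8, 11.4, 12.6, 11.6]

# Speed-up per cell = 2b latency / 8b latency, then averaged across cells.
ratios = [t2 / t8 for t2, t8 in zip(times_2b, times_8b)]
avg_speedup = sum(ratios) / len(ratios)
print(f"average speed-up: {avg_speedup:.2f}")  # → average speed-up: 1.40
```

Note that the first run (cell 13) is noticeably slower for both models, likely warm-up, so the per-cell ratio averaging keeps it from dominating the result.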
:point_right: Do you get the same performance?