crabml
Any speed tests?
i have a small benchmark script that compares the performance of llama.cpp (built with LLAMA_NO_METAL) and crabml on CPU inference: https://gist.github.com/flaneur2020/27a384e8a6eae8963491c0bbf6bb9033
it seems that crabml can outperform llama.cpp when generating 100 tokens with gemma 2b and openllama 3b on my m1 laptop, on a token-by-token basis:
| llama.cpp | crabml |
| --- | --- |
| 3.106 | 3.132 |
| 3.155 | 3.069 |
| 3.175 | 3.102 |
| 3.191 | 3.070 |
| 3.248 | 3.041 |
| 3.215 | 3.121 |
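
to put the six runs above in perspective, here is a minimal sketch that averages each column and reports the relative gap. the post does not state the unit, so the script assumes each value is wall-clock seconds for one run (lower is better); `summarize` is a hypothetical helper and not part of the gist.

```python
from statistics import mean

# Values copied from the table above.
# ASSUMPTION: the unit is not stated in the post; this treats each value as
# wall-clock seconds for one 100-token run, so lower is better.
llama_cpp = [3.106, 3.155, 3.175, 3.191, 3.248, 3.215]
crabml = [3.132, 3.069, 3.102, 3.070, 3.041, 3.121]


def summarize(baseline, candidate):
    """Return (baseline mean, candidate mean, relative difference of candidate vs baseline)."""
    b, c = mean(baseline), mean(candidate)
    return b, c, (c - b) / b


b, c, rel = summarize(llama_cpp, crabml)
print(f"llama.cpp mean: {b:.3f}")
print(f"crabml mean:    {c:.3f}")
print(f"relative difference: {rel:+.1%}")
```

under that assumption the gap is a few percent, and llama.cpp is ahead in the first row, so six runs are probably too few to call the difference conclusive.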
however, prompt processing speed is still being optimized in crabml; we'd like to consider the approach in https://justine.lol/matmul/ to accelerate batched prompt processing.
also, GPU acceleration is still WIP; i'd like to put together a better performance report once the GPU part becomes more usable.
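
for context on why batched prompt processing can be accelerated this way: a batch of prompt tokens turns the matrix-vector product into a matrix-matrix product, so each loaded weight can be reused across several outputs. the sketch below shows the general blocked (tiled) loop structure only. it is a generic illustration, not the kernel from the linked article and not crabml's code, and the names `matmul_naive`, `matmul_tiled`, `tile_rows` and `tile_cols` are hypothetical.

```python
def matmul_naive(a, b):
    """Reference result: one dot product per output element."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def matmul_tiled(a, b, tile_rows=2, tile_cols=2):
    """Compute a @ b one small output tile at a time.

    ASSUMPTION: the tile sizes are illustrative; a real kernel would pick
    them to fit the CPU's vector registers.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile_rows):
        i_end = min(i0 + tile_rows, n)
        for j0 in range(0, m, tile_cols):
            j_end = min(j0 + tile_cols, m)
            # Accumulators for one output tile; in native code these would
            # live in registers for the whole inner loop.
            acc = [[0.0] * (j_end - j0) for _ in range(i_end - i0)]
            for p in range(k):
                b_row = b[p][j0:j_end]  # loaded once, reused for every tile row
                for di in range(i_end - i0):
                    a_val = a[i0 + di][p]  # loaded once, reused for every tile column
                    for dj, b_val in enumerate(b_row):
                        acc[di][dj] += a_val * b_val
            # Write the finished tile back to the output matrix.
            for di in range(i_end - i0):
                c[i0 + di][j0:j_end] = acc[di]
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # e.g. 3 prompt tokens x 3 features
b = [[1, 2], [3, 4], [5, 6]]           # 3 x 2 weight matrix
assert matmul_tiled(a, b) == matmul_naive(a, b)
```

in Python this will not run faster than the naive version; the point is only the reuse pattern, which pays off once the tile is held in registers by a compiled kernel.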