whisper.cpp
Benchmark results
Encoder
Collection of bench results for various platforms and devices. If you want to submit info about your device, simply run the bench tool or the extra/bench-all.sh script and report the results in the comments below.
Suggestions for a better summary of the results are welcome.
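For reference, a minimal sketch of producing one row for the table below (assuming a standard make build and that a model has already been downloaded, e.g. via models/download-ggml-model.sh):
make bench
./bench -m models/ggml-base.en.bin -t 4
To cover all models and thread counts in one go, run ./extra/bench-all.sh instead.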
CPU | OS | Config | Model | Th | Load [ms] | Enc. [ms] | Commit |
---|---|---|---|---|---|---|---|
MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | tiny | 8 | 71 | 102 | 206fc93 |
MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | base | 8 | 96 | 220 | 206fc93 |
MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | small | 8 | 233 | 685 | 206fc93 |
MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | medium | 8 | 603 | 1928 | 206fc93 |
MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | large | 8 | 1158 | 3350 | 206fc93 |
--- | |||||||
MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | small | 1 | 251 | 2605 | 206fc93 |
MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | small | 4 | 255 | 884 | 206fc93 |
--- | |||||||
Mac Mini M1 | MacOS | NEON BLAS | tiny | 4 | 62 | 194 | fcf515d |
Mac Mini M1 | MacOS | NEON BLAS | base | 4 | 81 | 380 | fcf515d |
Mac Mini M1 | MacOS | NEON BLAS | small | 4 | 204 | 1249 | fcf515d |
Mac Mini M1 | MacOS | NEON BLAS | medium | 4 | 876 | 3980 | fcf515d |
Mac Mini M1 | MacOS | NEON BLAS | large | 4 | 1876 | 7979 | fcf515d |
--- | |||||||
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | tiny | 8 | 107 | 422 | fcf515d |
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | base | 8 | 137 | 880 | fcf515d |
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | small | 8 | 280 | 2874 | fcf515d |
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | medium | 8 | 692 | 9610 | fcf515d |
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | large | 8 | 1317 | 16917 | fcf515d |
--- | |||||||
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | tiny | 4 | 120 | 780 | fcf515d |
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | base | 4 | 151 | 1173 | fcf515d |
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | small | 4 | 289 | 3062 | fcf515d |
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | medium | 4 | 711 | 9175 | fcf515d |
Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | large | 4 | 1282 | 16050 | fcf515d |
--- | |||||||
Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | tiny | 8 | 135 | 197 | fcf515d |
Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | base | 8 | 176 | 421 | fcf515d |
Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | small | 8 | 357 | 1393 | fcf515d |
Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | medium | 8 | 855 | 4404 | fcf515d |
Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | large | 8 | 1576 | 8118 | fcf515d |
--- | |||||||
Raspberry Pi 4 | | NEON | tiny | 4 | 1436 | 13839 | fcf515d |
Raspberry Pi 4 | | NEON | base | 4 | 1894 | 30552 | fcf515d |
--- | |||||||
iPhone 13 Mini | iOS 16.0 | NEON BLAS | base | 4 | 97 | 1091 | fcf515d |
--- | |||||||
MacBook M1 Pro | Vivaldi | WASM | tiny | 8 | 133 | 3785 | fcf515d |
MacBook M1 Pro | Vivaldi | WASM | base | 8 | 172 | 8253 | fcf515d |
--- | |||||||
MacBook M1 Pro | Chrome | WASM | tiny | 8 | 134 | 3776 | fcf515d |
MacBook M1 Pro | Chrome | WASM | base | 8 | 168 | 8200 | fcf515d |
--- | |||||||
MacBook M1 Pro | Firefox | WASM | tiny | 8 | 137 | 2626 | fcf515d |
MacBook M1 Pro | Firefox | WASM | base | 8 | 183 | 6226 | fcf515d |
memcpy
MacBook M1 Pro
./bench -w 1 -t 1
memcpy: 37.59 GB/s
Ryzen 9 5950X
./bench -w 1 -t 1
memcpy: 16.74 GB/s
ggml_mul_mat
MacBook M1 Pro
./bench -w 2 -t 1
ggml_mul_mat: 64 x 64: F16 330.6 GFLOPS (128 runs) / F32 466.0 GFLOPS (128 runs)
ggml_mul_mat: 128 x 128: F16 737.5 GFLOPS (128 runs) / F32 838.9 GFLOPS (128 runs)
ggml_mul_mat: 256 x 256: F16 938.6 GFLOPS (128 runs) / F32 1062.3 GFLOPS (128 runs)
ggml_mul_mat: 512 x 512: F16 1312.5 GFLOPS (128 runs) / F32 1835.5 GFLOPS (128 runs)
ggml_mul_mat: 1024 x 1024: F16 1765.1 GFLOPS (128 runs) / F32 2041.4 GFLOPS (128 runs)
ggml_mul_mat: 2048 x 2048: F16 1784.3 GFLOPS (104 runs) / F32 1859.2 GFLOPS (109 runs)
ggml_mul_mat: 4096 x 4096: F16 1855.1 GFLOPS ( 14 runs) / F32 1873.3 GFLOPS ( 14 runs)
Ryzen 9 5950X
WHISPER_OPENBLAS=1 make -j bench && ./bench -w 2 -t 1
ggml_mul_mat: 64 x 64: F16 56.3 GFLOPS (128 runs) / F32 70.2 GFLOPS (128 runs)
ggml_mul_mat: 128 x 128: F16 47.8 GFLOPS (128 runs) / F32 67.0 GFLOPS (128 runs)
ggml_mul_mat: 256 x 256: F16 185.1 GFLOPS (128 runs) / F32 332.7 GFLOPS (128 runs)
ggml_mul_mat: 512 x 512: F16 386.4 GFLOPS (128 runs) / F32 658.6 GFLOPS (128 runs)
ggml_mul_mat: 1024 x 1024: F16 636.2 GFLOPS (128 runs) / F32 1012.0 GFLOPS (128 runs)
ggml_mul_mat: 2048 x 2048: F16 950.9 GFLOPS ( 56 runs) / F32 1296.8 GFLOPS ( 76 runs)
ggml_mul_mat: 4096 x 4096: F16 1168.6 GFLOPS ( 9 runs) / F32 1403.1 GFLOPS ( 11 runs)
Results for Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
i7-4790K | Debian | | tiny.en | 4 | 165 | 808 |
i7-4790K | Debian | | tiny.en | 8 | 165 | 783 |
i7-4790K | Debian | | base.en | 4 | 212 | 1813 |
i7-4790K | Debian | | base.en | 8 | 214 | 1746 |
Results for a Ryzen 5 4500U 6C/6T laptop CPU (I've included just one result for 8 threads, as the Encode time is much higher when threads > CPU cores).
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
Ryzen 5 4500U (6C/6T) | openSUSE Leap | | tiny.en | 4 | 170.00 | 829.43 |
Ryzen 5 4500U (6C/6T) | openSUSE Leap | | tiny.en | 6 | 143.03 | 671.74 |
Ryzen 5 4500U (6C/6T) | openSUSE Leap | | base.en | 4 | 305.92 | 2,092.39 |
Ryzen 5 4500U (6C/6T) | openSUSE Leap | | base.en | 6 | 188.05 | 1,495.61 |
Ryzen 5 4500U (6C/6T) | openSUSE Leap | | small.en | 4 | 408.03 | 6,919.31 |
Ryzen 5 4500U (6C/6T) | openSUSE Leap | | small.en | 6 | 359.23 | 6,370.83 |
Ryzen 5 4500U (6C/6T) | openSUSE Leap | | medium.en | 4 | 2,238.11 | 25,863.28 |
Ryzen 5 4500U (6C/6T) | openSUSE Leap | | medium.en | 6 | 1,113.04 | 19,672.63 |
Ryzen 5 4500U (6C/6T) | openSUSE Leap | | medium.en | 8 | 973.65 | 39,619.20 |
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
i7-11800H | WSL2 Ubuntu | AVX2 | tiny | 2 | 164.35 | 1087.61 |
i7-11800H | WSL2 Ubuntu | AVX2 | tiny | 4 | 128.94 | 733.24 |
i7-11800H | WSL2 Ubuntu | AVX2 | tiny | 8 | 137.57 | 619.88 |
i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | tiny | 2 | 143.02 | 1087.15 |
i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | tiny | 4 | 127.60 | 730.57 |
i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | tiny | 8 | 125.62 | 616.27 |
i7-11800H | WSL2 Ubuntu | AVX2 AVX512 BLAS | tiny | 2 | 132.59 | 1511.38 |
i7-11800H | WSL2 Ubuntu | AVX2 AVX512 BLAS | tiny | 4 | 132.48 | 1407.49 |
i7-11800H | WSL2 Ubuntu | AVX2 AVX512 BLAS | tiny | 8 | 133.82 | 1458.27 |
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
i7-11800H | WSL2 Ubuntu | AVX2 | base | 2 | 174.34 | 2533.79 |
i7-11800H | WSL2 Ubuntu | AVX2 | base | 4 | 166.68 | 1830.67 |
i7-11800H | WSL2 Ubuntu | AVX2 | base | 8 | 165.53 | 1478.73 |
i7-11800H | WSL2 Ubuntu | AVX2 | small | 2 | 340.12 | 8714.24 |
i7-11800H | WSL2 Ubuntu | AVX2 | small | 4 | 394.32 | 6021.41 |
i7-11800H | WSL2 Ubuntu | AVX2 | small | 8 | 305.98 | 4828.84 |
i7-11800H | WSL2 Ubuntu | AVX2 | large | 2 | 3205.36 | 57109.10 |
i7-11800H | WSL2 Ubuntu | AVX2 | large | 4 | 2720.25 | 38519.89 |
i7-11800H | WSL2 Ubuntu | AVX2 | large | 8 | 3716.34 | 27739.99 |
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | large | 2 | 1954.21 | 54966.84 |
i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | large | 4 | 1455.40 | 37320.62 |
i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | large | 8 | 1372.58 | 27937.64 |
This performance is impressive!
M1 Pro | MacOS | | large | 8 | 1973 | 4208
This performance is impressive!
Yes, there is a huge performance boost due to using the built-in BLAS implementation on these devices. I will soon add OpenBLAS support for x86 architectures and see how this compares.
By the way, AVX-512 is not supported on master. I have added initial support here, but I am not sure if it works: https://github.com/ggerganov/whisper.cpp/pull/95
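In the meantime, if you want to try an OpenBLAS build on Linux, a minimal sketch (assuming a Debian/Ubuntu system where the libopenblas-dev package provides OpenBLAS) is the same invocation used for the Ryzen mul_mat numbers above:
sudo apt install libopenblas-dev
WHISPER_OPENBLAS=1 make -j bench
./bench -w 2 -t 1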
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
Intel® Core™ i5-8250U | Win11 Home | AVX2 | Large | 8 | 2226.85 | 61547.61 |
Compiled with MinGW64 GCC 11.3
Valve Jupiter (AMD Custom APU 0405, Zen 2 microarch, 4c8t, 16GB DDR5 @ 5200 MT/s)
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
AMD Custom APU 0405 | SteamOS 3.2 | AVX2 | Base | 8 | 326.32 | 2592.96 |
Compiled with cc (GCC) 11.3.0
The performance gains on jfk.wav since the last test (two weeks or so ago) are extremely impressive: a ~10-20x speedup, from ~40 seconds down to 2-4 seconds.
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
MacBook M1 Max | macOS Ventura | BLAS | small | 1 | 299.09 | 4166.00 |
MacBook M1 Max | macOS Ventura | BLAS | small | 4 | 329.45 | 1304.32 |
MacBook M1 Max | macOS Ventura | BLAS | base | 1 | 139.10 | 1302.17 |
MacBook M1 Max | macOS Ventura | BLAS | base | 4 | 135.96 | 399.45 |
On an AMD EPYC cloud instance with 64 cores / 240 threads, it gets stuck like this when run with 240 threads. I noticed that above a certain number of threads it gets slow, or perhaps the cloud provider is CPU-limiting. Can anyone else with real hardware check if this is the case?
time ./main -m models/ggml-base.en.bin -f elon.wav -t 240
whisper_model_load: loading model from 'models/ggml-base.en.bin'
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem_required = 670.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size = 140.60 MB
whisper_model_load: memory size = 22.83 MB
whisper_model_load: model size = 140.54 MB
system_info: n_threads = 240 / 240 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 |
main: processing 'elon.wav' (34466688 samples, 2154.2 sec), 240 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ..
So I have tried various numbers of threads with the above-mentioned cloud provider.
I found that anything above 64 threads gets slower, and it remains usable up to 120 threads. Anything above that hangs. It must be that the cloud provider is throttling the free trial, or too many threads actually slow things down.
...
...
processor : 239
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7742 64-Core Processor
stepping : 0
microcode : 0x830104d
cpu MHz : 2245.780
cache size : 512 KB
physical id : 1
siblings : 120
core id : 59
cpu cores : 60
apicid : 247
initial apicid : 247
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save umip rdpid arch_capabilities
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4491.56
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:
time ./main -m models/ggml-base.en.bin -f elon.wav -t 64
whisper_model_load: loading model from 'models/ggml-base.en.bin'
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem_required = 670.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size = 140.60 MB
whisper_model_load: memory size = 22.83 MB
whisper_model_load: model size = 140.54 MB
system_info: n_threads = 64 / 240 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 |
main: processing 'elon.wav' (34466688 samples, 2154.2 sec), 64 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:03.960] [MUSIC PLAYING]
[00:00:03.960 --> 00:00:18.240] In life, we've seen within this part of the world
...
...
[00:35:40.320 --> 00:35:41.920] Thank you, and have a great day.
[00:35:41.920 --> 00:35:43.920] [APPLAUSE]
[00:35:43.920 --> 00:35:45.920] [MUSIC PLAYING]
[00:35:45.920 --> 00:35:56.240] [VIDEO PLAYBACK]
whisper_print_timings: load time = 249.61 ms
whisper_print_timings: mel time = 1267.11 ms
whisper_print_timings: sample time = 1718.69 ms
whisper_print_timings: encode time = 63702.25 ms / 10617.04 ms per layer
whisper_print_timings: decode time = 381317.66 ms / 63552.94 ms per layer
whisper_print_timings: total time = 448362.19 ms
real 7m28.411s
user 347m2.230s
sys 22m42.511s
Using 32 threads was faster than 64 threads; I think the 32-thread run took around 7 minutes or so.
Env: Restricted Cloud / Throttled Maybe
CPU: AMD EPYC 7742 64-Core Processor
OS:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
Linux XXXX 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Compiler:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.4.0-1ubuntu1~20.04.1' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
$ ./bench -m ./models/ggml-small.en.bin -t 4
whisper_model_load: loading model from './models/ggml-small.en.bin'
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 3
whisper_model_load: mem_required = 1588.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size = 464.56 MB
whisper_model_load: memory size = 68.48 MB
whisper_model_load: model size = 464.44 MB
system_info: n_threads = 4 / 240 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 |
whisper_print_timings: load time = 515.02 ms
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms
whisper_print_timings: encode time = 6878.32 ms / 573.19 ms per layer
whisper_print_timings: decode time = 0.00 ms / 0.00 ms per layer
whisper_print_timings: total time = 7393.42 ms
If you wish, you can submit these results here:
https://github.com/ggerganov/whisper.cpp/issues/89
Please include the following information:
- CPU model
- Operating system
- Compiler
$ ./bench -m ./models/ggml-small.en.bin -t 240
whisper_model_load: loading model from './models/ggml-small.en.bin'
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 3
whisper_model_load: mem_required = 1588.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size = 464.56 MB
whisper_model_load: memory size = 68.48 MB
whisper_model_load: model size = 464.44 MB
system_info: n_threads = 240 / 240 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 |
whisper_print_timings: load time = 528.66 ms
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms
whisper_print_timings: encode time = 12898.34 ms / 1074.86 ms per layer
whisper_print_timings: decode time = 0.00 ms / 0.00 ms per layer
whisper_print_timings: total time = 13427.03 ms
If you wish, you can submit these results here:
https://github.com/ggerganov/whisper.cpp/issues/89
Please include the following information:
- CPU model
- Operating system
- Compiler
I'll remove the above posts if they're too much clutter.
@trholding Thanks for the results.
You can generate a table with performance results by simply running the extra/bench-all.sh script.
Regarding the threads - yes, it seems that going beyond 8 threads does not help, regardless of how many cores you have. My guess is that the computation is memory-bound, so using more threads does not improve the performance.
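A quick way to check the thread-scaling behaviour on your own machine is to sweep the -t value of the bench tool; a minimal sketch, assuming a base.en model in models/:
for t in 1 2 4 8 16 32; do echo "== $t threads =="; ./bench -m models/ggml-base.en.bin -t $t; done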
Okay, 8 threads max. So for a large file, is there a possibility of splitting the file into chunks, using silences as terminators, dividing the conversion across (total threads or cores)/8 parallel jobs, and still keeping track of timestamps? This could be awesome for batch conversion.
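One way this idea could be prototyped outside whisper.cpp itself is to pre-split the audio with ffmpeg and fan the chunks out over several processes. A rough sketch (chunk length, job count, and file names are assumptions; true silence-based splitting would use ffmpeg's silencedetect filter instead, and per-chunk timestamps would still need to be offset by each chunk's start time):
ffmpeg -i input.wav -f segment -segment_time 300 -c copy chunk_%03d.wav
ls chunk_*.wav | xargs -P 3 -I{} sh -c './main -m models/ggml-base.en.bin -f "{}" -t 8 > "{}.txt"'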
You can generate a table with performance results by simply running the extra/bench-all.sh script.
Oh, I didn't know. I'll update with tables soon and remove my previous comments in a few hours.
You can generate a table with performance results by simply running the extra/bench-all.sh script.
Hey, sorry. That didn't pan out well. I did the benchmark three times, and my account got deleted without notice. I couldn't get the logs as it was a web terminal. On the other hand, I am happy that this happened: I was giving serious thought to purchasing a GPU+CPU plan there, so a performance check of the CPU was equally important. Technically it was probably my fault - I shouldn't have used a reverse shell and done benchmarks on a free trial, but how does one know if a service is really good or all just vapor...
Dell Precision 5560 laptop results:
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
i7-11850H | Ubuntu | AVX2 | tiny | 4 | 115.87 | 538.43 |
i7-11850H | Ubuntu | AVX2 | base | 4 | 145.14 | 1241.84 |
i7-11850H | Ubuntu | AVX2 | small | 4 | 299.30 | 4343.57 |
i7-11850H | Ubuntu | AVX2 | medium | 4 | 760.98 | 15238.31 |
i7-11850H | Ubuntu | AVX2 | large | 4 | 1404.32 | 27476.86 |
i7-11850H | Ubuntu | AVX2 | tiny | 8 | 131.96 | 358.81 |
i7-11850H | Ubuntu | AVX2 | base | 8 | 166.61 | 839.31 |
i7-11850H | Ubuntu | AVX2 | small | 8 | 320.29 | 2854.86 |
i7-11850H | Ubuntu | AVX2 | medium | 8 | 756.20 | 9829.62 |
i7-11850H | Ubuntu | AVX2 | large | 8 | 1382.38 | 19872.81 |
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
i9-9900K | WSL2 Ubuntu (GCC) | AVX2 | tiny.en | 4 | 85.71 | 601.56 |
i9-9900K | WSL2 Ubuntu (GCC) | AVX2 | small.en | 4 | 212.59 | 5146.23 |
i9-9900K | OSX 10.14.1 (hackintosh - GCC) | AVX2 | tiny.en | 4 | 198.17 | 455.12 |
i9-9900K | OSX 10.14.1 (hackintosh - GCC) | AVX2 | base.en | 4 | 272.62 | 909.71 |
i9-9900K | OSX 10.14.1 (hackintosh - GCC) | AVX2 | small.en | 4 | 598.75 | 2968.75 |
Xeon(R) Silver 4210R CPU @ 2.40GHz | Virtual Machine - Debian Stretch (GCC - master branch) | AVX2 avx512f avx512dq avx512cd avx512bw avx512vl | small.en | 4 | 776.56 | 12340.41 |
Xeon(R) Silver 4210R CPU @ 2.40GHz | Virtual Machine - Debian Stretch (GCC - master branch) | AVX2 avx512f avx512dq avx512cd avx512bw avx512vl | tiny.en | 4 | 295.54 | 1710.46 |
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Tiny | 4 | 124.28 | 656.41 |
i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Tiny | 8 | 123.70 | 696.41 |
i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Base | 4 | 159.91 | 1754.44 |
i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Base | 8 | 164.47 | 1658.55 |
i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Small | 4 | 330.91 | 6161.86 |
i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Small | 8 | 346.22 | 5187.85 |
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
i7-1065G7 | Windows 11 | - | small.en | 4 | 1,314.25 | 294,168.09 |
Compiled with VS 2022
Something is off, right?
Yup - you are missing the AVX2 flag. See if some of the comments in https://github.com/ggerganov/whisper.cpp/issues/5 can help you resolve this.
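For MSVC builds, one thing that may help (an assumption on my part, not a confirmed fix) is to pass /arch:AVX2 explicitly when configuring with CMake, then check that the system_info line reports AVX2 = 1 when running bench or main:
cmake -B build -DCMAKE_C_FLAGS="/arch:AVX2" -DCMAKE_CXX_FLAGS="/arch:AVX2"
cmake --build build --config Release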
OK, the AVX2 flag seems to help :)
CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
---|---|---|---|---|---|---|
i7-1065G7 | Windows 11 | AVX2 | small.en | 4 | 527.59 | 9,648.67 |
Compiled with VS 2022