
Benchmark results

ggerganov opened this issue 1 year ago · 158 comments

Encoder

Collection of bench results for various platforms and devices. If you want to submit info about your device, simply run the bench tool or the extra/bench-all.sh script and report the results in the comments below.

Suggestions for a better summary of the results are welcome.

| CPU | OS | Config | Model | Th | Load [ms] | Enc. [ms] | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | tiny | 8 | 71 | 102 | 206fc93 |
| MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | base | 8 | 96 | 220 | 206fc93 |
| MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | small | 8 | 233 | 685 | 206fc93 |
| MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | medium | 8 | 603 | 1928 | 206fc93 |
| MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | large | 8 | 1158 | 3350 | 206fc93 |
| MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | small | 1 | 251 | 2605 | 206fc93 |
| MacBook M1 Pro | MacOS 13.0.1 | NEON BLAS | small | 4 | 255 | 884 | 206fc93 |
| Mac Mini M1 | MacOS | NEON BLAS | tiny | 4 | 62 | 194 | fcf515d |
| Mac Mini M1 | MacOS | NEON BLAS | base | 4 | 81 | 380 | fcf515d |
| Mac Mini M1 | MacOS | NEON BLAS | small | 4 | 204 | 1249 | fcf515d |
| Mac Mini M1 | MacOS | NEON BLAS | medium | 4 | 876 | 3980 | fcf515d |
| Mac Mini M1 | MacOS | NEON BLAS | large | 4 | 1876 | 7979 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | tiny | 8 | 107 | 422 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | base | 8 | 137 | 880 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | small | 8 | 280 | 2874 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | medium | 8 | 692 | 9610 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 | large | 8 | 1317 | 16917 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | tiny | 4 | 120 | 780 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | base | 4 | 151 | 1173 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | small | 4 | 289 | 3062 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | medium | 4 | 711 | 9175 | fcf515d |
| Ryzen 9 3900X | Ubuntu 20.04 | AVX2 BLAS | large | 4 | 1282 | 16050 | fcf515d |
| Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | tiny | 8 | 135 | 197 | fcf515d |
| Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | base | 8 | 176 | 421 | fcf515d |
| Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | small | 8 | 357 | 1393 | fcf515d |
| Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | medium | 8 | 855 | 4404 | fcf515d |
| Ryzen 9 5950X | Ubuntu 22.04 | AVX2 | large | 8 | 1576 | 8118 | fcf515d |
| Raspberry Pi 4 | | NEON | tiny | 4 | 1436 | 13839 | fcf515d |
| Raspberry Pi 4 | | NEON | base | 4 | 1894 | 30552 | fcf515d |
| iPhone 13 Mini | iOS 16.0 | NEON BLAS | base | 4 | 97 | 1091 | fcf515d |
| MacBook M1 Pro | Vivaldi | WASM | tiny | 8 | 133 | 3785 | fcf515d |
| MacBook M1 Pro | Vivaldi | WASM | base | 8 | 172 | 8253 | fcf515d |
| MacBook M1 Pro | Chrome | WASM | tiny | 8 | 134 | 3776 | fcf515d |
| MacBook M1 Pro | Chrome | WASM | base | 8 | 168 | 8200 | fcf515d |
| MacBook M1 Pro | Firefox | WASM | tiny | 8 | 137 | 2626 | fcf515d |
| MacBook M1 Pro | Firefox | WASM | base | 8 | 183 | 6226 | fcf515d |

memcpy

MacBook M1 Pro

./bench -w 1 -t 1
memcpy: 37.59 GB/s

Ryzen 9 5950X

./bench -w 1 -t 1
memcpy: 16.74 GB/s

ggml_mul_mat

MacBook M1 Pro

./bench -w 2 -t 1
ggml_mul_mat:    64 x    64: F16    330.6 GFLOPS (128 runs) / F32    466.0 GFLOPS (128 runs)
ggml_mul_mat:   128 x   128: F16    737.5 GFLOPS (128 runs) / F32    838.9 GFLOPS (128 runs)
ggml_mul_mat:   256 x   256: F16    938.6 GFLOPS (128 runs) / F32   1062.3 GFLOPS (128 runs)
ggml_mul_mat:   512 x   512: F16   1312.5 GFLOPS (128 runs) / F32   1835.5 GFLOPS (128 runs)
ggml_mul_mat:  1024 x  1024: F16   1765.1 GFLOPS (128 runs) / F32   2041.4 GFLOPS (128 runs)
ggml_mul_mat:  2048 x  2048: F16   1784.3 GFLOPS (104 runs) / F32   1859.2 GFLOPS (109 runs)
ggml_mul_mat:  4096 x  4096: F16   1855.1 GFLOPS ( 14 runs) / F32   1873.3 GFLOPS ( 14 runs)

Ryzen 9 5950X

WHISPER_OPENBLAS=1 make -j bench && ./bench -w 2 -t 1
ggml_mul_mat:    64 x    64: F16     56.3 GFLOPS (128 runs) / F32     70.2 GFLOPS (128 runs)
ggml_mul_mat:   128 x   128: F16     47.8 GFLOPS (128 runs) / F32     67.0 GFLOPS (128 runs)
ggml_mul_mat:   256 x   256: F16    185.1 GFLOPS (128 runs) / F32    332.7 GFLOPS (128 runs)
ggml_mul_mat:   512 x   512: F16    386.4 GFLOPS (128 runs) / F32    658.6 GFLOPS (128 runs)
ggml_mul_mat:  1024 x  1024: F16    636.2 GFLOPS (128 runs) / F32   1012.0 GFLOPS (128 runs)
ggml_mul_mat:  2048 x  2048: F16    950.9 GFLOPS ( 56 runs) / F32   1296.8 GFLOPS ( 76 runs)
ggml_mul_mat:  4096 x  4096: F16   1168.6 GFLOPS (  9 runs) / F32   1403.1 GFLOPS ( 11 runs)

ggerganov avatar Oct 25 '22 17:10 ggerganov

Results for Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| i7-4790K | Debian | | tiny.en | 4 | 165 | 808 |
| i7-4790K | Debian | | tiny.en | 8 | 165 | 783 |
| i7-4790K | Debian | | base.en | 4 | 212 | 1813 |
| i7-4790K | Debian | | base.en | 8 | 214 | 1746 |

cdosoftei avatar Oct 25 '22 18:10 cdosoftei

Results for a Ryzen 5 4500U 6C/6T laptop CPU. (I've included just one result for 8 threads, as the Encode time is much higher when threads > CPU cores.)

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| Ryzen 5 4500U (6C/6T) | Opensuse Leap | | tiny.en | 4 | 170.00 | 829.43 |
| Ryzen 5 4500U (6C/6T) | Opensuse Leap | | tiny.en | 6 | 143.03 | 671.74 |
| Ryzen 5 4500U (6C/6T) | Opensuse Leap | | base.en | 4 | 305.92 | 2,092.39 |
| Ryzen 5 4500U (6C/6T) | Opensuse Leap | | base.en | 6 | 188.05 | 1,495.61 |
| Ryzen 5 4500U (6C/6T) | Opensuse Leap | | small.en | 4 | 408.03 | 6,919.31 |
| Ryzen 5 4500U (6C/6T) | Opensuse Leap | | small.en | 6 | 359.23 | 6,370.83 |
| Ryzen 5 4500U (6C/6T) | Opensuse Leap | | medium.en | 4 | 2,238.11 | 25,863.28 |
| Ryzen 5 4500U (6C/6T) | Opensuse Leap | | medium.en | 6 | 1,113.04 | 19,672.63 |
| Ryzen 5 4500U (6C/6T) | Opensuse Leap | | medium.en | 8 | 973.65 | 39,619.20 |

rjwilmsi avatar Oct 26 '22 12:10 rjwilmsi

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| i7-11800H | WSL2 Ubuntu | AVX2 | tiny | 2 | 164.35 | 1087.61 |
| i7-11800H | WSL2 Ubuntu | AVX2 | tiny | 4 | 128.94 | 733.24 |
| i7-11800H | WSL2 Ubuntu | AVX2 | tiny | 8 | 137.57 | 619.88 |
| i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | tiny | 2 | 143.02 | 1087.15 |
| i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | tiny | 4 | 127.60 | 730.57 |
| i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | tiny | 8 | 125.62 | 616.27 |
| i7-11800H | WSL2 Ubuntu | AVX2 AVX512 BLAS | tiny | 2 | 132.59 | 1511.38 |
| i7-11800H | WSL2 Ubuntu | AVX2 AVX512 BLAS | tiny | 4 | 132.48 | 1407.49 |
| i7-11800H | WSL2 Ubuntu | AVX2 AVX512 BLAS | tiny | 8 | 133.82 | 1458.27 |

ArtyomZemlyak avatar Oct 26 '22 15:10 ArtyomZemlyak

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| i7-11800H | WSL2 Ubuntu | AVX2 | base | 2 | 174.34 | 2533.79 |
| i7-11800H | WSL2 Ubuntu | AVX2 | base | 4 | 166.68 | 1830.67 |
| i7-11800H | WSL2 Ubuntu | AVX2 | base | 8 | 165.53 | 1478.73 |
| i7-11800H | WSL2 Ubuntu | AVX2 | small | 2 | 340.12 | 8714.24 |
| i7-11800H | WSL2 Ubuntu | AVX2 | small | 4 | 394.32 | 6021.41 |
| i7-11800H | WSL2 Ubuntu | AVX2 | small | 8 | 305.98 | 4828.84 |
| i7-11800H | WSL2 Ubuntu | AVX2 | large | 2 | 3205.36 | 57109.10 |
| i7-11800H | WSL2 Ubuntu | AVX2 | large | 4 | 2720.25 | 38519.89 |
| i7-11800H | WSL2 Ubuntu | AVX2 | large | 8 | 3716.34 | 27739.99 |

ArtyomZemlyak avatar Oct 26 '22 15:10 ArtyomZemlyak

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | large | 2 | 1954.21 | 54966.84 |
| i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | large | 4 | 1455.40 | 37320.62 |
| i7-11800H | WSL2 Ubuntu | AVX2 AVX512 | large | 8 | 1372.58 | 27937.64 |

ArtyomZemlyak avatar Oct 26 '22 15:10 ArtyomZemlyak

This performance is impressive!

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| M1 Pro | MacOS | | large | 8 | 1973 | 4208 |

ArtyomZemlyak avatar Oct 26 '22 15:10 ArtyomZemlyak

This performance is impressive!

Yes, there is a huge performance boost due to using the built-in BLAS implementation on these devices. I will soon add OpenBLAS support for x86 architectures and see how this compares.

By the way, AVX-512 is not supported on master. I have added initial support here, but I am not sure if it works: https://github.com/ggerganov/whisper.cpp/pull/95

ggerganov avatar Oct 26 '22 19:10 ggerganov

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| Intel® Core™ i5-8250U | Win11 Home | AVX2 | Large | 8 | 2226.85 | 61547.61 |

Compiled with MinGW64 GCC 11.3

cristianglezm avatar Oct 28 '22 20:10 cristianglezm

Valve Jupiter (AMD Custom APU 0405, Zen 2 microarch, 4c8t, 16GB DDR5 @ 5200 MT/s)

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| AMD Custom APU 0405 | SteamOS 3.2 | AVX2 | Base | 8 | 326.32 | 2592.96 |

Compiled with cc (GCC) 11.3.0

The performance gains on jfk.wav since my last test (two weeks or so ago) are extremely impressive: a ~10-20x speedup, from 40 seconds down to 2-4 seconds.

tazz4843 avatar Oct 29 '22 00:10 tazz4843

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| MacBook M1 Max | macOS Ventura | BLAS | small | 1 | 299.09 | 4166.00 |
| MacBook M1 Max | macOS Ventura | BLAS | small | 4 | 329.45 | 1304.32 |
| MacBook M1 Max | macOS Ventura | BLAS | base | 1 | 139.10 | 1302.17 |
| MacBook M1 Max | macOS Ventura | BLAS | base | 4 | 135.96 | 399.45 |

yujinqiu avatar Oct 30 '22 00:10 yujinqiu

On an AMD EPYC 64-core, 240-thread cloud instance, it gets stuck like this with 240 threads. I noticed that above a certain number of threads it is slow, or the cloud provider is CPU-limiting. Can anyone else with real hardware check if this is the case?

time ./main -m models/ggml-base.en.bin -f elon.wav -t 240
whisper_model_load: loading model from 'models/ggml-base.en.bin'
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem_required  = 670.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size = 140.60 MB
whisper_model_load: memory size =    22.83 MB
whisper_model_load: model size  =   140.54 MB

system_info: n_threads = 240 / 240 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | 

main: processing 'elon.wav' (34466688 samples, 2154.2 sec), 240 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ..

trholding avatar Oct 31 '22 12:10 trholding

I have tried various numbers of threads with the above-mentioned cloud provider.

I found that anything above 64 threads gets slower, and it remains usable up to 120 threads; anything above that hangs. Either the cloud provider is throttling the free trial, or too many threads actually slow things down.

...
...
processor       : 239
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 49
model name      : AMD EPYC 7742 64-Core Processor
stepping        : 0
microcode       : 0x830104d
cpu MHz         : 2245.780
cache size      : 512 KB
physical id     : 1
siblings        : 120
core id         : 59
cpu cores       : 60
apicid          : 247
initial apicid  : 247
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save umip rdpid arch_capabilities
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 4491.56
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:
time ./main -m models/ggml-base.en.bin -f elon.wav -t 64
whisper_model_load: loading model from 'models/ggml-base.en.bin'
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem_required  = 670.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size = 140.60 MB
whisper_model_load: memory size =    22.83 MB
whisper_model_load: model size  =   140.54 MB

system_info: n_threads = 64 / 240 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | 

main: processing 'elon.wav' (34466688 samples, 2154.2 sec), 64 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.960]   [MUSIC PLAYING]
[00:00:03.960 --> 00:00:18.240]   In life, we've seen within this part of the world
...
...
[00:35:40.320 --> 00:35:41.920]   Thank you, and have a great day.
[00:35:41.920 --> 00:35:43.920]   [APPLAUSE]
[00:35:43.920 --> 00:35:45.920]   [MUSIC PLAYING]
[00:35:45.920 --> 00:35:56.240]   [VIDEO PLAYBACK]


whisper_print_timings:     load time =   249.61 ms
whisper_print_timings:      mel time =  1267.11 ms
whisper_print_timings:   sample time =  1718.69 ms
whisper_print_timings:   encode time = 63702.25 ms / 10617.04 ms per layer
whisper_print_timings:   decode time = 381317.66 ms / 63552.94 ms per layer
whisper_print_timings:    total time = 448362.19 ms

real    7m28.411s
user    347m2.230s
sys     22m42.511s

32 threads was faster than 64 threads; I think 32 threads took around 7 minutes.

trholding avatar Oct 31 '22 12:10 trholding

Env: Restricted Cloud / Throttled Maybe

CPU: AMD EPYC 7742 64-Core Processor

OS:

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal
Linux XXXX 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Compiler:

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.4.0-1ubuntu1~20.04.1' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1) 
$ ./bench -m ./models/ggml-small.en.bin -t 4
whisper_model_load: loading model from './models/ggml-small.en.bin'
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 3
whisper_model_load: mem_required  = 1588.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size = 464.56 MB
whisper_model_load: memory size =    68.48 MB
whisper_model_load: model size  =   464.44 MB

system_info: n_threads = 4 / 240 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | 

whisper_print_timings:     load time =   515.02 ms
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms
whisper_print_timings:   encode time =  6878.32 ms / 573.19 ms per layer
whisper_print_timings:   decode time =     0.00 ms / 0.00 ms per layer
whisper_print_timings:    total time =  7393.42 ms

If you wish, you can submit these results here:

  https://github.com/ggerganov/whisper.cpp/issues/89

Please include the following information:

  - CPU model
  - Operating system
  - Compiler
$ ./bench -m ./models/ggml-small.en.bin -t 240
whisper_model_load: loading model from './models/ggml-small.en.bin'
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 3
whisper_model_load: mem_required  = 1588.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size = 464.56 MB
whisper_model_load: memory size =    68.48 MB
whisper_model_load: model size  =   464.44 MB

system_info: n_threads = 240 / 240 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | 

whisper_print_timings:     load time =   528.66 ms
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms
whisper_print_timings:   encode time = 12898.34 ms / 1074.86 ms per layer
whisper_print_timings:   decode time =     0.00 ms / 0.00 ms per layer
whisper_print_timings:    total time = 13427.03 ms

If you wish, you can submit these results here:

  https://github.com/ggerganov/whisper.cpp/issues/89

Please include the following information:

  - CPU model
  - Operating system
  - Compiler

I'll remove the above posts if too much clutter.

trholding avatar Oct 31 '22 13:10 trholding

@trholding Thanks for the results.

You can generate a table with performance results by simply running the extra/bench-all.sh script.

Regarding the threads - yes, it seems that going beyond 8 threads does not help, regardless of how many cores you have. My guess is that the computation is memory-bound, which is why using more threads does not improve performance.
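The memory-bound argument can be illustrated with a toy roofline-style model (a sketch with made-up numbers, not actual whisper.cpp code): compute time shrinks with more threads, but the time spent moving weights through memory does not, so total time flattens out.

```python
# Toy roofline-style model of memory-bound scaling. The numbers below are
# purely illustrative assumptions, not measured whisper.cpp timings.

def encode_time_ms(threads, compute_ms_single=8000.0, memory_ms=1000.0):
    """Compute work parallelizes across threads; memory traffic does not.
    Total time is bounded below by the memory-transfer time."""
    return max(compute_ms_single / threads, memory_ms)

for t in [1, 2, 4, 8, 16, 32]:
    print(f"{t:2d} threads -> {encode_time_ms(t):7.1f} ms")
```

In this model the curve is flat past 8 threads, which matches the plateau seen in the benchmark tables above.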

ggerganov avatar Oct 31 '22 17:10 ggerganov

Okay, 8 threads max. So for a large file, would it be possible to split the file into chunks using silences as terminators, run the conversion as ((total threads/cores)/8) parallel jobs, and still keep track of timestamps? This could be awesome for batch conversion.
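The timestamp bookkeeping part of this idea is straightforward: each chunk's segments are reported relative to the chunk start, so shifting them by the chunk's offset into the original file recovers global timestamps. A minimal sketch (the `merge_segments` helper is hypothetical; whisper.cpp does not ship chunking like this):

```python
# Hypothetical helper: shift chunk-local segment times to global times.
# Assumes the caller already split the audio at silences and knows each
# chunk's offset (in seconds) into the original file.

def merge_segments(chunks):
    """chunks: list of (chunk_offset_sec, [(start, end, text), ...]).
    Returns one flat list of (global_start, global_end, text)."""
    merged = []
    for offset, segments in chunks:
        for start, end, text in segments:
            merged.append((offset + start, offset + end, text))
    return merged

# Example: two chunks, the second starting 30 s into the file.
chunks = [
    (0.0,  [(0.0, 3.9, "[MUSIC PLAYING]")]),
    (30.0, [(0.0, 2.5, "In life, we've seen...")]),
]
print(merge_segments(chunks))
```

The hard part is choosing split points at true silences so no words straddle a chunk boundary; the arithmetic above only handles the bookkeeping once the splits are chosen.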

You can generate a table with performance results by simply running the extra/bench-all.sh script.

Oh, I didn't know. I'll update with tables soon and remove my previous comments in a few hours.

trholding avatar Oct 31 '22 18:10 trholding

You can generate a table with performance results by simply running the extra/bench-all.sh script.

Hey, sorry. That didn't pan out well: I ran the benchmark three times and my account got deleted without notice. I couldn't get the logs as it was a web terminal. On the other hand, I'm glad this happened, as I was giving serious thought to purchasing a GPU+CPU plan there, so a performance check of the CPU was equally important. Technically it was probably my fault - I shouldn't have used a reverse shell and run benchmarks on a free trial - but how does one know if a service is really good or all just vapor...

trholding avatar Oct 31 '22 22:10 trholding

Dell Precision 5560 laptop results:

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| i7-11850H | Ubuntu | AVX2 | tiny | 4 | 115.87 | 538.43 |
| i7-11850H | Ubuntu | AVX2 | base | 4 | 145.14 | 1241.84 |
| i7-11850H | Ubuntu | AVX2 | small | 4 | 299.30 | 4343.57 |
| i7-11850H | Ubuntu | AVX2 | medium | 4 | 760.98 | 15238.31 |
| i7-11850H | Ubuntu | AVX2 | large | 4 | 1404.32 | 27476.86 |
| i7-11850H | Ubuntu | AVX2 | tiny | 8 | 131.96 | 358.81 |
| i7-11850H | Ubuntu | AVX2 | base | 8 | 166.61 | 839.31 |
| i7-11850H | Ubuntu | AVX2 | small | 8 | 320.29 | 2854.86 |
| i7-11850H | Ubuntu | AVX2 | medium | 8 | 756.20 | 9829.62 |
| i7-11850H | Ubuntu | AVX2 | large | 8 | 1382.38 | 19872.81 |

rgerganov avatar Nov 05 '22 06:11 rgerganov

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| i9-9900K | WSL2 Ubuntu (GCC) | AVX2 | tiny.en | 4 | 85.71 | 601.56 |
| i9-9900K | WSL2 Ubuntu (GCC) | AVX2 | small.en | 4 | 212.59 | 5146.23 |
| i9-9900K | OSX 10.14.1 (hackintosh - GCC) | AVX2 | tiny.en | 4 | 198.17 | 455.12 |
| i9-9900K | OSX 10.14.1 (hackintosh - GCC) | AVX2 | base.en | 4 | 272.62 | 909.71 |
| i9-9900K | OSX 10.14.1 (hackintosh - GCC) | AVX2 | small.en | 4 | 598.75 | 2968.75 |
| Xeon(R) Silver 4210R CPU @ 2.40GHz | Virtual Machine - Debian Stretch (GCC - master branch) | AVX2 avx512f avx512dq avx512cd avx512bw avx512vl | small.en | 4 | 776.56 | 12340.41 |
| Xeon(R) Silver 4210R CPU @ 2.40GHz | Virtual Machine - Debian Stretch (GCC - master branch) | AVX2 avx512f avx512dq avx512cd avx512bw avx512vl | tiny.en | 4 | 295.54 | 1710.46 |

jaybinks avatar Nov 05 '22 10:11 jaybinks

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Tiny | 4 | 124.28 | 656.41 |
| i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Tiny | 8 | 123.70 | 696.41 |
| i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Base | 4 | 159.91 | 1754.44 |
| i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Base | 8 | 164.47 | 1658.55 |
| i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Small | 4 | 330.91 | 6161.86 |
| i9-11950H | Pop!_OS 22.04 LTS | AVX2 | Small | 8 | 346.22 | 5187.85 |

mark-beeby avatar Nov 08 '22 09:11 mark-beeby

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| i7-1065G7 | Windows 11 | - | small.en | 4 | 1,314.25 | 294,168.09 |

Compiled with VS 2022

Something is off, right?

niksedk avatar Nov 09 '22 19:11 niksedk

Yup - you are missing the AVX2 flag. See if some of the comments in https://github.com/ggerganov/whisper.cpp/issues/5 can help you resolve this.

ggerganov avatar Nov 09 '22 20:11 ggerganov

OK, the AVX2 flag seems to help :)

| CPU | OS | Config | Model | Threads | Load [ms] | Encode [ms] |
| --- | --- | --- | --- | --- | --- | --- |
| i7-1065G7 | Windows 11 | AVX2 | small.en | 4 | 527.59 | 9,648.67 |

Compiled with VS 2022

niksedk avatar Nov 09 '22 20:11 niksedk