whisper.cpp
whisper.wasm n_threads = 4 / 4 on ggerganov.com, but 8 / 8 and runs an order of magnitude slower on my server
At first I assumed this was caused by my building the latest commit rather than the commit used for the whisper.ggerganov.com demo, or by a different emscripten version or flags. Nope: I copied the built files directly from whisper.ggerganov.com:
for f in helpers.js index.html libmain.worker.js libwhisper.worker.js main.js; do
    wget "https://whisper.ggerganov.com/$f"
done
And confirmed this:
whisper.ggerganov.com: system_info: n_threads = 4 / 4
my server: system_info: n_threads = 8 / 8
And, crucially, encode and decode are much slower on my server:
[00:00:00.000 --> 00:00:02.000] Testing 1, 2, 3.
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: load time = 225.00 ms
whisper_print_timings: mel time = 359.00 ms
whisper_print_timings: sample time = 176.00 ms / 9 runs ( 19.56 ms per run)
whisper_print_timings: encode time = 15066.00 ms / 1 runs (15066.00 ms per run)
whisper_print_timings: decode time = 161336.00 ms / 9 runs (17926.22 ms per run)
whisper_print_timings: total time = 177288.00 ms
I also tried modifying my server to send exactly the same response headers as whisper.ggerganov.com. Unsurprisingly, that had no effect. Any idea what could cause this?
For reference, the full system_info output from each:
whisper.ggerganov.com
system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 1 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
operator(): processing 176000 samples, 11.0 sec, 4 threads, 1 processors, lang = en, task = transcribe ...
my server
system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 1 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
operator(): processing 176000 samples, 11.0 sec, 8 threads, 1 processors, lang = en, task = transcribe ...
The only difference is n_threads = 4 / 4 vs 8 / 8.
Not sure I follow - the number of available threads is determined by the client (i.e. the browser) that opens the page, regardless of where the page is hosted. If you are on the same computer and in the same browser, the number of detected threads should be the same on all servers.
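One way to check this is to compare what the browser itself reports on each page. The sketch below is only an illustration, not the whisper.wasm code: `effectiveThreads` is a hypothetical helper, and the cap of 8 is an assumption made for the example. It assumes the thread count comes from the browser's `navigator.hardwareConcurrency` value, which is typically what a WebAssembly build sees as the number of logical cores.

```typescript
// Minimal shape of the browser's `navigator` object that this sketch relies on.
interface NavigatorLike {
  hardwareConcurrency?: number;
}

// Hypothetical helper (not part of whisper.cpp): the thread count a page could
// end up using, given the core count reported by the browser and an assumed cap.
function effectiveThreads(nav: NavigatorLike, maxThreads: number = 8): number {
  // Fall back to 1 when the browser does not expose hardwareConcurrency.
  const reported = nav.hardwareConcurrency ?? 1;
  // Never use more threads than the cap, and never fewer than one.
  return Math.max(1, Math.min(reported, maxThreads));
}

// In the browser console, run this on both pages and compare the values:
//   console.log(navigator.hardwareConcurrency);
```

If the two pages print different values in the same browser on the same machine, the difference comes from what the browser reports to each origin rather than from the files being served.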