whisper.cpp whisper.wasm n_threads = 4 / 4 on ggerganov.com, but 8 / 8 and runs an order of magnitude slower on my server

Open • webbp opened this issue 1 year ago • 1 comment

At first I assumed this was caused by building the latest commit rather than the commit used for the whisper.ggerganov.com demo, or by a different Emscripten version or flags. Nope: I copied the built files directly from whisper.ggerganov.com:

for f in helpers.js index.html libmain.worker.js libwhisper.worker.js main.js; do
  wget https://whisper.ggerganov.com/$f
done

And confirmed this:

whisper.ggerganov.com: system_info: n_threads = 4 / 4
            my server: system_info: n_threads = 8 / 8

And, crucially, encode and decode are much, much slower:

[00:00:00.000 --> 00:00:02.000]   Testing 1, 2, 3.

whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:     load time =   225.00 ms
whisper_print_timings:      mel time =   359.00 ms
whisper_print_timings:   sample time =   176.00 ms /     9 runs (   19.56 ms per run)
whisper_print_timings:   encode time = 15066.00 ms /     1 runs (15066.00 ms per run)
whisper_print_timings:   decode time = 161336.00 ms /     9 runs (17926.22 ms per run)
whisper_print_timings:    total time = 177288.00 ms

I also tried modifying my server to send all the same response headers as whisper.ggerganov.com. Unsurprisingly, that had no effect. Any idea what could cause this?

For reference, the full system_info output from each:

whisper.ggerganov.com
system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 1 | BLAS = 0 | SSE3 = 0 | VSX = 0 | 
operator(): processing 176000 samples, 11.0 sec, 4 threads, 1 processors, lang = en, task = transcribe ...

my server
system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 1 | BLAS = 0 | SSE3 = 0 | VSX = 0 | 
operator(): processing 176000 samples, 11.0 sec, 8 threads, 1 processors, lang = en, task = transcribe ...

The only difference is n_threads = 4 / 4 vs. 8 / 8.
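
One way to double-check that claim is to diff the two system_info lines field by field instead of comparing them by eye. A minimal sketch in TypeScript, assuming the fields are separated by "|" and written as "name = value" as in the output above; parseSystemInfo and diffSystemInfo are hypothetical helper names, not part of whisper.cpp:

```typescript
// Hypothetical helper (not part of whisper.cpp): parse one system_info line
// into a name -> value table, assuming "name = value" fields separated by "|".
function parseSystemInfo(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  // Drop the "system_info:" prefix before splitting into fields.
  const body = line.replace(/^\s*system_info:\s*/, "");
  for (const part of body.split("|")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skips the empty segment after the trailing "|"
    fields[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return fields;
}

// Hypothetical helper: return the names of fields whose values differ
// between two system_info lines, or that appear in only one of them.
function diffSystemInfo(a: string, b: string): string[] {
  const fa = parseSystemInfo(a);
  const fb = parseSystemInfo(b);
  const differing: string[] = [];
  for (const key of Object.keys(fa)) {
    if (fa[key] !== fb[key]) differing.push(key);
  }
  for (const key of Object.keys(fb)) {
    if (!(key in fa)) differing.push(key);
  }
  return differing;
}
```

Passing the two system_info lines quoted above to diffSystemInfo should report n_threads as the only differing field.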

Feb 20 '23 13:02 webbp

Not sure I follow - the number of available threads is determined by the client (i.e. the browser) that opens the page, regardless of where the page is hosted. If you are on the same computer and in the same browser, the number of detected threads should be the same on all servers.

Feb 27 '23 18:02 ggerganov
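
To illustrate the point in the reply: browsers report their logical core count through navigator.hardwareConcurrency, a client-side value that is read by the page's own script. A minimal sketch in TypeScript of deriving a thread count from it; pickThreadCount is a hypothetical helper, and the cap of 16 and fallback of 1 are assumptions made for illustration, not the actual whisper.wasm logic:

```typescript
// Hypothetical helper (not the actual whisper.wasm code): choose how many
// threads to use from the core count reported by the client.
// Assumptions for illustration: a cap of 16 and a fallback of 1 core
// when the browser reports nothing usable.
function pickThreadCount(
  reportedCores: number | undefined,
  requested: number,
  cap: number = 16
): number {
  const cores =
    reportedCores !== undefined && reportedCores > 0
      ? Math.floor(reportedCores)
      : 1;
  // Never exceed what was requested, what the client reports, or the cap.
  return Math.max(1, Math.min(requested, cores, cap));
}

// In a browser, the input would come from the client itself, e.g.:
//   const n = pickThreadCount(navigator.hardwareConcurrency, 8);
```

Under these assumptions, a client reporting 4 cores yields 4 threads and one reporting 8 cores yields 8. The hostname never enters the calculation.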
