whisper.cpp

Decoding strangely slow on i7 Macbook Pro

Open · cuuupid opened this issue 1 year ago · 1 comment

Hey there! This is an awesome project, and I'm trying to build a web app with it.

Unfortunately I'm running into a weird issue. When transcribing 2-3 seconds of audio, M1/M2 Macs take <2s and i7 PCs take <2s, but i7 Macs take 40+s. I thought this was strange, so I looked at #89 and found that the results there were pretty decent for loading and encoding. Digging deeper, I found that the bottleneck is actually decode speed: load + encode in WASM is <2s on i7 Macs, but decode consistently takes 40+s.

I'm not sure what the cause is. Spec-wise, I'm testing on an i7 Mac @ 2.8 GHz with 6 cores and 16 GB of RAM. Some other issues noted that using AVX gives a huge speedup, and in the current WASM example I see:

system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 1 | BLAS = 0 |

So it seems AVX is off here. Is it possible to turn it on for WASM? I'm also consistently seeing 40+s transcription times on M1/M2 Macs when running whisper_full_parallel via WASM with anything more than n_processors = 1, and I'm wondering whether this is a similar issue.

cuuupid · Jan 06 '23 09:01

WASM performance is not great overall. AVX, AVX2, and the other native CPU extensions are not available in WASM. #368 just landed on master; it should give about a 20-30% speedup for the WASM build.

I haven't tested n_processors (P) with WASM yet. When increasing it, it is very important to reduce the number of threads (T) accordingly. Ideally, you want P*T == 8.

ggerganov · Jan 06 '23 17:01
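
The P*T == 8 advice in the reply above can be sketched as a small helper. This is only an illustration under assumptions: `threads_per_processor` is a hypothetical name, not part of whisper.cpp, and the budget of 8 is taken from the `n_threads = 8 / 8` line in the posted system_info output. It divides a fixed thread budget across the parallel processors so that the product never exceeds the budget.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of whisper.cpp): given a total thread budget
// and a number of parallel processors P, return the threads per processor T
// such that P * T <= thread_budget whenever P <= thread_budget.
int threads_per_processor(int thread_budget, int n_processors) {
    assert(thread_budget >= 1 && n_processors >= 1);
    // Integer division rounds down; never go below one thread per processor.
    return std::max(1, thread_budget / n_processors);
}
```

With a budget of 8, this gives T = 8 for P = 1, T = 4 for P = 2, and T = 2 for P = 4. In a whisper.cpp program the result would presumably be assigned to the `n_threads` field of the params struct before calling `whisper_full_parallel` with the matching `n_processors`; whether this actually helps in the WASM build is untested in this thread.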