Ryan Hileman
I also optimized the log_mel matmul a bit; here are the new numbers. The log_mel step now seems about 10x faster than the original. 1-thread log_mel is now 40% faster than...
> @lunixbochs It's usually better to put licenses of subprojects used into separate license files such as `LICENSE.pocketfft`, so the original file still stays readable. I force pushed with a...
Here's my machine with a much longer test (1h51m) main -t 4 ``` whisper_print_timings: fallbacks = 9 p / 24 h whisper_print_timings: load time = 64.84 ms whisper_print_timings: mel time...
I think this is a mistake, unless you're planning to depend on a BLAS (which IMO is a much more complicated and heavy thing for users to manage, and isn't...
I've benchmarked all of the modern FFTs; you probably want pocketfft for this. I'm looking at porting to that now, and a naive port brings the mel time from around 10ms to...
PR for pocketfft here: #583
After my pocketfft PR, something like 75% of the log_mel computation is spent doing a matrix vector multiply here: https://github.com/ggerganov/whisper.cpp/blob/d1f16463fa8182d9436aa30287ad320492943f56/whisper.cpp#L2285-L2294 You could use Accelerate for that, but I also assume...
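For readers unfamiliar with the step being discussed, here is a minimal sketch of a mel filterbank matrix-vector multiply for a single frame. It is not the code at the linked lines: the function name `log_mel_frame`, the row-major filter layout, and the `1e-10` floor applied before the logarithm are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the whisper.cpp function.
// filters: row-major mel filterbank, n_mel rows by power.size() columns (assumed layout).
// power:   power spectrum of one frame, one value per FFT bin.
// Returns log10 of each mel band energy for that frame.
std::vector<float> log_mel_frame(const std::vector<float>& filters,
                                 const std::vector<float>& power,
                                 std::size_t n_mel) {
    const std::size_t n_bins = power.size();
    std::vector<float> out(n_mel);
    for (std::size_t j = 0; j < n_mel; ++j) {
        // Row j of the filterbank dotted with the power spectrum.
        const float* row = filters.data() + j * n_bins;
        double sum = 0.0;
        for (std::size_t k = 0; k < n_bins; ++k) {
            sum += static_cast<double>(row[k]) * power[k];
        }
        // Assumed floor so that log10 never sees zero.
        out[j] = static_cast<float>(std::log10(std::max(sum, 1e-10)));
    }
    return out;
}
```

The inner loop is a plain dot product per mel band, which is why this step is a natural candidate for either a BLAS-style routine or hand-tuned vectorization.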
The easiest solution here would be me trying the game myself. I'll take a look.
Are you on the unstable branch? Try both master and unstable. ssvb's glshim is _very_ old