whisper.cpp
[Feature request] WASM WebGPU
It's clear that leveraging a GPU makes processing faster, and I believe WebGPU is, in principle, accessible from WASM. Is it feasible to use the GPU where it is available, in Chrome and other browsers?
I'm not familiar with the WebGPU API. If you demonstrate a basic matrix multiplication example using WebGPU, and it does not look too complicated, I might give it a try.
I have some experience with WebGPU and might have a look at this. Note that WebGPU would allow GPU-based computation without depending on any vendor-specific libraries like CUDA, not only on the web but also natively (on top of Vulkan, DX12 or Metal) by using Dawn or wgpu.
This can be helpful https://github.com/juj/wasm_webgpu
@niklaskorz any chance that you could look at this? It would give this project an even bigger boost. (Or did I miss anything relevant, and it's already been solved?)
I started looking into it. It's very easy to link wasm_webgpu into an Emscripten build, and in principle you should then be able to implement the matrix multiplication example from https://github.com/milhidaka/webgpu-blas. I have done this, but I am running into an issue with my shader. I am really curious whether WebGPU will give us real-time streaming performance.
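For anyone following along, here is a rough sketch of what a naive WebGPU matrix multiplication kernel can look like. It is not the shader from webgpu-blas or the one mentioned above, just a minimal illustration. The names `kMatmulWGSL`, `matmul_reference` and `workgroups_for`, the binding layout and the 8x8 workgroup size are all assumptions made for this example. The C++ part only embeds the WGSL source as a string and provides a CPU reference for checking GPU output; device, buffer and pipeline setup (via wasm_webgpu, Dawn or wgpu) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical WGSL compute shader: C (m x n) = A (m x k) * B (k x n),
// all matrices row-major, one invocation per output element.
// The binding indices and the Dims layout are assumptions for this sketch.
static const char* kMatmulWGSL = R"wgsl(
struct Dims { m: u32, n: u32, k: u32 }

@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> c: array<f32>;
@group(0) @binding(3) var<uniform> dims: Dims;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let row = gid.y;
  let col = gid.x;
  // Guard against invocations past the matrix edge.
  if (row >= dims.m || col >= dims.n) { return; }
  var acc = 0.0;
  for (var i = 0u; i < dims.k; i = i + 1u) {
    acc = acc + a[row * dims.k + i] * b[i * dims.n + col];
  }
  c[row * dims.n + col] = acc;
}
)wgsl";

// CPU reference that mirrors the shader: each (row, col) pair corresponds
// to one GPU invocation. Useful for validating what the GPU returns.
std::vector<float> matmul_reference(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    uint32_t m, uint32_t n, uint32_t k) {
    assert(a.size() == static_cast<std::size_t>(m) * k);
    assert(b.size() == static_cast<std::size_t>(k) * n);
    std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
    for (uint32_t row = 0; row < m; ++row) {
        for (uint32_t col = 0; col < n; ++col) {
            float acc = 0.0f;
            for (uint32_t i = 0; i < k; ++i) {
                acc += a[row * k + i] * b[i * n + col];
            }
            c[row * n + col] = acc;
        }
    }
    return c;
}

// Number of workgroups needed to cover `extent` elements along one axis
// (rounds up, which is why the shader needs its bounds check).
uint32_t workgroups_for(uint32_t extent, uint32_t workgroup_size) {
    return (extent + workgroup_size - 1) / workgroup_size;
}
```

With this layout the dispatch would be `workgroups_for(n, 8)` groups in x and `workgroups_for(m, 8)` in y. A kernel like this is far from optimal (no tiling, no shared memory), but it is enough to check whether the plumbing works before tuning.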
On a similar topic, recently I found this project: https://github.com/xenova/transformers.js
It has a very efficient inference of Whisper tiny using WASM. They seem to be using something called ONNX Runtime. Although adapting to such a framework is out of scope for whisper.cpp, it seems like there is still a lot to gain in the existing WASM implementation. Even without using WASM SIMD, it seems to be possible to achieve much higher performance.
I wonder if there is something that could be done in ggml to speed up the WASM processing. Even if we don't reach ONNX Runtime performance level, it would still be very nice to improve the existing speed.
Regarding WebGPU: would be great if someone provides a PoC. Transformers.js announced they will support WebGPU soon too, so it should be possible.
Edit: Btw, is there something like a WASM BLAS?