[Feature request] WASM WebGPU

Open mark-beeby opened this issue 1 year ago • 6 comments

It's clear that leveraging a GPU makes processing faster, and I believe in principle WebGPU is available in SIMD. Is it even feasible to integrate with the GPU, where available, in Chrome etc.?

mark-beeby avatar Oct 27 '22 11:10 mark-beeby

I'm not familiar with the WebGPU API. If you demonstrate a basic matrix multiplication example using WebGPU, and it does not look too complicated, I might give it a try.

ggerganov avatar Nov 01 '22 21:11 ggerganov
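
For reference, a basic WebGPU matrix multiplication of the kind asked about above might look roughly like the following. This is only a minimal sketch, not code from whisper.cpp or ggml: the function names (`buildMatmulShader`, `matmulCpu`, `matmulWebGpu`) are made up for illustration, the kernel is a naive one-thread-per-output-element version with no tiling, and it assumes row-major `f32` matrices and a browser that exposes `navigator.gpu`. A CPU reference is included so the GPU result can be checked.

```typescript
// Build a WGSL compute shader for C = A * B (row-major f32 matrices).
// Assumption: dimensions are baked in as constants to avoid a uniform buffer.
function buildMatmulShader(m: number, k: number, n: number): string {
  return `
const M: u32 = ${m}u;
const K: u32 = ${k}u;
const N: u32 = ${n}u;

@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> c: array<f32>;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let row = id.y;
  let col = id.x;
  if (row >= M || col >= N) { return; }
  var sum = 0.0;
  for (var i = 0u; i < K; i = i + 1u) {
    sum = sum + a[row * K + i] * b[i * N + col];
  }
  c[row * N + col] = sum;
}`;
}

// CPU reference implementation, used to validate the GPU result.
function matmulCpu(a: Float32Array, b: Float32Array, m: number, k: number, n: number): Float32Array {
  const c = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let sum = 0;
      for (let i = 0; i < k; i++) sum += a[row * k + i] * b[i * n + col];
      c[row * n + col] = sum;
    }
  }
  return c;
}

// Runs the shader on the GPU; resolves to null when WebGPU is unavailable.
async function matmulWebGpu(
  a: Float32Array, b: Float32Array, m: number, k: number, n: number
): Promise<Float32Array | null> {
  const g = globalThis as any; // untyped access, so @webgpu/types is not required
  const adapter = await g.navigator?.gpu?.requestAdapter();
  if (!adapter) return null;
  const device = await adapter.requestDevice();
  const { STORAGE, COPY_SRC, COPY_DST, MAP_READ } = g.GPUBufferUsage;
  const outBytes = m * n * 4;

  // Copy an input matrix into a GPU storage buffer.
  const upload = (data: Float32Array) => {
    const buf = device.createBuffer({ size: data.byteLength, usage: STORAGE | COPY_DST });
    device.queue.writeBuffer(buf, 0, data);
    return buf;
  };
  const bufA = upload(a);
  const bufB = upload(b);
  const bufC = device.createBuffer({ size: outBytes, usage: STORAGE | COPY_SRC });
  const readback = device.createBuffer({ size: outBytes, usage: MAP_READ | COPY_DST });

  const module = device.createShaderModule({ code: buildMatmulShader(m, k, n) });
  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module, entryPoint: "main" },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [bufA, bufB, bufC].map((buffer, binding) => ({ binding, resource: { buffer } })),
  });

  // One 8x8 workgroup per 8x8 tile of the output matrix.
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(n / 8), Math.ceil(m / 8));
  pass.end();
  encoder.copyBufferToBuffer(bufC, 0, readback, 0, outBytes);
  device.queue.submit([encoder.finish()]);

  // Map the readback buffer and copy the result out before unmapping.
  await readback.mapAsync(g.GPUMapMode.READ);
  const result = new Float32Array(readback.getMappedRange().slice(0));
  readback.unmap();
  return result;
}
```

In a WebGPU-capable browser, `await matmulWebGpu(a, b, m, k, n)` should agree with `matmulCpu(a, b, m, k, n)` up to floating-point rounding. A tiled kernel using workgroup shared memory would likely be needed before drawing any performance conclusions.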

I have some experience with WebGPU and might have a look at this. Note that WebGPU would allow GPU-based computation without depending on any vendor-specific libraries like CUDA, not only on the web but also natively (with Vulkan, DX12 or Metal), by using dawn or wgpu.

niklaskorz avatar Dec 09 '22 10:12 niklaskorz

This can be helpful https://github.com/juj/wasm_webgpu

gut4 avatar Dec 24 '22 20:12 gut4

@niklaskorz any chance that you would look at this? That would give this project a further boost (or did I miss anything relevant and it's already been solved?).

sandorkonya avatar Mar 20 '23 13:03 sandorkonya

I started looking into it. It's very easy to link wasm_webgpu into Emscripten; then, in principle, you should be able to implement the matrix multiplication example from https://github.com/milhidaka/webgpu-blas. I have done this, but I am running into an issue with my shader. I am really curious whether WebGPU will give us real-time streaming performance.

patrickinminneapolis avatar Mar 20 '23 13:03 patrickinminneapolis

On a similar topic, recently I found this project: https://github.com/xenova/transformers.js

It has a very efficient inference of Whisper tiny using WASM. They seem to be using something called ONNX Runtime. Although adapting to such a framework is out of scope for whisper.cpp, it seems like there is still a lot to gain in the existing WASM implementation. Even without using WASM SIMD, it seems to be possible to achieve much higher performance.

I wonder if there is something that could be done in ggml to speed up the WASM processing. Even if we don't reach ONNX Runtime performance level, it would still be very nice to improve the existing speed.

Regarding WebGPU: it would be great if someone provided a PoC. Transformers.js announced they will support WebGPU soon too, so it should be possible.

Edit: Btw, is there something like a BLAS for WASM?

ggerganov avatar Mar 20 '23 14:03 ggerganov