whisper.cpp [Feature request] WASM WebGPU


Open mark-beeby opened this issue 1 year ago • 6 comments

It's clear that leveraging a GPU makes processing faster, and I believe in principle WebGPU is available in SIMD. Is it even feasible to integrate with the GPU where available in Chrome etc?

Oct 27 '22 11:10 mark-beeby

I'm not familiar with the WebGPU API. If you demonstrate a basic matrix multiplication example using WebGPU, and it does not look too complicated, I might give it a try.

Nov 01 '22 21:11 ggerganov
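
As a point of reference for the request above, a basic WebGPU matrix multiplication might look roughly like the following TypeScript sketch. It is not taken from whisper.cpp or from any project linked in this thread. The names `MATMUL_WGSL`, `matmulWebGPU` and `matmulReference` are made up for illustration, the shader is a naive one-thread-per-output-element kernel with no tiling, and the code assumes a browser (or other runtime) that exposes `navigator.gpu`.

```typescript
// WGSL compute shader: each invocation computes one element of C = A * B.
// A is m x k, B is k x n, C is m x n, all row-major f32.
const MATMUL_WGSL = `
@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> c: array<f32>;
@group(0) @binding(3) var<uniform> dims: vec4<u32>; // (m, n, k, unused)

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let row = gid.y;
  let col = gid.x;
  if (row >= dims.x || col >= dims.y) { return; }
  var acc = 0.0;
  for (var i = 0u; i < dims.z; i = i + 1u) {
    acc = acc + a[row * dims.z + i] * b[i * dims.y + col];
  }
  c[row * dims.y + col] = acc;
}`;

// CPU reference, useful for checking the GPU result on small inputs.
function matmulReference(a: Float32Array, b: Float32Array, m: number, n: number, k: number): Float32Array {
  const c = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let acc = 0;
      for (let i = 0; i < k; i++) acc += a[row * k + i] * b[i * n + col];
      c[row * n + col] = acc;
    }
  }
  return c;
}

// Runs the shader; resolves to null when WebGPU is not available.
async function matmulWebGPU(a: Float32Array, b: Float32Array, m: number, n: number, k: number): Promise<Float32Array | null> {
  const g = globalThis as any; // untyped access, so no @webgpu/types dependency is assumed
  const adapter = await g.navigator?.gpu?.requestAdapter();
  if (!adapter) return null;
  const device = await adapter.requestDevice();
  const U = g.GPUBufferUsage;

  // Create a GPU buffer and copy the given data into it.
  const upload = (data: Float32Array | Uint32Array, usage: number) => {
    const buf = device.createBuffer({ size: data.byteLength, usage: usage | U.COPY_DST });
    device.queue.writeBuffer(buf, 0, data);
    return buf;
  };
  const aBuf = upload(a, U.STORAGE);
  const bBuf = upload(b, U.STORAGE);
  const dimsBuf = upload(new Uint32Array([m, n, k, 0]), U.UNIFORM);
  const cBytes = m * n * 4;
  const cBuf = device.createBuffer({ size: cBytes, usage: U.STORAGE | U.COPY_SRC });
  const readBuf = device.createBuffer({ size: cBytes, usage: U.MAP_READ | U.COPY_DST });

  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module: device.createShaderModule({ code: MATMUL_WGSL }), entryPoint: "main" },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [aBuf, bBuf, cBuf, dimsBuf].map((buffer, binding) => ({ binding, resource: { buffer } })),
  });

  // Record the compute pass, then copy the result into a mappable buffer.
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(n / 8), Math.ceil(m / 8));
  pass.end();
  encoder.copyBufferToBuffer(cBuf, 0, readBuf, 0, cBytes);
  device.queue.submit([encoder.finish()]);

  await readBuf.mapAsync(g.GPUMapMode.READ);
  const out = new Float32Array(readBuf.getMappedRange().slice(0));
  readBuf.unmap();
  return out;
}
```

Comparing the output of `matmulWebGPU` with `matmulReference` on small matrices is one way to check the shader. A practical kernel would likely add workgroup-level tiling, which this sketch omits for brevity.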

I have some experience with WebGPU and might have a look at this. Note that WebGPU would allow GPU-based computation without depending on any vendor-specific libraries like CUDA, not only on the web but also natively (with Vulkan, DX12 or Metal), by using dawn or wgpu.

Dec 09 '22 10:12 niklaskorz

This may be helpful: https://github.com/juj/wasm_webgpu

Dec 24 '22 20:12 gut4

@niklaskorz any chance that you could look at this? It would give this project an even bigger boost. (Or did I miss something relevant, and has this already been solved?)

Mar 20 '23 13:03 sandorkonya

I started looking into it. It's very easy to link wasm_webgpu into emscripten, and then in principle you should be able to implement the matrix multiplication example from https://github.com/milhidaka/webgpu-blas. I have done this, but I am running into an issue with my shader. I am really curious whether WebGPU will give us real-time streaming performance.

Mar 20 '23 13:03 patrickinminneapolis

On a similar topic, recently I found this project: https://github.com/xenova/transformers.js

It runs very efficient inference of Whisper tiny using WASM. They seem to be using something called ONNX Runtime. Although adapting to such a framework is out of scope for whisper.cpp, it seems like there is still a lot to gain in the existing WASM implementation. Even without using WASM SIMD, it seems to be possible to achieve much higher performance.

I wonder if there is something that could be done in ggml to speed up the WASM processing. Even if we don't reach ONNX Runtime performance level, it would still be very nice to improve the existing speed.

Regarding WebGPU: it would be great if someone provided a PoC. Transformers.js announced they will support WebGPU soon too, so it should be possible.

Edit: Btw, is there something like WASM BLAS?

Mar 20 '23 14:03 ggerganov
