Joshua Lochner
Phi-3 WebGPU support is now working! Demo: https://huggingface.co/spaces/Xenova/experimental-phi3-webgpu

https://github.com/xenova/transformers.js/assets/26504141/6c42e61b-f381-4835-bf63-f37cc752a16b
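For reference, here's a minimal sketch of what running a model on WebGPU looks like, assuming the experimental v3 branch of transformers.js; the model id and options are illustrative assumptions, not the demo's exact code:

```js
import { pipeline } from '@xenova/transformers';

// Assumed model id for illustration; the demo's exact setup may differ.
const generator = await pipeline('text-generation', 'Xenova/Phi-3-mini-4k-instruct', {
  device: 'webgpu', // run inference on the GPU via WebGPU
});

const output = await generator('Tell me a joke.', { max_new_tokens: 64 });
console.log(output[0].generated_text);
```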
The latest commits add support for [Moondream2](https://huggingface.co/Xenova/moondream2), a small vision language model by @vikhyat designed to run efficiently on edge devices. Try it out yourself with the live demo: https://huggingface.co/spaces/Xenova/experimental-moondream-webgpu...
VLMs now support past key-value (PKV) caching. Demo: https://huggingface.co/spaces/Xenova/experimental-nanollava-webgpu

https://github.com/xenova/transformers.js/assets/26504141/b8b10e8b-22c7-4942-a846-05bfa472a4e7

Example code:

```js
import { AutoProcessor, AutoTokenizer, LlavaForConditionalGeneration, RawImage } from '@xenova/transformers';

// Load tokenizer, processor and model
const model_id = 'Xenova/nanoLLaVA';
...
```
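Since the snippet above is truncated, here's a fuller sketch of how generation with a returned `past_key_values` might look. This is a best-guess reconstruction rather than the post's original code; the prompt and image URL are placeholders:

```js
import { AutoProcessor, AutoTokenizer, LlavaForConditionalGeneration, RawImage } from '@xenova/transformers';

// Load tokenizer, processor and model
const model_id = 'Xenova/nanoLLaVA';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await LlavaForConditionalGeneration.from_pretrained(model_id);

// Prepare text inputs (chat-template usage assumed; adjust to the model's format)
const messages = [{ role: 'user', content: '<image>\nWhat is in this image?' }];
const text = tokenizer.apply_chat_template(messages, { tokenize: false, add_generation_prompt: true });
const text_inputs = tokenizer(text);

// Prepare vision inputs (placeholder image URL)
const image = await RawImage.fromURL('https://example.com/cat.jpg');
const vision_inputs = await processor(image);

// Generate. Returning a dict also yields the past key values, which can be
// fed back into a follow-up generate() call so that earlier tokens are not
// recomputed — this is the caching the post refers to.
const { past_key_values, sequences } = await model.generate({
  ...text_inputs,
  ...vision_inputs,
  max_new_tokens: 64,
  do_sample: false,
  return_dict_in_generate: true,
});

// Decode (note: the decoded output includes the prompt tokens)
const answer = tokenizer.batch_decode(sequences, { skip_special_tokens: true });
console.log(answer);
```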
@beaufortfrancois That would be amazing to have! That said, it's probably best suited as a feature request for onnxruntime-web. The way one could do it is to use the external data...
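For context, here's a rough sketch of how loading a model with external weight data looks in recent onnxruntime-web versions; the file names are placeholders, and this is an illustration rather than the exact approach being proposed:

```js
import * as ort from 'onnxruntime-web';

// Placeholder file names for illustration. Large ONNX models can store their
// weights in a separate "external data" file, which is supplied to the
// session alongside the graph itself.
const session = await ort.InferenceSession.create('model.onnx', {
  externalData: [
    { path: 'model.onnx_data', data: 'https://example.com/model.onnx_data' },
  ],
});
```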
Apologies for the late reply, @faizulhaque - I somehow didn't see this until @flatsiedatsie's comment. Could you provide the code you are using? The original repo has the `model_max_length` correctly...
@faizulhaque Updated :) But you don't need to use that version anymore; you can use the official model repo instead.
Hi there 👋 Thanks for making the repro 👍 I believe the problem is the following:

1. By default, transformers.js (like the Python library) checks your local server for the...
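As an illustration of that default behavior, here's a small sketch of how the local-model check can be disabled; the `env` settings shown are the library's standard configuration surface, but your setup may differ:

```js
import { env, pipeline } from '@xenova/transformers';

// transformers.js first looks for models under `env.localModelPath` on your
// own server before falling back to the Hugging Face Hub. If you only use
// remote models, disabling the local check avoids spurious 404 requests.
env.allowLocalModels = false;

const classifier = await pipeline('sentiment-analysis');
console.log(await classifier('This works without hitting my local server first.'));
```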
Real-time transcription will hopefully be possible once WebGPU support is added, and we'll definitely revisit (and update the demo) once it is. If someone in the community would like to...
The major bottleneck at the moment is the encoder, which can take a few seconds to process ~30 seconds of audio. Ideally, if we were to process shorter audio sequences, it would...
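To make the chunking idea concrete, here's a minimal sketch using the transformers.js ASR pipeline; the model id, file name, and parameter values are illustrative assumptions:

```js
import { pipeline } from '@xenova/transformers';

// Illustrative model id; any Whisper checkpoint works the same way.
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');

// Whisper's encoder operates on fixed 30-second windows, so each window
// costs a full encoder pass. Chunking parameters control how the audio is
// split and how much consecutive windows overlap.
const output = await transcriber('audio.wav', {
  chunk_length_s: 30,
  stride_length_s: 5,
});
console.log(output.text);
```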
Thanks for the resources :) For the most part, we are waiting for onnxruntime-web to add WebGPU as a supported backend. Here is the associated PR to track its progress:...