web-llm

High-performance In-browser LLM Inference Engine

Results 295 web-llm issues

Hello, I have an Intel and an Nvidia card, so I rebuilt the TVM bundle to include the "high-performance" change. I noticed that when the model starts to write "weird things",...

I guess it would be easy for you to run ggml [llama.cpp](https://github.com/ggerganov/llama.cpp)-compatible models. In this case, you wouldn't need the GPU and could run the models in memory. From...
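As a rough sketch of what the CPU-only path suggested above looks like (the model path and quantization level are illustrative assumptions, not files shipped with web-llm), the classic llama.cpp invocation is:

```shell
# Build llama.cpp and run a quantized ggml model entirely on the CPU,
# with weights loaded into memory -- no GPU required.
# The model filename below is an assumption; any llama.cpp-compatible
# ggml file should work.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
./main -m ./models/7B/ggml-model-q4_0.bin -p "Hello" -n 64
```

This trades the browser/WebGPU deployment story for portability: inference runs as a native process on the CPU instead of inside a web page.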

Where is the source code for vicuna-7b_webgpu.wasm, please? Thank you.

Has anyone tried to combine all 163 shards into one file? If yes, did it make a difference in performance? Thank you.

Hey folks, amazing work. However, FYI this does not work on Microsoft Edge (running on Linux Fedora 37, with an Nvidia 1080), which is a shame. ![ksnip_20230418-134843](https://user-images.githubusercontent.com/518555/232768299-81de4a9d-9c91-450d-b635-0b157d309eea.png) Using `edge://gpu` I...

Adding a dropdown menu to the platform will allow users to easily select the LLM they want to use along with a brief description of its features. This will improve...

Vicuna v0's vocab_size is 32001, but v1's vocab_size is 32000, so we need to update the manual schedule.

I am not 100% sure, but 97% sure :-) that running Web-LLM with 3-5 questions caused data transfer on the order of 5-6 GB. Here is the runtime environment:...