web-llm
High-performance In-browser LLM Inference Engine
Hello, I have an Intel and an Nvidia card, so I rebuilt the TVM bundle to include the "high-performance" change. I noticed that when the model starts to write "weird things",...
I guess it would be easy for you to run the ggml-format, [llama.cpp](https://github.com/ggerganov/llama.cpp)-compatible models. In that case, you wouldn't need the GPU and could run the models in memory. From...
Where is the source code for vicuna-7b_webgpu.wasm, please? Thank you.
Has anyone tried to combine all 163 shards into one file? If so, was there a difference in performance? Thank you.
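For anyone who wants to try this locally, here is a minimal Node.js sketch that concatenates shard files in order. The directory layout and file names are assumptions for illustration, not the actual web-llm artifact layout.

```ts
// Hypothetical sketch: concatenate weight shards (e.g. shard_0.bin, shard_1.bin, ...)
// found in a directory into a single output file. File names are assumptions.
import { createWriteStream } from "node:fs";
import { readFile, readdir } from "node:fs/promises";

async function combineShards(dir: string, outPath: string): Promise<void> {
  const out = createWriteStream(outPath);
  // Sort numerically so shard_2 comes before shard_10.
  const shards = (await readdir(dir))
    .filter((f) => f.endsWith(".bin"))
    .sort((a, b) => a.localeCompare(b, undefined, { numeric: true }));
  for (const name of shards) {
    // Append each shard's bytes in order.
    out.write(await readFile(`${dir}/${name}`));
  }
  out.end();
}

combineShards("./params", "./combined.bin").catch(console.error);
```

Whether a single large file actually helps depends on how the runtime fetches and caches the shards, so measuring both layouts side by side would be the way to answer the performance question.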
Hey folks, amazing work. However, FYI this does not work on Microsoft Edge (running on Linux Fedora 37, with an Nvidia 1080), which is a shame. Using `edge://gpu` I...
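A quick way to narrow this kind of report down is to check whether the page sees WebGPU at all. The sketch below uses the standard `navigator.gpu.requestAdapter()` call and can be pasted into the browser console (as TypeScript it assumes `@webgpu/types` is available).

```ts
// Minimal WebGPU availability check: distinguishes "WebGPU not exposed"
// from "WebGPU exposed but no adapter returned" (driver or flag issue).
async function checkWebGPU(): Promise<void> {
  if (!("gpu" in navigator)) {
    console.log("navigator.gpu is undefined: WebGPU is not enabled in this browser.");
    return;
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    console.log("WebGPU is exposed, but no adapter was returned.");
    return;
  }
  console.log("WebGPU adapter found.");
}

checkWebGPU();
```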
Adding a dropdown menu to the platform will allow users to easily select the LLM they want to use along with a brief description of its features. This will improve...
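A minimal sketch of such a dropdown is below, assuming the `CreateMLCEngine` / `engine.reload` API from `@mlc-ai/web-llm` and example model ids; both the API surface and the ids may differ between versions.

```ts
// Hedged sketch of the suggested model picker: populate a <select> with model
// ids and reload the engine when the user picks a different one.
import { CreateMLCEngine, MLCEngine } from "@mlc-ai/web-llm";

// Example model ids for illustration; the real ids depend on the release.
const modelIds = ["vicuna-v1-7b-q4f32_0", "RedPajama-INCITE-Chat-3B-v1-q4f32_0"];

async function setupModelPicker(select: HTMLSelectElement): Promise<MLCEngine> {
  for (const id of modelIds) {
    const opt = document.createElement("option");
    opt.value = id;
    opt.textContent = id; // A short description of the model could go here.
    select.appendChild(opt);
  }
  const engine = await CreateMLCEngine(modelIds[0]);
  select.addEventListener("change", () => {
    // Swap models when the selection changes.
    void engine.reload(select.value);
  });
  return engine;
}
```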
Vicuna v0's vocab_size is 32001, but v1's vocab_size is 32000, so we need to update the manual schedule.
I am not 100% sure, but 97% sure :-) that running Web-LLM with 3-5 questions caused data transfer on the order of 5-6 GB. Here is the runtime environment:...
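One hedged way to tell repeated multi-GB downloads apart from weights already stored locally is the standard `navigator.storage.estimate()` API, run from the browser console before and after a session:

```ts
// Report how much the origin currently has in browser storage (caches,
// IndexedDB, etc.) versus its quota. Large, stable usage after the first run
// suggests the weights are cached rather than re-downloaded.
async function reportStorage(): Promise<void> {
  const { usage, quota } = await navigator.storage.estimate();
  const gib = (n?: number) => ((n ?? 0) / 1024 ** 3).toFixed(2);
  console.log(`Origin storage: ${gib(usage)} GiB used of ${gib(quota)} GiB quota.`);
}

reportStorage();
```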