web-llm
High-performance In-browser LLM Inference Engine
Add a gem mirror doc for gem/web beginners like me.
We should also integrate an option for distributed computing, like Folding@home: dedicating a certain percentage of CPU/GPU power to a decentralized service, since decentralized supercomputing would significantly help with computationally...
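A toy sketch of what "dedicating a certain percentage" could look like on the browser side, using only the standard `requestIdleCallback` API; the work-slice function and the whole scheme are hypothetical, not anything web-llm provides:

```ts
// Hypothetical sketch: donate idle browser time to a distributed work queue,
// capped at a fraction of each idle period. runWorkSlice() is made up.
const BUDGET_FRACTION = 0.25; // contribute at most 25% of each idle period

function runWorkSlice(): void {
  // Placeholder for one small unit of a distributed inference task.
}

function donateIdleCompute(): void {
  requestIdleCallback((deadline) => {
    const budgetMs = deadline.timeRemaining() * BUDGET_FRACTION;
    const start = performance.now();
    while (performance.now() - start < budgetMs) {
      runWorkSlice();
    }
    donateIdleCompute(); // reschedule for the next idle period
  });
}

donateIdleCompute();
```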
 
I'm stuck on the following error:

```
$ python3 build.py --target webgpu --debug-dump
Load cached module from dist/vicuna-7b-v1/mod_cache_before_build.pkl and skip tracing. You can use --use-cache=0 to retrace
Dump mod to...
```
Can anyone point me to where I can find more information on this? Thanks.
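The log itself hints at the usual first step: the message above says `--use-cache=0` retraces instead of loading the stale pickle, so rerunning the same command with that flag is worth a try:

```
$ python3 build.py --target webgpu --debug-dump --use-cache=0
```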
Model Version: vicuna-7b-v1.1, GPU: GTX 1060 6GB Mobile, System: Ubuntu 22.04 LTS, Browser: Chrome 114.0.5720.4 dev | Edge 114.0.1793.0, Laptop: Alienware 13 R3, 16GB RAM, i7-7700HQ CPU @ 2.80GHz × 8...
My main drive has very little space left, and it seems to default to downloading there. Is it possible to have it install to a separate drive? I'm using Edge...
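Not a direct answer on moving drives (for Chromium-based browsers like Edge, the cache follows the profile directory, which the `--user-data-dir` launch flag can relocate), but to confirm where the space is going you can inspect origin storage from the DevTools console with standard APIs:

```ts
// Report how much origin storage is in use and list the caches present.
// Both navigator.storage.estimate() and caches.keys() are standard web APIs;
// the cache names printed are whatever web-llm created.
const { usage, quota } = await navigator.storage.estimate();
console.log(`using ${((usage ?? 0) / 1e9).toFixed(2)} GB of ${((quota ?? 0) / 1e9).toFixed(2)} GB`);
for (const name of await caches.keys()) {
  console.log("cache:", name);
}
```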
I noticed that even when all shards are cached, `ndarray-cache.json` still gets requested from Hugging Face. Is there a way to skip this step once it's cached?
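A minimal sketch of the cache-first pattern that would skip that request, assuming the manifest were kept in the Cache API; the cache name and wrapper function below are hypothetical, not web-llm's actual code:

```ts
// Hypothetical sketch: serve ndarray-cache.json from the Cache API when
// present, hitting Hugging Face only on a cache miss. "webllm-manifest"
// is an invented cache name.
async function fetchManifest(url: string): Promise<Response> {
  const cache = await caches.open("webllm-manifest");
  const hit = await cache.match(url);
  if (hit) return hit; // cached copy: no network request
  const resp = await fetch(url);
  await cache.put(url, resp.clone()); // store for next time
  return resp;
}
```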
It has run well on an Intel(R) Iris(R) Xe Graphics 8GB GPU. I want to see what the model looks like, so I searched by file size and by file keywords (pkl, model, wgsl) through the...
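For that kind of search, a small Node script along these lines lists the build artifacts by extension and size; the `dist/` path and the extensions are taken from the question, and nothing here is web-llm-specific:

```ts
// Walk dist/ and print .pkl and .wgsl artifacts with their sizes in MB.
import { readdirSync, statSync } from "fs";
import { join, extname } from "path";

function walk(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const p = join(dir, name);
    return statSync(p).isDirectory() ? walk(p) : [p];
  });
}

for (const file of walk("dist")) {
  if ([".pkl", ".wgsl"].includes(extname(file))) {
    console.log(`${(statSync(file).size / 1e6).toFixed(1)} MB  ${file}`);
  }
}
```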
I'm curious whether you plan to provide StableLM support in web-llm. It would be really great if so.