Results: 167 comments by Charlie Ruan

Phi3-mini, StableLM 1.6B, and Qwen 1.8B were just added to the prebuilt list here: https://github.com/mlc-ai/web-llm/pull/433. Will bump the version to 0.2.39 soon. Note that the Phi3 we added was 4k instead of...
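
For reference, something like this should print everything in the prebuilt list once 0.2.39 lands (a minimal sketch, assuming the `prebuiltAppConfig` export and its `model_list` field):

```ts
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

// List the model_id of every prebuilt model, e.g. to confirm the newly
// added Phi3-mini, StableLM 1.6B, and Qwen 1.8B entries are present.
for (const model of prebuiltAppConfig.model_list) {
  console.log(model.model_id);
}
```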

Just published 0.2.39; those models are now included in the prebuilt app config!

npm 0.2.62 now supports Phi-3.5-mini: https://github.com/mlc-ai/web-llm/pull/556. Phi-3.5-mini supports up to 128K context (unlike Phi-3-mini, which only has 4k) thanks to rope scaling, which MLC-LLM supports and which you can...
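
Rough sketch of loading it (the model id and the `context_window_size` chat option are assumptions here; capping the window is optional, but a smaller window means a smaller KV cache and hence less VRAM):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Load Phi-3.5-mini; optionally cap the context window to limit KV-cache memory.
const engine = await CreateMLCEngine(
  "Phi-3.5-mini-instruct-q4f16_1-MLC",                 // assumed prebuilt model_id
  { initProgressCallback: (p) => console.log(p.text) },
  { context_window_size: 8192 },                        // assumed ChatOptions field
);

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Summarize this long report..." }],
});
console.log(reply.choices[0].message.content);
```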

Closing this issue for now as Phi-3.5 should suffice the need described. Feel free to open new ones if new issues arise!

Thanks for the request. Having multiple models in a single engine simultaneously is something we are looking into now. Meanwhile, would having two `MLCEngine` instances work for your case?
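
Something like this is what I had in mind for the two-engine route (model ids are just examples; note that both engines share the same GPU memory budget):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// One engine per model; each can serve requests independently.
const engineA = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f16_1-MLC");
const engineB = await CreateMLCEngine("Phi-3.5-mini-instruct-q4f16_1-MLC");

const [a, b] = await Promise.all([
  engineA.chat.completions.create({ messages: [{ role: "user", content: "Hi!" }] }),
  engineB.chat.completions.create({ messages: [{ role: "user", content: "Hi!" }] }),
]);
console.log(a.choices[0].message.content, b.choices[0].message.content);
```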

Hi @mikestaub, as of npm 0.2.60, a single engine can load multiple models, and the models can process requests concurrently. However, I have not tested the performance benefit (if any) to...
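
A rough sketch of the multi-model flow, assuming `CreateMLCEngine` accepts a list of model ids and the request-level `model` field picks which loaded model handles the request (model ids are illustrative):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Load two models into one engine (npm >= 0.2.60).
const engine = await CreateMLCEngine([
  "Phi-3.5-mini-instruct-q4f16_1-MLC",
  "Qwen2-1.5B-Instruct-q4f16_1-MLC",
]);

// Route this particular request to the Qwen model.
const reply = await engine.chat.completions.create({
  model: "Qwen2-1.5B-Instruct-q4f16_1-MLC",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0].message.content);
```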

Closing this issue as completed. Feel free to reopen/open new ones if issues arise!

May relate to this:
- https://github.com/mlc-ai/web-llm/issues/484

Are you seeing this on chat.webllm.ai? Perhaps try the variant with the -1k suffix, which has a smaller KV cache and hence a lower memory requirement. Also try q4f16_1 instead of q4f32_1.
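
In code, the switch just means picking a different model id when creating the engine (this id is only an example of the naming pattern):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// The "-1k" variant ships a 1K context window, so its KV cache (and total
// VRAM requirement) is much smaller; q4f16_1 also uses f16 instead of f32.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f16_1-MLC-1k");
```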

The f16 error suggests that WebGPU on your browser/device does not support f16 computation. You can check it manually at https://webgpureport.org/. If supported, you should see `shader-f16` in...
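
A programmatic version of that check, using the standard WebGPU API:

```ts
// Ask the WebGPU adapter whether it supports the "shader-f16" feature,
// the same capability that webgpureport.org displays.
const adapter = await navigator.gpu?.requestAdapter();
if (!adapter) {
  console.log("WebGPU is not available in this browser.");
} else if (adapter.features.has("shader-f16")) {
  console.log("shader-f16 supported; q4f16_1 models should work.");
} else {
  console.log("No shader-f16 support; use a q4f32_1 model instead.");
}
```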