Results: 167 comments by Charlie Ruan

Phi3-mini, StableLM 1.6B, and Qwen 1.8B were just added to the prebuilt list here: https://github.com/mlc-ai/web-llm/pull/433. Will bump the version to 0.2.39 soon. Note that the Phi3 we added was 4k instead of...
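
For reference, something like this should print everything in the prebuilt list once 0.2.39 lands (a minimal sketch, assuming the `prebuiltAppConfig` export and its `model_list` field):

```ts
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

// List the model_id of every prebuilt model, e.g. to confirm the newly
// added Phi3-mini, StableLM 1.6B, and Qwen 1.8B entries are present.
for (const model of prebuiltAppConfig.model_list) {
  console.log(model.model_id);
}
```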

Just published 0.2.39; those models are now included in the prebuilt app config!

npm 0.2.62 now supports Phi-3.5-mini: https://github.com/mlc-ai/web-llm/pull/556. Phi-3.5-mini supports up to 128K context (unlike Phi-3-mini, which only has 4k) thanks to rope scaling, which MLC-LLM supports and which you can...
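
Rough sketch of loading it (the model id and the `context_window_size` chat option are assumptions here; capping the window is optional, but a smaller window means a smaller KV cache and hence less VRAM):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Load Phi-3.5-mini; optionally cap the context window to limit KV-cache memory.
const engine = await CreateMLCEngine(
  "Phi-3.5-mini-instruct-q4f16_1-MLC",                 // assumed prebuilt model_id
  { initProgressCallback: (p) => console.log(p.text) },
  { context_window_size: 8192 },                        // assumed ChatOptions field
);

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Summarize this long report..." }],
});
console.log(reply.choices[0].message.content);
```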

Closing this issue for now as Phi-3.5 should suffice the need described. Feel free to open new ones if new issues arise!

Thanks for the request. Having multiple models in a single engine simultaneously is something we are looking into now. Meanwhile, would having two `MLCEngine` instances work for your case?
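
Something like this is what I had in mind for the two-engine route (model ids are just examples; note that both engines share the same GPU memory budget):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// One engine per model; each can serve requests independently.
const engineA = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f16_1-MLC");
const engineB = await CreateMLCEngine("Phi-3.5-mini-instruct-q4f16_1-MLC");

const [a, b] = await Promise.all([
  engineA.chat.completions.create({ messages: [{ role: "user", content: "Hi!" }] }),
  engineB.chat.completions.create({ messages: [{ role: "user", content: "Hi!" }] }),
]);
console.log(a.choices[0].message.content, b.choices[0].message.content);
```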

Hi @mikestaub, as of npm 0.2.60, a single engine can load multiple models, and the models can process requests concurrently. However, I have not tested the performance benefit (if any) to...
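
A rough sketch of the multi-model flow, assuming `CreateMLCEngine` accepts a list of model ids and the request-level `model` field picks which loaded model handles the request (model ids are illustrative):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Load two models into one engine (npm >= 0.2.60).
const engine = await CreateMLCEngine([
  "Phi-3.5-mini-instruct-q4f16_1-MLC",
  "Qwen2-1.5B-Instruct-q4f16_1-MLC",
]);

// Route this particular request to the Qwen model.
const reply = await engine.chat.completions.create({
  model: "Qwen2-1.5B-Instruct-q4f16_1-MLC",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0].message.content);
```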

Closing this issue as completed. Feel free to reopen/open new ones if issues arise!

May relate to this:
- https://github.com/mlc-ai/web-llm/issues/484

Are you seeing this on chat.webllm.ai? Perhaps try the variant with the -1k suffix, which has a smaller KV cache and hence a lower memory requirement. Also try q4f16_1 instead of q4f32_1.
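
In code, the switch just means picking a different model id when creating the engine (this id is only an example of the naming pattern):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// The "-1k" variant ships a 1K context window, so its KV cache (and total
// VRAM requirement) is much smaller; q4f16_1 also uses f16 instead of f32.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f16_1-MLC-1k");
```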

The f16 error suggests that WebGPU on your browser/device does not support f16 computation. You can check it manually at https://webgpureport.org/. If supported, you should see `shader-f16` in...
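
A programmatic version of that check, using the standard WebGPU API:

```ts
// Ask the WebGPU adapter whether it supports the "shader-f16" feature,
// the same capability that webgpureport.org displays.
const adapter = await navigator.gpu?.requestAdapter();
if (!adapter) {
  console.log("WebGPU is not available in this browser.");
} else if (adapter.features.has("shader-f16")) {
  console.log("shader-f16 supported; q4f16_1 models should work.");
} else {
  console.log("No shader-f16 support; use a q4f32_1 model instead.");
}
```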