Charlie Ruan
Thanks @beaufortfrancois @dneto0 for the insights and pointers, super helpful!

> 1K is very common.

I see, the link is quite insightful. I'll go with 1K for the performant set...
@beaufortfrancois Sorry for the delay... No major updates yet, but I do want to get this landed.
Hi! I don't seem to be able to reproduce it. What device are you using? And would `Phi-3-mini-4k-instruct-q4f16_1-MLC-1k` work?
Hmm, that's a bit odd. I don't think it's due to corrupted downloaded weights. To help triage, could you try a smaller model like `Qwen2-0.5B-Instruct-q4f16_1`, or is...
My guess is that the device's WebGPU support isn't compatible with what WebLLM requires. Could you share your output from https://webgpureport.org/ in Chrome, if you don't mind?
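As a quicker first check than the full report, a minimal sketch of what webgpureport.org surfaces can be run from the browser console. The `navigator.gpu`, `requestAdapter()`, `adapter.features`, and `adapter.limits` APIs are standard WebGPU; the `describeWebGPU` helper name is just illustrative. In a non-browser runtime (no `navigator`), it simply reports unsupported.

```javascript
// Illustrative helper: summarize WebGPU availability, similar in spirit
// to what webgpureport.org shows. Not WebLLM's own detection code.
async function describeWebGPU() {
  // No `navigator.gpu` means WebGPU is unavailable (or a non-browser runtime).
  if (typeof navigator === "undefined" || !navigator.gpu) {
    return { supported: false };
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return { supported: false };
  return {
    supported: true,
    // Optional features like "shader-f16" matter for q4f16 models.
    features: [...adapter.features],
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  };
}

describeWebGPU().then((info) => console.log(info));
```

If `supported` comes back `false`, or `features` lacks `shader-f16`, that alone can explain a q4f16 model failing to load.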
Thank you for the suggestion! We acknowledge that https://github.com/mlc-ai/web-llm/pull/451 provides only preliminary support, and we will improve it. In the meantime, it may be possible to use models like Hermes-2-Pro...
Thanks a lot for the contribution! Would it be possible for you to provide a script that reproduces the issue, or to elaborate on it? Thank you!
Thanks all for the input. This is a great point, and we should definitely add a list of models somewhere and point to it in the README, documentation, webpage, etc.

> ...
Do you happen to have the console log? Also, what is the `maxStorageBufferBindingSize` reported on webgpureport.org?
It may be that one of the limits is being exceeded (not necessarily the buffer size; 2 GB sounds sufficient). Gemma requires larger sizes for certain buffers than other models...
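The kind of check described above can be sketched as a small comparison between a device's reported limits and a model's requirements. This is a hypothetical helper, not WebLLM's actual validation logic, and the byte figures below are illustrative rather than the real requirements of Gemma or any other model.

```javascript
// Hypothetical helper: given a WebGPU-style limits object and a map of
// required values, return the names of the limits that would be exceeded.
function exceededLimits(limits, required) {
  return Object.keys(required).filter((key) => required[key] > limits[key]);
}

// Illustrative device: 128 MiB storage-buffer binding limit, 256 MiB buffers.
const deviceLimits = {
  maxStorageBufferBindingSize: 128 * 1024 * 1024,
  maxBufferSize: 256 * 1024 * 1024,
};

// Illustrative model requirements (made-up numbers, not a real model's).
const modelNeeds = {
  maxStorageBufferBindingSize: 1024 * 1024 * 1024,
  maxBufferSize: 1024 * 1024 * 1024,
};

console.log(exceededLimits(deviceLimits, modelNeeds));
```

In a browser, the real values would come from the adapter, e.g. `const adapter = await navigator.gpu.requestAdapter(); exceededLimits(adapter.limits, modelNeeds);` — which is why the webgpureport.org numbers are useful for triage.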