Charlie Ruan
Thanks @beaufortfrancois @dneto0 for the insights and pointers, super helpful!

> 1K is very common.

I see, the link is quite insightful. I'll go with 1K for the performant set...
@beaufortfrancois Sorry for the delay... No major updates yet, but I do want to get this landed.
Hi! I don't seem to be able to reproduce it. What device are you using? And would `Phi-3-mini-4k-instruct-q4f16_1-MLC-1k` work?
Hmm, that's a bit odd. I don't think it's due to corrupted downloaded weights. To help triage, could you try a smaller model like `Qwen2-0.5B-Instruct-q4f16_1`, or is...
My guess is that the device's WebGPU support isn't compatible with what WebLLM requires. Could you share your output from https://webgpureport.org/ in Chrome, if you don't mind?
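As a quicker first check than the full report, a minimal sketch of what webgpureport.org surfaces can be run from the browser console. The `navigator.gpu`, `requestAdapter()`, `adapter.features`, and `adapter.limits` APIs are standard WebGPU; the `describeWebGPU` helper name is just illustrative. In a non-browser runtime (no `navigator`), it simply reports unsupported.

```javascript
// Illustrative helper: summarize WebGPU availability, similar in spirit
// to what webgpureport.org shows. Not WebLLM's own detection code.
async function describeWebGPU() {
  // No `navigator.gpu` means WebGPU is unavailable (or a non-browser runtime).
  if (typeof navigator === "undefined" || !navigator.gpu) {
    return { supported: false };
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return { supported: false };
  return {
    supported: true,
    // Optional features like "shader-f16" matter for q4f16 models.
    features: [...adapter.features],
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  };
}

describeWebGPU().then((info) => console.log(info));
```

If `supported` comes back `false`, or `features` lacks `shader-f16`, that alone can explain a q4f16 model failing to load.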
Thank you for the suggestion! We acknowledge that https://github.com/mlc-ai/web-llm/pull/451 provides only preliminary support, and we will improve it. In the meantime, it may be possible to use models like Hermes-2-Pro...
Thanks a lot for the contribution! Would it be possible for you to provide a script that reproduces the issue, or to elaborate on it? Thank you!
Thanks all for the input. This is a great point, and we should definitely add a list of models somewhere and point to it in the README, documentation, webpage, etc.

> ...
Do you happen to have the console log? Also, what is the `maxStorageBufferBindingSize` reported on webgpureport.org?
It may be that one of the limits is being exceeded (not necessarily the buffer size; 2 GB sounds sufficient). Gemma requires larger sizes for certain buffers than other models...
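The kind of check described above can be sketched as a small comparison between a device's reported limits and a model's requirements. This is a hypothetical helper, not WebLLM's actual validation logic, and the byte figures below are illustrative rather than the real requirements of Gemma or any other model.

```javascript
// Hypothetical helper: given a WebGPU-style limits object and a map of
// required values, return the names of the limits that would be exceeded.
function exceededLimits(limits, required) {
  return Object.keys(required).filter((key) => required[key] > limits[key]);
}

// Illustrative device: 128 MiB storage-buffer binding limit, 256 MiB buffers.
const deviceLimits = {
  maxStorageBufferBindingSize: 128 * 1024 * 1024,
  maxBufferSize: 256 * 1024 * 1024,
};

// Illustrative model requirements (made-up numbers, not a real model's).
const modelNeeds = {
  maxStorageBufferBindingSize: 1024 * 1024 * 1024,
  maxBufferSize: 1024 * 1024 * 1024,
};

console.log(exceededLimits(deviceLimits, modelNeeds));
```

In a browser, the real values would come from the adapter, e.g. `const adapter = await navigator.gpu.requestAdapter(); exceededLimits(adapter.limits, modelNeeds);` — which is why the webgpureport.org numbers are useful for triage.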