Hongyi Jin
Thanks for your suggestion. We will definitely support this feature after we enable more models.
I'm not sure about this. You can give it a try.
It's coming soon. Our team is internally testing running Vicuna within 4 GB of memory and will make it public once it's ready.
Try out our latest project https://github.com/mlc-ai/mlc-llm. You can run a model within a 4 GB memory constraint in the native runtime. We will support 4 GB LLMs on the web later.
Thank you for the advice. We are happy to see more and more models supported in web-llm. There are already open PRs about ChatGLM and Dolly model support. If you are...
Yes, of course embedding can be represented in TensorIR. So basically what you need to do is translate the model (the PyTorch implementation) into the corresponding Relax operators; see the sketch below. If there's no direct translation,...
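For illustration, here is a minimal sketch of an embedding lookup written in TensorIR via TVMScript. The buffer names and the vocabulary, hidden, and sequence sizes are assumptions chosen for the example, not taken from web-llm's actual code:

```python
from tvm.script import tir as T

@T.prim_func
def embedding(weight: T.Buffer((50257, 4096), "float32"),  # (vocab, hidden), assumed sizes
              ids: T.Buffer((128,), "int32"),              # token ids, assumed seq length
              out: T.Buffer((128, 4096), "float32")):
    # Gather one row of the weight table per token: out[i, :] = weight[ids[i], :]
    for i, j in T.grid(128, 4096):
        with T.block("embedding"):
            vi, vj = T.axis.remap("SS", [i, j])
            out[vi, vj] = weight[ids[vi], vj]
```

The data-dependent index `weight[ids[vi], vj]` is what makes this a gather; a Relax function can call into a TensorIR function like this when no built-in operator matches the PyTorch op.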
https://github.com/mlc-ai/web-llm/issues/19#issuecomment-1518940773
Thanks for the issue. There's some additional information I need: is only the JSON file requested from Hugging Face, or are the JSON file and the shards all requested?
I've checked that it's doable to skip this step. We will fix this in the coming weeks. If you find this issue urgent, I can show you the related code and you...
It seems interesting to get StableLM in, but it's not the top priority among the models we plan to support. Our next model to support is Dolly, and...