Results: 167 comments of Charlie Ruan

Hi @louis030195, Safari should be supported. I tried WebLLM Chat on:

- MacBook: macOS Sonoma 14.5 with Safari Technology Preview
- iPhone: iOS 18.0 Developer Beta with Safari (need WebLLM...
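
In case it helps with debugging, here is a minimal, framework-free probe for WebGPU availability; this is plain browser TypeScript (assuming `@webgpu/types` for the `navigator.gpu` typings), not a WebLLM API:

```typescript
// Minimal WebGPU availability probe; run in a page script or the console.
async function checkWebGPU(): Promise<void> {
  if (!("gpu" in navigator)) {
    console.log("WebGPU not exposed; on Safari, enable the WebGPU feature flag.");
    return;
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (adapter === null) {
    console.log("navigator.gpu exists, but no adapter was returned.");
    return;
  }
  console.log("WebGPU adapter acquired; WebLLM should be able to initialize.");
}

checkWebGPU();
```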

They are due to the system prompt shown in the `mlc-chat-config.json`: https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC/blob/main/mlc-chat-config.json#L33-L34, which follows the specification of the official model release. If you'd like not to use a system...
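
For reference, one way to avoid the packaged default is to pass your own `system` message through web-llm's OpenAI-style API. A minimal sketch, assuming a request-level system message takes precedence over the config default (model id is illustrative):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Illustrative model id; any prebuilt model works the same way.
  const engine = await CreateMLCEngine("Llama-2-7b-chat-hf-q4f16_1-MLC");

  const reply = await engine.chat.completions.create({
    messages: [
      // Request-level system message, rather than the default from
      // mlc-chat-config.json (assumption: this takes precedence).
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: "What is WebGPU?" },
    ],
  });
  console.log(reply.choices[0].message.content);
}

main();
```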

Hi @tlopex, thanks for the questions!

> 2. Does this mean that as long as the results are consistent, the implementation is acceptable?

That is largely correct, as long as...

@tlopex Apologies for the late reply. Please keep the questions coming; it'd also be helpful for other people trying to learn the workflow.

> 1. I found that the code...

You can reuse `train.py`; just replace, e.g., `MoELLaVAStablelmForCausalLM` with `EvalMoELLaVAStablelmForCausalLM`, after which `initialize_moe_modules()` is no longer needed. Then call `requires_grad_()` as needed.

Example of trying to allocate a KV cache with a 900k context length (should be similar to trying to load a model that is too large):
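
For context, a sketch of how such an allocation can be requested, assuming the `context_window_size` chat-option override (field name as in `mlc-chat-config.json`; model id illustrative):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Deliberately oversized context window: the KV cache allocation should
  // fail at load time with a GPU out-of-memory style error, much like
  // loading a model that exceeds device memory.
  await CreateMLCEngine(
    "Llama-3-8B-Instruct-q4f16_1-MLC", // illustrative model id
    {},                                 // engine config: defaults
    { context_window_size: 900_000 }    // chat-option override (assumption)
  );
}

main().catch((err) => console.error("Engine load failed:", err));
```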

Marked as a draft for now, as it depends on https://github.com/apache/tvm/pull/17005

Thanks for reporting the error, will send a fix soon

Could you try 0.2.38? It should be fixed via https://github.com/mlc-ai/web-llm/pull/415. Apologies for the inconvenience.