Results 166 comments of Charlie Ruan

What o3 says: https://chatgpt.com/share/6818fa08-277c-8013-aeeb-65dd9e3914c1

Thanks for your interest in WebLLM! This kernel is indeed not used in WebLLM yet, as WebLLM currently applies penalties on the CPU (i.e., without a GPU kernel). This is...
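To illustrate what CPU-side penalty application looks like, here is a minimal sketch in the style of the OpenAI frequency/presence penalty formula. The function name and structure are hypothetical, not WebLLM's actual implementation:

```typescript
// Illustrative sketch: applying frequency/presence penalties to logits
// on the CPU before sampling. Hypothetical code, not WebLLM's internals.
function applyPenaltiesCPU(
  logits: number[],
  generatedTokens: number[],
  frequencyPenalty: number,
  presencePenalty: number
): number[] {
  // Count how often each token id has appeared in the output so far.
  const counts = new Map<number, number>();
  for (const tok of generatedTokens) {
    counts.set(tok, (counts.get(tok) ?? 0) + 1);
  }
  const out = logits.slice();
  for (const [tok, count] of counts) {
    // OpenAI-style formula: per-occurrence penalty plus a flat penalty
    // for any token that has appeared at least once.
    out[tok] -= count * frequencyPenalty + presencePenalty;
  }
  return out;
}
```

Since the logits already live on the CPU at sampling time for small vocabularies, this avoids a dedicated GPU kernel at the cost of a host-side pass over the generated-token counts.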

Some future TODOs for embeddings:
- [ ] Support nomic-v1.5 for longer context (may not simply return first token logits like snowflake-arctic, hence requiring changes in `EmbeddingPipeline`)
- [ ] ...
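The pooling difference mentioned above can be sketched as follows. These helpers are illustrative only (not the actual `EmbeddingPipeline` code), and the assumption that nomic-v1.5 needs mean pooling rather than first-token pooling is exactly the open question in the TODO:

```typescript
// Hypothetical sketch contrasting two pooling strategies for sentence
// embeddings. snowflake-arctic-style models can take the first token's
// hidden state; other models may need mean pooling over all tokens.
function firstTokenPooling(hidden: number[][]): number[] {
  // hidden: [numTokens][hiddenDim]; return the first token's vector.
  return hidden[0].slice();
}

function meanPooling(hidden: number[][]): number[] {
  // Average each hidden dimension across all token positions.
  const dim = hidden[0].length;
  const out = new Array<number>(dim).fill(0);
  for (const vec of hidden) {
    for (let i = 0; i < dim; i++) out[i] += vec[i];
  }
  return out.map((x) => x / hidden.length);
}
```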

Hi there! Thanks for the discussion. MLC-LLM and TVM are the two sources for the WASM implementation (both the WebGPU kernels and the necessary runtime support, such as tensor manipulation)....

Thanks for your contribution! Out of curiosity, I tried it out in Scribbler but ran into the following; do you know why this might be the case:

Tried it out, works smoothly. Thank you so much!

Thanks for the discussion here! The tokenizers-cpp-related issue should be fixed after 0.2.79, as discussed in the PR mentioned in the thread. Gemma3 support is still blocked by a...

Thanks for the thoughts and discussions @Neet-Nestor @flatsiedatsie! The code above will work fine: `engine2` will not perform completion, and `engine1` is not affected by `engine2`. However, `engine2` will load...

Thanks! Will take a look this week. Note, though, that the baseline you have seems to be an un-quantized version of phi3.5-vision, while WebLLM uses a 4-bit quantized one (hence the q4 in the code name).
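For context on why quantization can shift results versus an un-quantized baseline, here is a sketch of symmetric 4-bit group quantization, the general idea behind q4-style weight formats. This is a simplified illustration and does not reflect MLC's actual packing layout or scheme:

```typescript
// Illustrative symmetric 4-bit quantization of a weight group.
// Real q4 formats use per-group scales and bit-packing; this sketch
// only shows the value mapping and the resulting precision loss.
function quantize4bit(weights: number[]): { codes: number[]; scale: number } {
  // Map values to signed 4-bit integers in [-8, 7].
  const maxAbs = Math.max(...weights.map(Math.abs), 1e-8);
  const scale = maxAbs / 7;
  const codes = weights.map((w) =>
    Math.max(-8, Math.min(7, Math.round(w / scale)))
  );
  return { codes, scale };
}

function dequantize4bit(codes: number[], scale: number): number[] {
  return codes.map((c) => c * scale);
}
```

Each weight is reconstructed only up to roughly `scale / 2` of rounding error, which is why outputs of a q4 model can diverge slightly from the fp16/fp32 baseline.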