Results 166 comments of Charlie Ruan

What o3 says: https://chatgpt.com/share/6818fa08-277c-8013-aeeb-65dd9e3914c1

Thanks for your interest in WebLLM! This kernel is indeed not used in WebLLM yet, as WebLLM currently applies penalties on the CPU (i.e., without a GPU kernel). This is...
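To illustrate what CPU-side penalty application looks like, here is a minimal sketch in the style of the OpenAI frequency/presence penalty formula. The function name and structure are hypothetical, not WebLLM's actual implementation:

```typescript
// Illustrative sketch: applying frequency/presence penalties to logits
// on the CPU before sampling. Hypothetical code, not WebLLM's internals.
function applyPenaltiesCPU(
  logits: number[],
  generatedTokens: number[],
  frequencyPenalty: number,
  presencePenalty: number
): number[] {
  // Count how often each token id has appeared in the output so far.
  const counts = new Map<number, number>();
  for (const tok of generatedTokens) {
    counts.set(tok, (counts.get(tok) ?? 0) + 1);
  }
  const out = logits.slice();
  for (const [tok, count] of counts) {
    // OpenAI-style formula: per-occurrence penalty plus a flat penalty
    // for any token that has appeared at least once.
    out[tok] -= count * frequencyPenalty + presencePenalty;
  }
  return out;
}
```

Since the logits already live on the CPU at sampling time for small vocabularies, this avoids a dedicated GPU kernel at the cost of a host-side pass over the generated-token counts.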

Some future TODOs for embeddings:
- [ ] Support nomic-v1.5 for longer context (may not simply return first token logits like snowflake-arctic, hence requiring changes in `EmbeddingPipeline`)
- [ ] ...
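The pooling difference mentioned above can be sketched as follows. These helpers are illustrative only (not the actual `EmbeddingPipeline` code), and the assumption that nomic-v1.5 needs mean pooling rather than first-token pooling is exactly the open question in the TODO:

```typescript
// Hypothetical sketch contrasting two pooling strategies for sentence
// embeddings. snowflake-arctic-style models can take the first token's
// hidden state; other models may need mean pooling over all tokens.
function firstTokenPooling(hidden: number[][]): number[] {
  // hidden: [numTokens][hiddenDim]; return the first token's vector.
  return hidden[0].slice();
}

function meanPooling(hidden: number[][]): number[] {
  // Average each hidden dimension across all token positions.
  const dim = hidden[0].length;
  const out = new Array<number>(dim).fill(0);
  for (const vec of hidden) {
    for (let i = 0; i < dim; i++) out[i] += vec[i];
  }
  return out.map((x) => x / hidden.length);
}
```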

Hi there! Thanks for the discussion. MLC-LLM and TVM are the two sources for the WASM implementation (both the WebGPU kernels and the necessary runtime support, such as tensor manipulation)....

Thanks for your contribution! Out of curiosity, I tried it out in Scribbler but ran into the following; do you know why this might be the case:

Tried it out, works smoothly. Thank you so much!

Thanks for the discussion here! The tokenizers-cpp-related issue should be fixed after 0.2.79, as discussed in the PR mentioned in the thread. Gemma3 support is still blocked by a...

Thanks for the thoughts and discussions @Neet-Nestor @flatsiedatsie! The code above will work fine: `engine2` will not perform completion, and `engine1` is not affected by `engine2`. However, `engine2` will load...

Thanks! Will take a look this week. Note, though, that the baseline you have seems to be an un-quantized version of phi3.5-vision, while WebLLM uses a 4-bit quantized one (hence the q4 in the code name).
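For context on why quantization can shift results versus an un-quantized baseline, here is a sketch of symmetric 4-bit group quantization, the general idea behind q4-style weight formats. This is a simplified illustration and does not reflect MLC's actual packing layout or scheme:

```typescript
// Illustrative symmetric 4-bit quantization of a weight group.
// Real q4 formats use per-group scales and bit-packing; this sketch
// only shows the value mapping and the resulting precision loss.
function quantize4bit(weights: number[]): { codes: number[]; scale: number } {
  // Map values to signed 4-bit integers in [-8, 7].
  const maxAbs = Math.max(...weights.map(Math.abs), 1e-8);
  const scale = maxAbs / 7;
  const codes = weights.map((w) =>
    Math.max(-8, Math.min(7, Math.round(w / scale)))
  );
  return { codes, scale };
}

function dequantize4bit(codes: number[], scale: number): number[] {
  return codes.map((c) => c * scale);
}
```

Each weight is reconstructed only up to roughly `scale / 2` of rounding error, which is why outputs of a q4 model can diverge slightly from the fp16/fp32 baseline.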