Charlie Ruan
@cometta Hmm, is there a specific reason for this? We do have APIs to delete the model weights from the cache.
Hi, out of curiosity, which version of mlc-llm are you using, what is the length of the context, and which model is it? I remember an **older version** of mlc-llm...
Thanks for raising the issue. @harrywhoo is close to fixing this.
Hi folks, sorry for the delay; this is still in progress. In the meantime, to unblock immediately, it might be helpful to check out the commits listed in this PR and...
Hi, thanks for your interest! You can check out this example for how to use RAG w/ WebLLM: https://github.com/mlc-ai/web-llm/tree/main/examples/embeddings We currently support `snowflake-arctic-embed`.
Hi! Yes, the b4 and b32 wasms have different WebGPU kernels, but share the same weights (hence the same HF URLs). See https://github.com/mlc-ai/web-llm/pull/538 for details: > `b32` means the model...