# [Tracking][WebLLM] Runtime updates

## Overview
There are various runtime items we'd like to update and complete in WebLLM:
- [ ] Support grammar for Llama 3, hence update Hermes 2 support from Mistral-based to Llama 3-based (see the JSON-mode sketch after this list)
  - Compile the following changes into the MLC runtime wasm:
- https://github.com/mlc-ai/mlc-llm/pull/2248
- https://github.com/mlc-ai/mlc-llm/pull/2335
- https://github.com/mlc-ai/mlc-llm/pull/2416
- [ ] Support Phi-3 mini
- [ ] Update the function-calling API to better accommodate Hermes 2 (see the tools sketch after this list)
- [ ] Remove `mean_gen_len`, `max_gen_len`, and `shift_fill_factor` usages
  - Follow the logic in https://github.com/mlc-ai/mlc-llm/blob/4538cc724c1e66917c34b59f3747f8d828a6c7c5/python/mlc_llm/interface/chat.py#L174
- [ ] Remove the model-metadata dependency for KVCache size -- perhaps let the user determine KVCache size and sliding-window usage (see the config sketch after this list)
  - As per https://github.com/mlc-ai/mlc-llm/pull/2434
- [ ] Add the new OpenAI field `include_usage` (see the streaming sketch after this list)
- [ ] Remove `resolve/main` from the URL of `model`, and update `model_id` to be the repo name
- [ ] Rename `model_url` and `model_lib_url` to `model` and `model_lib` (see the before/after sketch after this list)
- [ ] Add a Streamer for better emoji support, for both Llama3-like and Llama2-like tokenizers (see the decoding sketch after this list)
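Below are hedged sketches for several of the items above; model IDs, field shapes, and tools marked as hypothetical are assumptions, not published APIs. First, the grammar item: a minimal sketch of grammar-constrained output through WebLLM's OpenAI-style API, assuming the `MLCEngine` interface and a hypothetical Llama 3-based Hermes 2 model ID:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Hypothetical model_id for a Llama 3-based Hermes 2 build; not a published record.
const engine = await CreateMLCEngine("Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC");

// Ask for JSON output; with grammar support, generation is constrained by a
// BNF grammar rather than relying on the model to behave.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "List three colors as a JSON object." }],
  response_format: { type: "json_object" },
});
console.log(reply.choices[0].message.content);
```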
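For the function-calling item, a sketch assuming the updated API keeps the standard OpenAI `tools`/`tool_calls` shape; the `get_weather` tool and the model ID are hypothetical:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC"); // hypothetical ID

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is the weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather", // hypothetical tool, for illustration only
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
});
// With Hermes 2 accommodated, tool invocations should surface here in the
// standard OpenAI shape.
console.log(reply.choices[0].message.tool_calls);
```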
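For the KVCache item, a sketch of what user-determined sizing might look like; the `context_window_size` and `sliding_window_size` fields mirror the MLC chat config, but exposing them through WebLLM's chat options like this is an assumption:

```ts
import { MLCEngine } from "@mlc-ai/web-llm";

const engine = new MLCEngine();
// Assumed chat-option fields mirroring mlc-chat-config.json; letting the
// user set them here is the proposal in this item, not current behavior.
await engine.reload("Llama-3-8B-Instruct-q4f16_1-MLC", {
  context_window_size: 4096, // user-chosen KVCache size, not model metadata
  sliding_window_size: -1,   // -1 disables sliding-window attention
});
```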
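For `include_usage`, a sketch following the OpenAI streaming convention, where the field is set via `stream_options` and the final chunk carries usage stats instead of content:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f16_1-MLC");

const stream = await engine.chat.completions.create({
  stream: true,
  stream_options: { include_usage: true }, // the OpenAI field this item adds
  messages: [{ role: "user", content: "Hello!" }],
});

let text = "";
for await (const chunk of stream) {
  text += chunk.choices[0]?.delta.content ?? "";
  if (chunk.usage) {
    // Per OpenAI semantics, usage arrives on the final chunk.
    console.log(chunk.usage.prompt_tokens, chunk.usage.completion_tokens);
  }
}
console.log(text);
```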
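For the URL and rename items, a before/after sketch of a model record in the app config; the field names follow the issue, while the exact record shape is assumed and the wasm URLs are placeholders:

```ts
// Before: full resolve/main URL plus *_url field names.
const before = {
  model_url:
    "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC/resolve/main/",
  model_id: "Llama-3-8B-Instruct-q4f16_1-MLC",
  model_lib_url: "https://example.com/Llama-3-8B-Instruct-q4f16_1-webgpu.wasm", // placeholder
};

// After: plain repo URL, model_id is the repo name, *_url suffixes dropped.
const after = {
  model: "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
  model_id: "Llama-3-8B-Instruct-q4f16_1-MLC",
  model_lib: "https://example.com/Llama-3-8B-Instruct-q4f16_1-webgpu.wasm", // placeholder
};
```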
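For the Streamer item, a minimal sketch of the underlying problem: emoji span multiple UTF-8 bytes, so flushing partial byte sequences yields replacement characters; buffering with `TextDecoder` in streaming mode illustrates the fix a Streamer needs:

```ts
const decoder = new TextDecoder("utf-8");
const bytes = new TextEncoder().encode("👋 hello");

// Naive per-byte decoding mangles the 4-byte emoji into U+FFFD characters.
let naive = "";
for (const b of bytes) naive += new TextDecoder().decode(new Uint8Array([b]));

// Streaming decode buffers incomplete sequences until the code point is
// whole, which is what a Streamer should do for both tokenizer families.
let streamed = "";
for (const b of bytes) {
  streamed += decoder.decode(new Uint8Array([b]), { stream: true });
}
streamed += decoder.decode(); // flush any trailing buffered bytes

console.log(naive);    // "???? hello" (replacement characters)
console.log(streamed); // "👋 hello"
```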