# [Tracking][WebLLM] Runtime updates

## Overview
There are various runtime items we'd like to update and complete in WebLLM:
- [ ] Support grammar for Llama 3, hence update Hermes 2 support from Mistral-based to Llama 3-based (see the JSON-mode sketch after this list)
  - Compile the following changes into the MLC runtime wasm:
- https://github.com/mlc-ai/mlc-llm/pull/2248
- https://github.com/mlc-ai/mlc-llm/pull/2335
- https://github.com/mlc-ai/mlc-llm/pull/2416
- [ ] Support Phi-3 mini
- [ ] Update the function-calling API to better accommodate Hermes 2 (see the tools sketch after this list)
- [ ] Remove `mean_gen_len`, `max_gen_len`, and `shift_fill_factor` usages
  - Follow the logic in https://github.com/mlc-ai/mlc-llm/blob/4538cc724c1e66917c34b59f3747f8d828a6c7c5/python/mlc_llm/interface/chat.py#L174
- [ ] Remove the model-metadata dependency for KVCache size -- perhaps let the user determine KVCache size and sliding-window usage (see the config sketch after this list)
  - As per https://github.com/mlc-ai/mlc-llm/pull/2434
- [ ] Add the new OpenAI field `include_usage` (see the streaming sketch after this list)
- [ ] Remove `resolve/main` from the URL of `model`, and update `model_id` to be the repo name
- [ ] Rename `model_url` and `model_lib_url` to `model` and `model_lib` (see the before/after sketch after this list)
- [ ] Add a Streamer for better emoji support, for both Llama3-like and Llama2-like tokenizers (see the decoding sketch after this list)
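Below are hedged sketches for several of the items above; model IDs, field shapes, and tools marked as hypothetical are assumptions, not published APIs. First, the grammar item: a minimal sketch of grammar-constrained output through WebLLM's OpenAI-style API, assuming the `MLCEngine` interface and a hypothetical Llama 3-based Hermes 2 model ID:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Hypothetical model_id for a Llama 3-based Hermes 2 build; not a published record.
const engine = await CreateMLCEngine("Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC");

// Ask for JSON output; with grammar support, generation is constrained by a
// BNF grammar rather than relying on the model to behave.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "List three colors as a JSON object." }],
  response_format: { type: "json_object" },
});
console.log(reply.choices[0].message.content);
```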
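For the function-calling item, a sketch assuming the updated API keeps the standard OpenAI `tools`/`tool_calls` shape; the `get_weather` tool and the model ID are hypothetical:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC"); // hypothetical ID

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is the weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather", // hypothetical tool, for illustration only
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
});
// With Hermes 2 accommodated, tool invocations should surface here in the
// standard OpenAI shape.
console.log(reply.choices[0].message.tool_calls);
```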
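For the KVCache item, a sketch of what user-determined sizing might look like; the `context_window_size` and `sliding_window_size` fields mirror the MLC chat config, but exposing them through WebLLM's chat options like this is an assumption:

```ts
import { MLCEngine } from "@mlc-ai/web-llm";

const engine = new MLCEngine();
// Assumed chat-option fields mirroring mlc-chat-config.json; letting the
// user set them here is the proposal in this item, not current behavior.
await engine.reload("Llama-3-8B-Instruct-q4f16_1-MLC", {
  context_window_size: 4096, // user-chosen KVCache size, not model metadata
  sliding_window_size: -1,   // -1 disables sliding-window attention
});
```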
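For `include_usage`, a sketch following the OpenAI streaming convention, where the field is set via `stream_options` and the final chunk carries usage stats instead of content:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f16_1-MLC");

const stream = await engine.chat.completions.create({
  stream: true,
  stream_options: { include_usage: true }, // the OpenAI field this item adds
  messages: [{ role: "user", content: "Hello!" }],
});

let text = "";
for await (const chunk of stream) {
  text += chunk.choices[0]?.delta.content ?? "";
  if (chunk.usage) {
    // Per OpenAI semantics, usage arrives on the final chunk.
    console.log(chunk.usage.prompt_tokens, chunk.usage.completion_tokens);
  }
}
console.log(text);
```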
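For the URL and rename items, a before/after sketch of a model record in the app config; the field names follow the issue, while the exact record shape is assumed and the wasm URLs are placeholders:

```ts
// Before: full resolve/main URL plus *_url field names.
const before = {
  model_url:
    "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC/resolve/main/",
  model_id: "Llama-3-8B-Instruct-q4f16_1-MLC",
  model_lib_url: "https://example.com/Llama-3-8B-Instruct-q4f16_1-webgpu.wasm", // placeholder
};

// After: plain repo URL, model_id is the repo name, *_url suffixes dropped.
const after = {
  model: "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
  model_id: "Llama-3-8B-Instruct-q4f16_1-MLC",
  model_lib: "https://example.com/Llama-3-8B-Instruct-q4f16_1-webgpu.wasm", // placeholder
};
```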
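For the Streamer item, a minimal sketch of the underlying problem: emoji span multiple UTF-8 bytes, so flushing partial byte sequences yields replacement characters; buffering with `TextDecoder` in streaming mode illustrates the fix a Streamer needs:

```ts
const decoder = new TextDecoder("utf-8");
const bytes = new TextEncoder().encode("👋 hello");

// Naive per-byte decoding mangles the 4-byte emoji into U+FFFD characters.
let naive = "";
for (const b of bytes) naive += new TextDecoder().decode(new Uint8Array([b]));

// Streaming decode buffers incomplete sequences until the code point is
// whole, which is what a Streamer should do for both tokenizer families.
let streamed = "";
for (const b of bytes) {
  streamed += decoder.decode(new Uint8Array([b]), { stream: true });
}
streamed += decoder.decode(); // flush any trailing buffered bytes

console.log(naive);    // "???? hello" (replacement characters)
console.log(streamed); // "👋 hello"
```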