
[Tracking][WebLLM] Runtime updates

Open · CharlieFRuan opened this issue · 0 comments

Overview

This issue tracks various runtime updates we'd like to complete in WebLLM:

  • [ ] Support grammar for Llama 3, and hence update Hermes 2 support from Mistral-based to Llama 3-based
    • Compile the following changes into the MLC runtime wasm:
      • https://github.com/mlc-ai/mlc-llm/pull/2248
      • https://github.com/mlc-ai/mlc-llm/pull/2335
      • https://github.com/mlc-ai/mlc-llm/pull/2416
  • [ ] Support Phi-3 mini
  • [ ] Update function calling API to better accommodate Hermes 2
  • [ ] Remove `mean_gen_len`, `max_gen_len`, and `shift_fill_factor` usages
    • Follow logic in https://github.com/mlc-ai/mlc-llm/blob/4538cc724c1e66917c34b59f3747f8d828a6c7c5/python/mlc_llm/interface/chat.py#L174
  • [ ] Remove the KVCache size dependency on model metadata -- perhaps let the user determine KVCache size and sliding window usage
    • As per https://github.com/mlc-ai/mlc-llm/pull/2434
  • [ ] Add OpenAI's new `include_usage` field
  • [ ] Remove `resolve/main` from the model URL, and update `model_id` to be the repo name
  • [ ] Rename `model_url` and `model_lib_url` to `model` and `model_lib`
  • [ ] Add a Streamer for better emoji support (for both Llama 3-like and Llama 2-like tokenizers)
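For the `include_usage` item, a hedged sketch of the request shape this field comes from. Field names (`stream_options`, `include_usage`) follow the OpenAI chat-completions API; WebLLM's final shape may differ:

```typescript
// Sketch of an OpenAI-style streaming request carrying include_usage.
// The type names here are illustrative, not WebLLM's actual exports.
interface StreamOptions {
  include_usage?: boolean; // when true, a final chunk reports token usage
}

interface ChatCompletionRequest {
  messages: { role: string; content: string }[];
  stream?: boolean;
  stream_options?: StreamOptions;
}

const request: ChatCompletionRequest = {
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
  // In the OpenAI API, this makes the last streamed chunk have an empty
  // choices array and a `usage` object (prompt/completion/total tokens).
  stream_options: { include_usage: true },
};
```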
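The renaming item might look like the following before/after sketch of a model record. The URLs and model IDs are illustrative placeholders, not real release config:

```typescript
// Before: URL-suffixed field names, with "resolve/main" baked into the URL.
const before = {
  model_id: "Llama-3-8B-Instruct-q4f32_1",
  model_url: "https://example.com/mlc-ai/Llama-3-8B-q4f32_1-MLC/resolve/main/",
  model_lib_url: "https://example.com/libs/Llama-3-8B-q4f32_1-webgpu.wasm",
};

// After: `model` / `model_lib`, no "resolve/main", and model_id is the repo name.
const after = {
  model_id: "Llama-3-8B-q4f32_1-MLC",
  model: "https://example.com/mlc-ai/Llama-3-8B-q4f32_1-MLC",
  model_lib: "https://example.com/libs/Llama-3-8B-q4f32_1-webgpu.wasm",
};
```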
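The Streamer item addresses a general problem with streaming detokenization: a multi-byte UTF-8 character such as an emoji can be split across two tokens, so decoding each token's bytes independently produces replacement characters. A minimal sketch of the buffering idea, using the standard `TextDecoder` in streaming mode (WebLLM's actual Streamer implementation may differ):

```typescript
// "🦙" (U+1F999) encodes to 4 UTF-8 bytes, which a tokenizer may split
// across token boundaries.
const emojiBytes = new TextEncoder().encode("🦙");

// Naive per-chunk decode: each half is decoded in isolation, so the
// incomplete sequences become U+FFFD replacement characters.
const naive =
  new TextDecoder().decode(emojiBytes.slice(0, 2)) +
  new TextDecoder().decode(emojiBytes.slice(2));

// Streaming decode: { stream: true } buffers the incomplete sequence
// until the remaining bytes arrive, then emits the full code point.
const streamer = new TextDecoder("utf-8");
const streamed =
  streamer.decode(emojiBytes.slice(0, 2), { stream: true }) +
  streamer.decode(emojiBytes.slice(2));

console.log(naive.includes("\uFFFD")); // true — garbled output
console.log(streamed === "🦙");        // true — emoji reassembled
```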

CharlieFRuan · May 27 '24 16:05