[NOTICE] Transition from ChatModule to MLCEngine
As we start to formalize the MLC LLM Engine, we are moving toward a more comprehensive API that is OpenAI compatible. This brings a number of new features that let us do more across our backends (a usage sketch of the new API follows the list), including:
- JSON mode and function calls
- Multimodality
- Prefix and prompt caching
- Speculative decoding
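Here is a minimal sketch of the OpenAI-style MLCEngine API; the model id is illustrative and any compiled MLC model should work in its place:

```python
from mlc_llm import MLCEngine

# Illustrative model id; substitute any compiled MLC model.
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Streaming chat completion, mirroring openai-python's interface.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```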
The project started with ChatModule, which focuses primarily on chat. This is a note to the community that we are planning to phase out ChatModule in favor of MLCEngine. Another advantage is that we will then have a single engine backing all our backends, so a feature enabled in one backend can quickly be enabled in the others.
Transition
As of now ChatModule is still available, but we avoid mentioning it in the docs. The current `mlc_llm chat` is still backed by ChatModule. We are working on a JSONFFIEngine, with pure JSON-string input/output, to expose a broader set of interfaces (as in OpenAI) to a broader set of backends; the transition will happen once JSONFFIEngine lands (a conceptual sketch follows the checklist below). One additional thing we will need is automatic prefix caching to speed up multi-round chat. Backends like iOS and Android will interface with JSONFFIEngine, which has the full set of OpenAI features.
- [x] JSONFFIEngine
- [x] PrefixCache
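To make the JSONFFIEngine idea concrete, here is a toy stand-in for the boundary it defines: pure JSON strings in, pure JSON strings out, so any backend that can pass strings across an FFI can expose the full OpenAI-style surface. The function below is illustrative only, not the real binding:

```python
import json
from typing import Iterator

def chat_completion(request_json: str) -> Iterator[str]:
    """Toy stand-in for the JSONFFIEngine boundary: a real backend would
    forward request_json across the FFI and stream back response strings."""
    request = json.loads(request_json)
    last_user = request["messages"][-1]["content"]
    # Emit OpenAI-style streaming chunks, one word at a time.
    for token in ("echo: " + last_user).split():
        yield json.dumps({"choices": [{"delta": {"content": token + " "}}]})

request = json.dumps({
    "messages": [{"role": "user", "content": "hello world"}],
    "stream": True,
})
for chunk_json in chat_completion(request):
    chunk = json.loads(chunk_json)
    print(chunk["choices"][0]["delta"]["content"], end="")
```

And a toy illustration of why prefix caching speeds up multi-round chat: each round re-sends the whole conversation so far, so the engine only needs to prefill the suffix it has not already computed KV-cache entries for:

```python
def longest_cached_prefix(cached: list[int], request: list[int]) -> int:
    """Length of the shared token prefix whose KV-cache entries can be reused."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n

round1 = [1, 5, 9, 2]        # tokens from the first prompt + reply
round2 = round1 + [7, 3, 8]  # the next round repeats the history plus a new turn
reused = longest_cached_prefix(round1, round2)
print(f"prefill only {len(round2) - reused} of {len(round2)} tokens")
```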
Additionally, we understand that there is a desire to access MLC through a low-level API that directly leverages the TVM runtime. ChatModule and its CLI have been useful for some debugging purposes.
For such low-level debugging, we do not necessarily need the full engine that supports continuous batching and speculative decoding. We introduce a debug chat, https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/testing/debug_chat.py, which features more inspection and single-round input/output generation. We can consider building C++ versions of it as well.
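A minimal usage sketch of the debug chat; the constructor and generate arguments shown here are assumptions, so check debug_chat.py for the actual interface:

```python
from mlc_llm.testing.debug_chat import DebugChat

# Argument names below are assumptions based on the script's purpose;
# verify against debug_chat.py before relying on them.
dc = DebugChat(
    model="./dist/Llama-3-8B-Instruct-q4f16_1-MLC",               # compiled weights + config
    model_lib_path="./dist/libs/Llama-3-8B-Instruct-q4f16_1.so",  # compiled model library
    debug_dir="./debug-out",                                      # where intermediate tensors are dumped
)
dc.generate("What is the capital of Canada?", generate_length=64)
```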
We are still working on some of the above items, so this is not an immediate change, but we would like to raise awareness in the community.
@tqchen Does the new engine still support custom or modified chat/conversation templates in the MLC config? For example, sometimes there are differences between the original model and a fine-tuned model.
@MikeLP yes, we should keep such customization
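For reference, this customization lives in the `conv_template` section of `mlc-chat-config.json`. A minimal sketch of patching it for a fine-tune; the path and field values are illustrative, so diff against a generated config for your model:

```python
import json
from pathlib import Path

# Illustrative path; point this at your compiled model's config.
cfg_path = Path("./dist/MyFineTune-q4f16_1-MLC/mlc-chat-config.json")
cfg = json.loads(cfg_path.read_text())

# Example: a fine-tune that kept the base weights but changed the prompt format.
# Field names follow the current conv_template layout; treat them as assumptions.
cfg["conv_template"]["system_template"] = "<<SYS>>{system_message}<</SYS>>"
cfg["conv_template"]["system_message"] = "You are a concise assistant."

cfg_path.write_text(json.dumps(cfg, indent=2))
```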
https://github.com/mlc-ai/mlc-llm/pull/2279 brings an initial iOS version of MLCEngine
https://github.com/mlc-ai/mlc-llm/pull/2380 transitions the iOS ChatApp to MLCEngine
https://github.com/mlc-ai/mlc-llm/pull/2410 transitions the Android app to MLCEngine
We have completed the transition steps.