
Missing instructions on installing additional models

execveat opened this issue 1 year ago · 7 comments

Hey there, congratulations on a great release! The app works great on a Mac and the installation was very straightforward.

Do you have plans for growing mlc_chat_cli into a standalone tool, or is it meant to be a proof of concept? The Readme claims the project can be used to run 'any language model', but there are no instructions for how to do that. Furthermore, the code seems to indicate that only three models are supported right now; is that right?

Unless the mlc_chat_cli is supposed to be a toy demo, could you please add instructions for:

  1. which models are supported (e.g. would RNN-based models like https://github.com/BlinkDL/RWKV-LM work, or is it just transformers)?
  2. which formats, quantization methods, and directory structures are supported; i.e., I don't think grabbing a random link from HF and cloning it the same way Vicuna was installed (git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b) would work, right?
  3. it seems that there is a template/profile system for different LLM families; how do we add additional templates? Does it require a patch/pull request, or can it be done by tweaking a config file somewhere?
  4. the Readme mentions multiple optimizations, but mlc_chat_cli doesn't expose those settings to the user. How do we tweak them?
  5. given that the Readme claims all language models are supported, there should be some kind of rough guide on how to estimate hardware requirements (e.g. which LLMs can my machine run with this tool, at what quantization and performance?). As a comparison, the llama.cpp Readme isn't well structured, but it does provide a good overview of RAM requirements for a given model size and of the impact of different quantization techniques on performance.

Also, it would be very neat if you mentioned in the Readme what kind of community interactions you are aiming for. Would you prefer that people build their own tools that use mlc-llm as a backend, or that they send PRs to improve mlc_chat_cli?

execveat avatar Apr 30 '23 18:04 execveat

Thank you for your input. Indeed, there are a lot of things we can improve. This is the beginning of the release, so there is a lot that can be added on top. We will release follow-up materials on guides and local builds.

There are two components of the project.

  • The MLC (machine learning compilation) part, which is the overall productive flow for adding new models and backend optimizations; it is built on top of the TVM Unity pipeline. The pipeline itself is generic enough to adapt to and add new models.

    • What we would like to enable is that the overall MLC flow works for any model, and it can be adapted to support things like the RNN-type models you mentioned.
    • The optimizations are also part of the MLC pipeline.
    • There is a course that introduces the related concepts.
    • See also @junrushao 's comment here https://github.com/mlc-ai/mlc-llm/issues/6#issuecomment-1528952903
    • We will be adding a few more models and use them to build up tutorials on new-model support.
  • The mlc_chat_cli is a runtime component that runs the compiled code, so the memory consumption and cost profile depend on the model in question. This module will work with any of the compiled models. It is a great suggestion that we mention the overall requirements of some of the prebuilt settings as we expand model support in the community (a rough sizing sketch follows after this list).

  • Additionally, there is a libmlc_llm module (which mlc_chat_cli depends on) that can be embedded into any application (e.g. a game engine) that would like to leverage MLC-LLM.
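
As a rough illustration of the kind of sizing guide mentioned above: this is a back-of-the-envelope sketch, not official numbers from this project, and the real footprint also includes the KV cache, activations, and runtime overhead.

```python
def estimate_weight_memory_gib(n_params_billions: float, bits_per_weight: float) -> float:
    """Rough lower bound on the RAM/VRAM needed just to hold the weights.

    Actual usage is higher: add the KV cache (which grows with context
    length), activations, and runtime overhead.
    """
    bytes_per_weight = bits_per_weight / 8
    return n_params_billions * 1e9 * bytes_per_weight / 2**30

# e.g. a 7B-parameter model with 3-bit weights:
# 7e9 * 0.375 bytes ~= 2.4 GiB for the weights alone.
print(f"{estimate_weight_memory_gib(7, 3):.1f} GiB")
```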

One thing to mention is that the overall MLC flow is in Python and highly customizable. For example, we could easily add 3-bit int, or new formats like 4-bit floating point, to the Python flow, which may or may not sit in this repo. It took us on the order of a few days to explore a few different quantization formats and use ML compilation to optimize and generate high-performing code.
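
To make the quantization idea concrete, here is a minimal numpy sketch of symmetric group-wise integer quantization. This illustrates the general concept only; it is not the code in this repo, where such schemes are expressed in the Python flow and compiled through TVM into optimized kernels.

```python
import numpy as np

def quantize_groupwise(w: np.ndarray, nbits: int = 3, group_size: int = 32):
    """Symmetric group-wise quantization: one float scale per group of weights."""
    qmax = 2 ** (nbits - 1) - 1          # 3-bit signed range is [-4, 3]
    groups = w.reshape(-1, group_size)
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True), 1e-12) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, scales = quantize_groupwise(w)
print(np.abs(w - dequantize(q, scales)).max())  # small reconstruction error
```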

And yes, as an open source community, we love contributions and pull requests.

tqchen avatar Apr 30 '23 20:04 tqchen

Thank you for the explanation! I see that there is support for more models in mlc_llm/conversation.py, but the list in cpp/cli_main.cc is more limited. I guess this is just work in progress?

I would strongly suggest adding an option to override the profile selection via a command-line argument, instead of always deriving it from the path name. Moving the profile/template definitions into a user-editable config file would be amazing as well (e.g. to customize the prompt and temperature).
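
To sketch what that could look like: the field names below loosely mirror the conversation template structure in mlc_llm/conversation.py, but loading templates from a user-editable file like this is the suggestion itself, not an existing feature.

```python
# Hypothetical user-editable template entry. The keys loosely mirror the
# conversation templates in mlc_llm/conversation.py; a config-file loader
# for them does not exist yet, which is what is being proposed here.
MY_TEMPLATE = {
    "name": "my-vicuna-variant",
    "system": "A chat between a curious user and a helpful AI assistant.",
    "roles": ["USER", "ASSISTANT"],
    "sep": "</s>",
    # Per-template generation settings one might also want to expose:
    "temperature": 0.7,
    "top_p": 0.95,
}
```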

execveat avatar Apr 30 '23 21:04 execveat

Are there instructions for how to convert existing models to be used with mlc-llm? Reading this thread and this one, it seems possible, but I've not found any hints on how to start.

elbowdonkey avatar Apr 30 '23 21:04 elbowdonkey

Thank you for your suggestion; we will work on the instructions in the coming weeks. The current build.py pipeline should support the llama class of models, and support for other model classes is WIP.

tqchen avatar Apr 30 '23 22:04 tqchen

Hi, is this project mainly focused on LLMs? I wonder if the MLC flow works for image generation models (e.g., Stable Diffusion).

yx-chan131 avatar May 04 '23 08:05 yx-chan131

@yx-chan131 yes, check out https://github.com/mlc-ai/web-stable-diffusion

tqchen avatar May 04 '23 12:05 tqchen

Looking forward to the instructions. I am waiting to integrate this into my chat bot: https://github.com/Poordeveloper/chatgpt-app

Poordeveloper avatar May 05 '23 07:05 Poordeveloper

> Thank you for your suggestion; we will work on the instructions in the coming weeks. The current build.py pipeline should support the llama class of models, and support for other model classes is WIP.

Yup, any start would be fine! Looking forward to this.

(Let me know if you need any help! I've worked with LLMs in production on big GPUs.)

gamingflexer avatar Aug 18 '23 09:08 gamingflexer

Closing this issue for now due to inactivity. Feel free to reopen or open another issue if there are other questions!

CharlieFRuan avatar Oct 09 '23 12:10 CharlieFRuan