mlc-llm
Sorry for not already knowing this, but how can I load other models?
https://mlc.ai/mlc-llm/
I made those instructions work and can speak to vicuna-v1-7b but I'd like to mess with others.
git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/lib
Am I correct in assuming that "lib" (no idea what any of this actually means) is tied to that model?
Ok, I only looked at this briefly, so hopefully I'm not steering anyone in the wrong direction. But as far as I can tell, you have to build new models yourself using the Python build script. You can find all the parameters it uses here:
https://github.com/mlc-ai/mlc-llm/blob/main/build.py#L16
It also looks like (unless otherwise specified) it builds a model compatible with your present environment (basic GPU detection), so I'm guessing that if you want something to run on other target environments (iPhone, Vulkan, etc.) you'd have to run it once per target. 🤔
Also, if you go to the web-llm repo (the sister project to this one), there are more detailed instructions on how to build under the Instructions For Local Deployment section here:
https://github.com/mlc-ai/web-llm#instructions-for-local-deployment
The main difference, of course, being that you'll have to change the --target parameter from webgpu to whatever target environment you require (or omit the target parameter to have it automatically build based on your existing environment).
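To make that concrete, here's a rough sketch of the kind of invocations I'd expect (I have not run these; --model and --dtype show up in the build.py parameter list linked above, but the exact --target names are my assumption based on web-llm's use of webgpu, so check the script's argument list before relying on them):
python build.py --model vicuna-v1-7b --dtype float16                  # no --target: build for the locally detected GPU
python build.py --model vicuna-v1-7b --dtype float16 --target vulkan  # assumed target name for a Vulkan build
python build.py --model vicuna-v1-7b --dtype float16 --target iphone  # assumed target name for an iOS build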
With all that being said, I have not tried any of this myself, so I have no idea how difficult it is, what kind of system requirements to expect, or what additional information is relevant. Maybe this weekend I'll give it a shot, because I'd love to try other models here as well.
Well thank you very much for replying! I hope a future update will make this process more accessible to newbs like me. For a minute there I thought I could tweak something and put a model in the dist folder and make it work hehe. Bad magic on my part :P
Thanks for asking, we plan to release more detailed tutorials in the coming month.
@tqchen Thank you for your work, and I am looking forward to your future detailed tutorials! I would also like to know: if a model is deployed on Linux but cannot be deployed on Windows because it uses Triton, is it possible to complete the deployment on Windows through this project? Thank you!
At the moment TVM Unity (the main MLC pipeline we rely on) supports as many platforms as possible, so yes, it should work on Windows (the existing demo should already work there). It might involve bringing some of the kernels into TensorIR.
I've tried getting dolly-v2-3b running, but I'm getting an error. Here are the steps I took, which got me to generate a binary, but the binary doesn't seem to run properly; maybe someone can figure out the last step:
- Install python v3.11.3
- Install conda v23.1.0
- Create an environment with conda create -n mlc-llm and activate it with conda activate mlc-llm
- Install numpy separately: conda install numpy=1.23.0
- Install the other required packages with pip: pip install .
- Download a model from Hugging Face using Git LFS, e.g. git lfs install && git clone https://huggingface.co/databricks/dolly-v2-3b
- Put it in the dist/models folder, e.g. mkdir -p dist/models && mv dolly-v2-3b dist/models/dolly-v2-3b
- Run python build.py --model dolly-v2-3b --dtype float16
- Move the output folder to the correct location for your mlc_chat instance.
- Copy the tokenizer.json file from the original model folder into the output folder: cp dist/models/dolly-v2-3b/tokenizer.json dist/dolly-v2-3b
- Run mlc_chat_cli --model=dolly-v2-3b --dtype float16
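For reference, here is the whole sequence collected in one place (these are the same commands as the steps above, nothing new; the comments just record where things ended up on my machine, so treat them as assumptions if your layout differs):
conda create -n mlc-llm && conda activate mlc-llm
conda install numpy=1.23.0
pip install .                                            # install the repo's Python requirements
git lfs install && git clone https://huggingface.co/databricks/dolly-v2-3b
mkdir -p dist/models && mv dolly-v2-3b dist/models/dolly-v2-3b
python build.py --model dolly-v2-3b --dtype float16      # output appeared under dist/dolly-v2-3b/float16/ for me
# if your mlc_chat_cli install lives elsewhere, move dist/dolly-v2-3b into its dist/ folder first (see the step above)
cp dist/models/dolly-v2-3b/tokenizer.json dist/dolly-v2-3b
mlc_chat_cli --model=dolly-v2-3b --dtype float16         # this is where the error below shows up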
The error:
Use lib /Users/josh/dev/mlc-chat/dist/dolly-v2-3b/float16/dolly-v2-3b_metal_float16.so
[12:08:17] /Users/jshao/Projects/mlc-ai-utils/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=Apple M1 Pro
Initializing the chat module...
[12:08:18] /Users/jshao/Projects/mlc-ai-utils/tvm/src/runtime/relax_vm/vm.cc:768:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (func.defined()) is false: Error: Cannot find PackedFunc vm.builtin.null_value in either Relax VM kernel library, or in TVM runtime PackedFunc registry, or in global Relax functions of the VM executable
Hi @Joshuabaker2, your issue is similar to mine (#104).
I believe you didn't build a correct dynamic library target using python build.py; the generated TVM model is broken. Check out the conversation in my issue to see if it helps.
Remember to use python3 tests/chat.py --model=dolly-v2-3b --dtype=float16 to start the chat instead of mlc_chat_cli, since the C++ CLI doesn't support the new Hugging Face tokenizer format yet.
FWIW I solved my problem another way: https://www.reddit.com/r/LocalLLaMA/comments/13cjfs9/credit_where_due_thank_you_crabby_autistic_happy/
But I hope this stays up and active and eventually resolves. This will always have been my first real local LLM.