
Sorry for not already knowing this, but how can I load other models?

Open Innomen opened this issue 1 year ago • 8 comments

https://mlc.ai/mlc-llm/

I made those instructions work and can speak to vicuna-v1-7b but I'd like to mess with others.

git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/lib

Am I correct in assuming that "lib" (no idea what any of this actually means) is tied to that model?

Innomen avatar May 03 '23 19:05 Innomen

Ok, I only looked at this briefly, so hopefully I'm not steering anyone in the wrong direction. But as far as I can tell you have to build new models yourself using the python build script. You can find all the parameters that it uses here: https://github.com/mlc-ai/mlc-llm/blob/main/build.py#L16

It also looks like (unless otherwise specified) it builds a model compatible with your present environment (basic GPU detection), so I'm guessing that if you want something to run on other target environments (iPhone, Vulkan, etc.) you'd have to run the script once per target environment. 🤔

Also, if you go to the web-llm repo (the sister project to this one) there are more detailed instructions on how to build under the Instructions For Local Deployment section here: https://github.com/mlc-ai/web-llm#instructions-for-local-deployment

The main difference, of course, is that you'll have to change the --target parameter from webgpu to whatever target environment you require (or omit the target parameter to have it build automatically based on your existing environment).
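To make that concrete, here is a rough sketch of what those invocations might look like. The --model and --dtype flags appear later in this thread and --target webgpu comes from the web-llm instructions; the exact model names and supported target values are assumptions on my part, so check build.py for what it actually accepts:

# build for the locally detected environment (basic GPU auto-detection)
python build.py --model vicuna-v1-7b --dtype float16

# build again for a specific target, e.g. WebGPU as in the web-llm instructions
python build.py --model vicuna-v1-7b --dtype float16 --target webgpu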

With all that being said, I have not tried any of this myself, so I have no idea how difficult it is, what kind of system requirements are expected, or what additional information is relevant. Maybe this weekend I'll give it a shot, because I'd love to try other models here as well.

zeeroh avatar May 03 '23 21:05 zeeroh

Well thank you very much for replying! I hope a future update will make this process more accessible to newbs like me. For a minute there I thought I could tweak something and put a model in the dist folder and make it work hehe. Bad magic on my part :P

Innomen avatar May 03 '23 21:05 Innomen

Thanks for asking, we plan to release more detailed tutorials in the coming month.

tqchen avatar May 03 '23 21:05 tqchen

@tqchen Thank you for your work, and I am looking forward to your future detailed tutorials! I would also like to know: if a model can be deployed on Linux but not on Windows because it uses Triton, is it possible to complete the deployment on Windows through this project? Thank you!

sydlfl avatar May 04 '23 03:05 sydlfl

At the moment TVM Unity (the main MLC pipeline we rely on) supports as many platforms as possible, so yes, it should work on Windows (the existing demo should already work there). It might involve bringing some of the kernels into TensorIR.

tqchen avatar May 04 '23 12:05 tqchen

I've tried getting dolly-v2-3b running, but I'm getting an error. Here are the steps I took; they got me as far as generating a binary, but the binary doesn't seem to run properly. Maybe someone can figure out the last step (the commands are consolidated in a sketch after the list):

  1. Install Python v3.11.3
  2. Install conda v23.1.0
  3. Create an environment (conda create -n mlc-llm) and activate it (conda activate mlc-llm)
  4. Install numpy separately: conda install numpy=1.23.0
  5. Install the other required packages with pip: pip install .
  6. Download a model from Hugging Face using Git LFS, e.g. git lfs install && git clone https://huggingface.co/databricks/dolly-v2-3b, and put it in the dist/models folder, e.g. mkdir -p dist/models && mv dolly-v2-3b dist/models/dolly-v2-3b
  7. Run python build.py --model dolly-v2-3b --dtype float16
  8. Move the output folder to the correct location for your mlc_chat instance.
  9. Copy the tokenizer.json file from the original model folder into the output folder: cp dist/models/dolly-v2-3b/tokenizer.json dist/dolly-v2-3b
  10. Run mlc_chat_cli --model=dolly-v2-3b --dtype float16
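For anyone following along, here is the same sequence as a single copy-pastable sketch. It assumes you run it from the repo root and that the build output lands in dist/dolly-v2-3b as in the steps above; adjust the paths if your build ends up somewhere else:

conda create -n mlc-llm
conda activate mlc-llm
conda install numpy=1.23.0
pip install .
git lfs install
mkdir -p dist/models
git clone https://huggingface.co/databricks/dolly-v2-3b dist/models/dolly-v2-3b
python build.py --model dolly-v2-3b --dtype float16
# copy the tokenizer next to the compiled model output (per step 9 above)
cp dist/models/dolly-v2-3b/tokenizer.json dist/dolly-v2-3b
mlc_chat_cli --model=dolly-v2-3b --dtype float16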

The error:

Use lib /Users/josh/dev/mlc-chat/dist/dolly-v2-3b/float16/dolly-v2-3b_metal_float16.so
[12:08:17] /Users/jshao/Projects/mlc-ai-utils/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=Apple M1 Pro
Initializing the chat module...
[12:08:18] /Users/jshao/Projects/mlc-ai-utils/tvm/src/runtime/relax_vm/vm.cc:768:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (func.defined()) is false: Error: Cannot find PackedFunc vm.builtin.null_value in either Relax VM kernel library, or in TVM runtime PackedFunc registry, or in global Relax functions of the VM executable

Joshuabaker2 avatar May 05 '23 19:05 Joshuabaker2

Hi, @Joshuabaker2 , your issue is similar to mine (#104 ).

I believe you didn't build a correct dynamic library target with python build.py; the generated TVM model is broken. Check out the conversation in my issue to see if it helps.

Remember to use python3 tests/chat.py --model=dolly-v2-3b --dtype=float16 to start the chat instead of mlc_chat_cli, since the C++ CLI doesn't support the new Hugging Face tokenizer format yet.

shiqimei avatar May 09 '23 08:05 shiqimei

FWIW I solved my problem another way: https://www.reddit.com/r/LocalLLaMA/comments/13cjfs9/credit_where_due_thank_you_crabby_autistic_happy/

But I hope this stays up and active and eventually resolves. This will always have been my first real local LLM.

Innomen avatar May 09 '23 15:05 Innomen