mlc-llm
Add support for Gorilla
This PR adds support for Gorilla, a fine-tuned LLaMA-based model that surpasses GPT-4's performance at writing API calls.
Steps:
- Download Gorilla delta weights
```shell
mkdir gorilla-delta
cd gorilla-delta
git lfs install
git clone https://huggingface.co/gorilla-llm/gorilla-7b-hf-delta-v0
```
- Download LLaMA weights and apply delta
```shell
git clone https://github.com/ShishirPatil/gorilla
cd gorilla/inference
python apply_delta.py --base-model-path decapoda-research/llama-7b-hf --target-model-path /path/to/target/dir --delta-path /path/to/gorilla-delta
```
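For reference, applying delta weights is conceptually just element-wise addition of each delta tensor onto the corresponding base tensor (the real `apply_delta.py` in the Gorilla repo also handles the tokenizer and config). A toy illustration with numpy, using hypothetical tensors rather than the actual 7B weights:

```python
import numpy as np

# Hypothetical toy state dicts standing in for the real LLaMA base
# weights and the Gorilla delta weights.
base = {"layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]])}
delta = {"layer.weight": np.array([[0.5, -0.5], [0.0, 1.0]])}

# Delta application: target = base + delta, key by key.
target = {k: base[k] + delta[k] for k in base}

print(target["layer.weight"])
```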
- Run CLI
```shell
./build/mlc_chat_cli --model gorilla-weights
```
Example Output:
```text
(sudeep-mlc-llm) ./build/mlc_chat_cli --model gorilla-weights
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use MLC config: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/params/mlc-chat-config.json"
Use model weights: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/params/ndarray-cache.json"
Use model library: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/gorilla-weights-q3f16_0-vulkan.so"
Loading model...
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
/reload [local_id] reload model `local_id` from disk, or reload the current model if `local_id` is not specified
USER: I want to generate image from text.
ASSISTANT: <<<domain>>>: Multimodal Image-to-Image Generation
<<<api_call>>>: ControlNetModel.from_pretrained('lllyasviel/controlNet')
<<<api_provider>>>: Hugging Face
<<<explanation>>>:1. First, import the necessary libraries and models from Hugging Face and others.2. Use the ControlNetModel to create a control net that can be used to generate images from text.3. The control net is a neural network that is pre-trained to generate images conditionally.4. Load an image and apply a mask to it to create a controlled image that can be used for research/educational purposes.5. The controlled image can be saved as a file (e.g., .png file) to be used in an application.
```
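Gorilla's responses follow the `<<<field>>>: value` convention shown above. If a downstream application needs the fields programmatically (e.g., to extract the suggested API call), a small parser along these lines should work — a sketch, not part of this PR:

```python
import re

# Sample response in Gorilla's output convention (abbreviated).
response = (
    "<<<domain>>>: Multimodal Image-to-Image Generation\n"
    "<<<api_call>>>: ControlNetModel.from_pretrained('lllyasviel/controlNet')\n"
    "<<<api_provider>>>: Hugging Face\n"
    "<<<explanation>>>: 1. First, import the necessary libraries..."
)

# Capture each <<<field>>>: value pair; a value runs until the next
# <<<field>>> marker or the end of the response.
fields = dict(re.findall(r"<<<(\w+)>>>:\s*(.*?)(?=\n<<<|\Z)", response, re.S))

print(fields["api_call"])  # ControlNetModel.from_pretrained('lllyasviel/controlNet')
```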
Do I need to make any other specific changes for Android and/or iOS?
It would be great if we could validate whether the model lib can be reused via vicuna-q3f16_0. If so, we should add a `--reuse-lib` flag or something similar so we do not need to distribute a model-specific library, and only the weights are needed.
Once we do that, it should be supported out of the box in WebLLM.
Another item relevant here for future work is to remove the name-based model matching and instead match on config.json, so that future model additions work out of the box without changes to the build.
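To illustrate the config.json idea (a sketch of the proposed direction, not existing mlc-llm code): instead of matching on the checkpoint directory name, the build step could read the `model_type` / `architectures` fields that HuggingFace-style checkpoints already ship:

```python
import json
import tempfile

def detect_model_type(config_path):
    """Infer the model family from a HuggingFace-style config.json
    instead of relying on the checkpoint directory name."""
    with open(config_path) as f:
        config = json.load(f)
    # "model_type" is standard in HF configs; fall back to "architectures".
    if "model_type" in config:
        return config["model_type"]
    archs = config.get("architectures", [])
    return archs[0].lower() if archs else "unknown"

# Gorilla ships a LLaMA-style config, so name-based matching would not be
# needed; a hypothetical config written to a temp file for illustration.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"model_type": "llama", "architectures": ["LlamaForCausalLM"]}, f)
print(detect_model_type(f.name))
```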
This is awesome! Thanks for the PR!
Hi @sudeepag, would you mind fixing the `supported_model_types` as @MasterJH5574 mentioned so that we can merge this?
@yzh119 Sorry, was out for a few days. I've addressed the comment in https://github.com/mlc-ai/mlc-llm/pull/288/commits/33a9172aa8b9a5e374ed7e294c3917121498bf87.
@tqchen I confirmed that reuse-lib is working as expected. Here are my steps:
- Confirm vicuna-v1-7b lib is present locally
```text
(sudeep-mlc-llm) ls dist/vicuna-v1-7b-q3f16_0
debug mod_cache_before_build_vulkan.pkl params vicuna-v1-7b-q3f16_0-vulkan.so
```
- Build using gorilla weights + reuse vicuna-v1-7b lib
```text
(sudeep-mlc-llm) python build.py --model /home/sudeepag/gorilla-weights --target vulkan --reuse-lib vicuna-v1-7b-q3f16_0
Using path "/home/sudeepag/gorilla-weights" for model "gorilla-weights"
Database paths: ['log_db/rwkv-raven-7b', 'log_db/rwkv-raven-3b', 'log_db/vicuna-v1-7b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b']
Target configured: vulkan -keys=vulkan,gpu -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -supports_16bit_buffer=1 -supports_8bit_buffer=1 -supports_float16=1 -supports_float32=1 -supports_int16=1 -supports_int32=1 -supports_int8=1 -supports_storage_buffer_storage_class=1 -thread_warp_size=1
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_89 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights.
Total param size: 2.8401594161987305 GB
Start storing to cache dist/gorilla-weights-q3f16_0/params
[0519/0519] saving param_518
All finished, 130 total shards committed, record saved to dist/gorilla-weights-q3f16_0/params/ndarray-cache.json
Save a cached module to dist/gorilla-weights-q3f16_0/mod_cache_before_build_vulkan.pkl.
Dump static shape TIR to dist/gorilla-weights-q3f16_0/debug/mod_tir_static.py
Dump dynamic shape TIR to dist/gorilla-weights-q3f16_0/debug/mod_tir_dynamic.py
Reuse existing prebuilt lib {ARGS.reuse_lib}...
Finish exporting chat config to dist/gorilla-weights-q3f16_0/params/mlc-chat-config.json
free(): invalid pointer
Aborted (core dumped)
```
- Confirm CLI is working correctly.
```text
(sudeep-mlc-llm) ./build/mlc_chat_cli --model gorilla-weights
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use MLC config: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/params/mlc-chat-config.json"
Use model weights: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/params/ndarray-cache.json"
Use model library: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/gorilla-weights-q3f16_0-vulkan.so"
Loading model...
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
/reload [local_id] reload model `local_id` from disk, or reload the current model if `local_id` is not specified
USER: I want to generate image from text.
ASSISTANT: <<<domain>>>: Multimodal Image-to-Image Generation
<<<api_call>>>: ControlNetModel.from_pretrained('lllyasviel/controlNet')
<<<api_provider>>>: Hugging Face
<<<explanation>>>:1. First, import the necessary libraries and models from Hugging Face and others.2. Use the ControlNetModel to create a control net that can be used to generate images from text.3. The control net is a neural network that is pre-trained to generate images conditionally.4. Load an image and apply a mask to it to create a controlled image that can be used for research/educational purposes.5. The controlled image can be saved as a file (e.g., .png file) to be used in an application.
```