mlc-llm
Add support for Gorilla
This PR adds support for Gorilla, a fine-tuned LLaMA-based model that surpasses GPT-4's performance at writing API calls.
Steps:
- Download Gorilla delta weights
```shell
mkdir gorilla-delta
cd gorilla-delta
git lfs install
git clone https://huggingface.co/gorilla-llm/gorilla-7b-hf-delta-v0
```
- Download LLaMA weights and apply delta
```shell
git clone https://github.com/ShishirPatil/gorilla
cd gorilla/inference
python apply_delta.py --base-model-path decapoda-research/llama-7b-hf --target-model-path /path/to/target/dir --delta-path /path/to/gorilla-delta
```
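For reference, applying delta weights is conceptually just element-wise addition of each delta tensor onto the corresponding base tensor (the real `apply_delta.py` in the Gorilla repo also handles the tokenizer and config). A toy illustration with numpy, using hypothetical tensors rather than the actual 7B weights:

```python
import numpy as np

# Hypothetical toy state dicts standing in for the real LLaMA base
# weights and the Gorilla delta weights.
base = {"layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]])}
delta = {"layer.weight": np.array([[0.5, -0.5], [0.0, 1.0]])}

# Delta application: target = base + delta, key by key.
target = {k: base[k] + delta[k] for k in base}

print(target["layer.weight"])
```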
- Run CLI
```shell
./build/mlc_chat_cli --model gorilla-weights
```
Example Output:
```text
(sudeep-mlc-llm) ./build/mlc_chat_cli --model gorilla-weights
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use MLC config: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/params/mlc-chat-config.json"
Use model weights: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/params/ndarray-cache.json"
Use model library: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/gorilla-weights-q3f16_0-vulkan.so"
Loading model...
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
/reload [local_id] reload model `local_id` from disk, or reload the current model if `local_id` is not specified
USER: I want to generate image from text.
ASSISTANT: <<<domain>>>: Multimodal Image-to-Image Generation
<<<api_call>>>: ControlNetModel.from_pretrained('lllyasviel/controlNet')
<<<api_provider>>>: Hugging Face
<<<explanation>>>:1. First, import the necessary libraries and models from Hugging Face and others.2. Use the ControlNetModel to create a control net that can be used to generate images from text.3. The control net is a neural network that is pre-trained to generate images conditionally.4. Load an image and apply a mask to it to create a controlled image that can be used for research/educational purposes.5. The controlled image can be saved as a file (e.g., .png file) to be used in an application.
```
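Gorilla's responses follow the `<<<field>>>: value` convention shown above. If a downstream application needs the fields programmatically (e.g., to extract the suggested API call), a small parser along these lines should work — a sketch, not part of this PR:

```python
import re

# Sample response in Gorilla's output convention (abbreviated).
response = (
    "<<<domain>>>: Multimodal Image-to-Image Generation\n"
    "<<<api_call>>>: ControlNetModel.from_pretrained('lllyasviel/controlNet')\n"
    "<<<api_provider>>>: Hugging Face\n"
    "<<<explanation>>>: 1. First, import the necessary libraries..."
)

# Capture each <<<field>>>: value pair; a value runs until the next
# <<<field>>> marker or the end of the response.
fields = dict(re.findall(r"<<<(\w+)>>>:\s*(.*?)(?=\n<<<|\Z)", response, re.S))

print(fields["api_call"])  # ControlNetModel.from_pretrained('lllyasviel/controlNet')
```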
Do I need to make any other specific changes for Android and/or iOS?
It would be great if we could validate whether the model lib can be reused via vicuna-q3f16_0. If so, we should add a `--reuse-lib` flag or something similar so we do not need to distribute a model-specific library, and only the weights are needed.
Once we do that, it should be supported out of the box in WebLLM.
Another item relevant here for future work is to remove the name-based model matching and instead match on config.json, so that future model additions work out of the box without changes to the build.
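To illustrate the config.json idea (a sketch of the proposed direction, not existing mlc-llm code): instead of matching on the checkpoint directory name, the build step could read the `model_type` / `architectures` fields that HuggingFace-style checkpoints already ship:

```python
import json
import tempfile

def detect_model_type(config_path):
    """Infer the model family from a HuggingFace-style config.json
    instead of relying on the checkpoint directory name."""
    with open(config_path) as f:
        config = json.load(f)
    # "model_type" is standard in HF configs; fall back to "architectures".
    if "model_type" in config:
        return config["model_type"]
    archs = config.get("architectures", [])
    return archs[0].lower() if archs else "unknown"

# Gorilla ships a LLaMA-style config, so name-based matching would not be
# needed; a hypothetical config written to a temp file for illustration.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"model_type": "llama", "architectures": ["LlamaForCausalLM"]}, f)
print(detect_model_type(f.name))
```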
This is awesome! Thanks for the PR!
Hi @sudeepag, would you mind fixing the `supported_model_types` as @MasterJH5574 mentioned so that we can merge this?
@yzh119 Sorry, was out for a few days. I've addressed the comment in https://github.com/mlc-ai/mlc-llm/pull/288/commits/33a9172aa8b9a5e374ed7e294c3917121498bf87.
@tqchen I confirmed that reuse-lib is working as expected. Here are my steps:
- Confirm vicuna-v1-7b lib is present locally
```text
(sudeep-mlc-llm) ls dist/vicuna-v1-7b-q3f16_0
debug mod_cache_before_build_vulkan.pkl params vicuna-v1-7b-q3f16_0-vulkan.so
```
- Build using gorilla weights + reuse vicuna-v1-7b lib
```text
(sudeep-mlc-llm) python build.py --model /home/sudeepag/gorilla-weights --target vulkan --reuse-lib vicuna-v1-7b-q3f16_0
Using path "/home/sudeepag/gorilla-weights" for model "gorilla-weights"
Database paths: ['log_db/rwkv-raven-7b', 'log_db/rwkv-raven-3b', 'log_db/vicuna-v1-7b', 'log_db/redpajama-3b-q4f16', 'log_db/redpajama-3b-q4f32', 'log_db/rwkv-raven-1b5', 'log_db/dolly-v2-3b']
Target configured: vulkan -keys=vulkan,gpu -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -supports_16bit_buffer=1 -supports_8bit_buffer=1 -supports_float16=1 -supports_float32=1 -supports_int16=1 -supports_int32=1 -supports_int8=1 -supports_storage_buffer_storage_class=1 -thread_warp_size=1
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_89 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights.
Total param size: 2.8401594161987305 GB
Start storing to cache dist/gorilla-weights-q3f16_0/params
[0519/0519] saving param_518
All finished, 130 total shards committed, record saved to dist/gorilla-weights-q3f16_0/params/ndarray-cache.json
Save a cached module to dist/gorilla-weights-q3f16_0/mod_cache_before_build_vulkan.pkl.
Dump static shape TIR to dist/gorilla-weights-q3f16_0/debug/mod_tir_static.py
Dump dynamic shape TIR to dist/gorilla-weights-q3f16_0/debug/mod_tir_dynamic.py
Reuse existing prebuilt lib {ARGS.reuse_lib}...
Finish exporting chat config to dist/gorilla-weights-q3f16_0/params/mlc-chat-config.json
free(): invalid pointer
Aborted (core dumped)
```
- Confirm CLI is working correctly.
```text
(sudeep-mlc-llm) ./build/mlc_chat_cli --model gorilla-weights
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use MLC config: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/params/mlc-chat-config.json"
Use model weights: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/params/ndarray-cache.json"
Use model library: "/home/sudeepag/mlc-llm/dist/gorilla-weights-q3f16_0/gorilla-weights-q3f16_0-vulkan.so"
Loading model...
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
/reload [local_id] reload model `local_id` from disk, or reload the current model if `local_id` is not specified
USER: I want to generate image from text.
ASSISTANT: <<<domain>>>: Multimodal Image-to-Image Generation
<<<api_call>>>: ControlNetModel.from_pretrained('lllyasviel/controlNet')
<<<api_provider>>>: Hugging Face
<<<explanation>>>:1. First, import the necessary libraries and models from Hugging Face and others.2. Use the ControlNetModel to create a control net that can be used to generate images from text.3. The control net is a neural network that is pre-trained to generate images conditionally.4. Load an image and apply a mask to it to create a controlled image that can be used for research/educational purposes.5. The controlled image can be saved as a file (e.g., .png file) to be used in an application.
```