mlc-llm
open-llama-7b
Instructions:
- Clone https://huggingface.co/openlm-research/open_llama_7b_700bt_preview to a local directory.
- Link the cloned repo to dist/models/open-llama-700bt-7b
- Run: python3 build.py --debug-dump --model open-llama-700bt-7b --use-cache=0 --quantization q3f16_0
- Run: ./build/mlc_chat_cli --local-id open-llama-700bt-7b-q3f16_0
OpenLLaMa then runs natively on your device.
Since OpenLLaMa is a pre-trained LM rather than a fine-tuned chatbot, it is not actually chatting but simply autoregressively generating tokens from the prompt.
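The autoregressive loop described above can be sketched in a few lines. This is a toy illustration, not mlc-llm code: the bigram table stands in for a real forward pass, which would condition on the entire context rather than the last token.

```python
# Toy sketch of autoregressive generation: sample a token, append it to
# the context, and feed the context back in until a stop token appears.

def next_token(context):
    # Hypothetical stand-in for a model forward pass; a real LM scores
    # the whole context, not just the final token.
    bigram = {"CMU": "is", "is": "a", "a": "university", "university": "."}
    return bigram.get(context[-1], ".")

def generate(prompt_tokens, max_new_tokens=8, stop="."):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        tokens.append(tok)  # the defining step: output becomes input
        if tok == stop:
            break
    return tokens

print(generate(["CMU"]))  # ['CMU', 'is', 'a', 'university', '.']
```

With no chat fine-tuning, the model simply continues the prompt this way, which is why the transcript below reads like free-running text completion rather than dialogue.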
~/mlc-llm (fix ✗) ./build/mlc_chat_cli --local-id open-llama-700bt-7b-q3f16_0
Use MLC config: "dist/open-llama-700bt-7b-q3f16_0/params/mlc-chat-config.json"
Use model weights: "dist/open-llama-700bt-7b-q3f16_0/params/ndarray-cache.json"
Use model library: "dist/open-llama-700bt-7b-q3f16_0/open-llama-700bt-7b-q3f16_0-metal.so"
Loading model...
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
/reload [local_id] reload model `local_id` from disk, or reload the current model if `local_id` is not specified
[23:28:16] /Users/houbohan/relax/src/runtime/metal/metal_device_api.mm:165: Initializing Metal device 0, name=Apple M1 Pro
Prompt: CMU is an university
LM: of technology and is therefore an institution of higher learning.
The university has a long tradition of being one of the leading technology universities in Germany and is thus one of the most important drivers of technological progress in the country.
The university also has a long tradition of being one of the most important drivers of technological progress in Germany.
CMU also has a long tradition of being one of the most important drivers of technological progress in Germany.
CMU is an university of technology and is therefore an institution of higher learning.
The university has a long tradition of being one of the most important drivers of technological progress in Germany and is therefore one of the most important drivers of technological progress in Germany.
CMU is an university of technology and is therefore an institution of higher learning and is therefore one of the most important drivers of technological progress in Germany.^C
It would be great to add a no_memory option to the conv template, which auto-resets every time, for the plain-LM case.
@tqchen I now let it reset per prefill, given that its conv_template is LM.
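The reset-per-prefill behavior can be sketched as follows. This is a hypothetical illustration of the idea, not mlc-llm's actual conversation API: the `Conversation` class and `prefill` method are invented names.

```python
# Sketch of the proposed no_memory behavior: when the conv_template is a
# plain LM, wipe the history before every prefill so the model only ever
# sees the latest prompt instead of accumulated "chat" turns.

class Conversation:
    def __init__(self, template="LM"):
        self.template = template
        self.history = []

    def prefill(self, prompt):
        if self.template == "LM":  # no_memory: auto-reset each turn
            self.history.clear()
        self.history.append(prompt)
        return "".join(self.history)  # text actually fed to the model

conv = Conversation(template="LM")
conv.prefill("CMU is an university")
ctx = conv.prefill("The capital of France is")
print(ctx)  # only the latest prompt survives the reset
```

Resetting inside prefill keeps the CLI flow unchanged while preventing a raw LM from re-ingesting its own earlier completions as fake conversation history.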