mlc-llm
Universal LLM Deployment Engine with ML Compilation
## Instructions:

1. Clone https://huggingface.co/openlm-research/open_llama_7b_700bt_preview to local.
2. Link the cloned repo to `dist/models/open-llama-700bt-7b`.
3. Run `python3 build.py --debug-dump --model open-llama-700bt-7b --use-cache=0 --quantization q3f16_0`.
4. Run `./build/mlc_chat_cli --local-id open-llama-700bt-7b-q3f16_0`.

Then...
This PR adds the following: 1) a Python chat module with the same functionality as defined in the CLI (note that this requires a module without the `tvm_runtime` dependency; see changes to...
Just FYI: I tested your TestFlight build and it works just fine on my iPad Air (4th generation), which has only 4 GB of RAM. Is there a benchmark prompt or something?
Hi, this pull request introduces support for a new quantization method: GPTQ. This addition is motivated by the observation that GPTQ delivers acceptable performance at lower bit widths, and tends...
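For readers unfamiliar with what a "q3"-style low-bit weight format means, here is a minimal, self-contained sketch of plain round-to-nearest (RTN) symmetric quantization in pure Python. Note this is *not* the GPTQ algorithm itself (GPTQ additionally uses second-order/Hessian information to compensate rounding error); it only illustrates the baseline that GPTQ improves on, and all names below are illustrative.

```python
# Illustrative round-to-nearest (RTN) n-bit symmetric weight quantization.
# NOT the GPTQ algorithm: GPTQ corrects quantization error column by column
# using second-order statistics; RTN simply rounds each weight independently.

def quantize_rtn(weights, bits=3):
    """Quantize a list of floats to signed n-bit integers plus one scale."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 3 for signed 3-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Round each weight to the nearest representable integer level,
    # clamping to the signed n-bit range [-qmax-1, qmax].
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.77]
q, scale = quantize_rtn(weights, bits=3)
approx = dequantize(q, scale)
max_err = max(abs(a - w) for a, w in zip(approx, weights))
```

At 3 bits there are only 8 integer levels per scale group, so the per-weight rounding error is substantial; this is exactly the regime where GPTQ's error-compensating updates matter most.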
How can I build the `cpp` directory as a standalone executable binary for Android? Thanks
I tried to run the command-line tools on Android with OpenCL enabled. I followed the instructions in the README file under the `android` folder. But when I run the tools,...
I would suggest using CMake to organize the C++ code in an Android project instead of using ndk-build.
I built the model OK, but I don't know how to run it with Python. How should I configure `python tests/chat.py`? It fails when I run it.
Hey all, ## TLDR I managed to build and deploy the `dolly-v2-3b` model to iOS (`iPad Pro M1`), thanks to the helpful suggestions in #129 and #116. However, I noticed...