mlc-llm

Universal LLM Deployment Engine with ML Compilation

Results 578 mlc-llm issues

## Instructions:
1. Clone https://huggingface.co/openlm-research/open_llama_7b_700bt_preview to a local directory.
2. Link the cloned repo to `dist/models/open-llama-700bt-7b`.
3. Run `python3 build.py --debug-dump --model open-llama-700bt-7b --use-cache=0 --quantization q3f16_0`.
4. Run `./build/mlc_chat_cli --local-id open-llama-700bt-7b-q3f16_0`.

Then...
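The four reproduction steps in this issue can be collected into a small dry-run helper. This is only a sketch: the `build.py` and `mlc_chat_cli` arguments are copied from the issue text, while the use of `git clone` for step 1, `ln -s` for step 2, the default `clone_dir` name, and the function name `repro_commands` are assumptions made for illustration. Nothing is executed; the commands are only assembled and printed.

```python
import os
import shlex

MODEL_URL = "https://huggingface.co/openlm-research/open_llama_7b_700bt_preview"
MODEL_NAME = "open-llama-700bt-7b"
QUANTIZATION = "q3f16_0"


def repro_commands(clone_dir="open_llama_7b_700bt_preview"):
    """Return the four steps as argv lists; nothing is executed here."""
    link_path = os.path.join("dist", "models", MODEL_NAME)
    return [
        # Step 1 (assumption: the repo is fetched with a plain `git clone`).
        ["git", "clone", MODEL_URL, clone_dir],
        # Step 2 (assumption: "link" means a symbolic link; the target is made
        # absolute so the link resolves from inside dist/models).
        ["ln", "-s", os.path.abspath(clone_dir), link_path],
        # Step 3: build command copied from the issue text.
        ["python3", "build.py", "--debug-dump", "--model", MODEL_NAME,
         "--use-cache=0", "--quantization", QUANTIZATION],
        # Step 4: chat CLI invocation copied from the issue text.
        ["./build/mlc_chat_cli", "--local-id", f"{MODEL_NAME}-{QUANTIZATION}"],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for review before running them by hand.
    for argv in repro_commands():
        print(shlex.join(argv))
```

Keeping the steps as argument lists rather than one shell string makes it easy to swap the model name or quantization mode and to inspect each command before running it.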

This PR adds the following: 1) A Python chat module with the same functionality defined in the CLI (note that this requires a module without tvm_runtime dependency, see changes to...

Just FYI: I tested your TestFlight build and it works just fine on my iPad Air (4th generation). That's just 4 GB of RAM. Is there a benchmark prompt or something? ![9252DFB8-6A34-4B94-84A9-CCD064367A6B](https://github.com/mlc-ai/mlc-llm/assets/69374354/d2808b73-c263-42c7-8d10-d95ae130d9a3)

Hi, this pull request introduces support for a novel quantization method: GPTQ. This addition is driven by the observation that GPTQ demonstrates acceptable performance under lower bit representations, and tends...

How can I build the `cpp` directory as a standalone executable binary for Android? Thanks.

documentation
android

I tried to run the command-line tools on an Android system with cl enabled. I followed the instructions in the README file under the android folder. But when I run the tools,...

documentation
android

Can you provide the Android SDK in Maven format?

help wanted

I would suggest using CMake to organize the C++ code in the Android project instead of ndk-build.

enhancement

I built the model successfully, but I don't know how to run it using Python. Is it `python tests/chat.py`? How do I configure it? It fails to run.

feature request

Hey all,

## TLDR
I managed to build and deploy the `dolly-v2-3b` model to iOS `iPad Pro M1` (thanks to the helpful suggestions in #129 and #116). However, I noticed...