mlc-llm

Universal LLM Deployment Engine with ML Compilation

Results: 578 mlc-llm issues, sorted by recently updated

All finished, 163 total shards committed, record saved to dist/open-llama-plus-7b_0515-q4f32_0/params/ndarray-cache.json Save a cached module to dist/open-llama-plus-7b_0515-q4f32_0/mod_cache_before_build_cuda.pkl. Dump static shape TIR to dist/open-llama-plus-7b_0515-q4f32_0/debug/mod_tir_static.py Dump dynamic shape TIR to dist/open-llama-plus-7b_0515-q4f32_0/debug/mod_tir_dynamic.py - Dispatch...

bug
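
For context, a hedged sketch of the kind of build invocation that emits the artifacts listed in the excerpt above. The flags are taken from other issues in this list; the model path and target are inferred from the artifact names (open-llama-plus-7b_0515, q4f32_0, *_cuda.pkl), and the issue's actual command is not shown.

```shell
# Illustrative only: the exact command from the issue is truncated above.
python build.py --model dist/models/open-llama-plus-7b_0515 --quantization q4f32_0 --target cuda
```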

This PR adds support for [Gorilla](https://arxiv.org/pdf/2305.15334.pdf), which is a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls. Steps: 1) Download Gorilla delta weights ```shell mkdir...
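
A hedged sketch of what such delta-weight steps typically look like; the repository name, output paths, and merge step are illustrative assumptions, not taken from the PR (its commands are truncated above).

```shell
# Hypothetical outline; the PR's actual steps are truncated in the excerpt above.
mkdir -p dist/models
# Fetch the Gorilla delta weights (repository name is illustrative).
git clone https://huggingface.co/gorilla-llm/gorilla-7b-hf-delta-v0 dist/models/gorilla-7b-delta
# Merge the delta weights into a base LLaMA-7B checkpoint to recover the finetuned
# model (delta releases ship finetuned-minus-base weights); the merge script used by
# the PR is not shown here. Assume the merged model is written to dist/models/gorilla-7b.
# Then compile it with MLC-LLM, as in other issues in this list:
python build.py --model dist/models/gorilla-7b --quantization q4f16_0 --target cuda
```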

## 🚀 Feature Use Vulkan as a fallback backend on Android ## Motivation Some Android devices may not have proper OpenCL support; adding a Vulkan fallback will add compatibility to these...

feature request

When I run "python build.py --model ./dist/models/vicuna-7b --quantization q4f16_0 --target android --max-seq-len 768", I get an error like "[18:13:12] /Users/wenkeyu1/Desktop/mlc-llm/tvm-unity/src/target/llvm/llvm_module.cc:418: Architecture mismatch: module=arm64-apple-macos host=x86_64-apple-darwin22.3.0 Traceback (most recent call last):...

trouble shooting

## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/TheBloke/guanaco-33B-GGML - Is this model architecture supported by MLC-LLM? (the list of [supported models](https://mlc.ai/mlc-llm/docs/model-prebuilts.html#off-the-shelf-models)) Yes ##...

new-models

1. Restore the tvm_runtime.h file 2. Revert the code that reads the ndk-build path from local.properties

## 🐛 Bug ## To Reproduce 1. python3 build.py --hf-path databricks/dolly-v2-3b --quantization q3f16_0 **(It is OK!)** 2. Build the CLI: cd build cmake .. make ## Expected behavior [...

bug

## 🐛 Bug I managed to get the Python server (located under mlc-llm/python) to work by first building both TVM and the mlc-llm CLI from source, and then running the command:...

bug