mlc-llm
Universal LLM Deployment Engine with ML Compilation
## 🚀 Feature For some reason the input field has autocorrect turned off, which makes the typing experience worse than in other apps. ## Motivation It is annoying when it gets...
## 🐛 Bug I want to profile operators on mobile devices. I've installed the TVM RPC app on the device and successfully ran `\tvm-unity\apps\android_rpc\tests\android_rpc_test.py`. However, when I try to run debug_compare.py, it failed...
## 🚀 Feature ## Motivation To make it easy to support multiple platforms. ## Alternatives ## Additional context
## ❓ General Questions I tried to quantize Qwen2 using q4f16_autoawq but got this error: ``` mlc_llm convert_weight /Qwen2-7B-Instruct/ --quantization q4f16_autoawq -o /Qwen2-7B-Instruct-q4f16_autoawq-MLC Traceback (most recent call last): File "/home/NLP/anaconda3/envs/mlc-chat-venv/bin/mlc_llm",...
## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2 - **Why?** With **7B size**, the function-calling performance is **on par with GPT-4** See [BFCL...
Use multiple threads to load the weights, cache, and tokenizer, which should slightly improve initialization time and TTFT.
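A minimal sketch of the idea in the PR description above, assuming the three loading steps are independent of each other. The loader functions here are hypothetical stand-ins, not mlc-llm APIs, and the PR's actual implementation language and structure are not stated in this snippet.

```python
from concurrent.futures import ThreadPoolExecutor
import time


# Hypothetical stand-ins for the real loading steps (not mlc-llm functions).
# Each sleeps briefly to imitate I/O-bound work such as reading files.
def load_weights():
    time.sleep(0.05)
    return "weights"


def load_cache():
    time.sleep(0.05)
    return "cache"


def load_tokenizer():
    time.sleep(0.05)
    return "tokenizer"


def load_all_concurrently():
    """Run the independent loaders on separate threads and collect the results."""
    loaders = {
        "weights": load_weights,
        "cache": load_cache,
        "tokenizer": load_tokenizer,
    }
    with ThreadPoolExecutor(max_workers=len(loaders)) as pool:
        futures = {name: pool.submit(fn) for name, fn in loaders.items()}
        # result() blocks until the loader finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    start = time.perf_counter()
    components = load_all_concurrently()
    elapsed = time.perf_counter() - start
    print(sorted(components), f"loaded in {elapsed:.2f}s")
```

In Python, threads only shorten wall-clock time when the loaders are I/O-bound or release the GIL, so any gain depends on what each loading step actually does.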
## 🐛 Bug When running `mlc_llm serve HF://mlc-ai/Some_MODEL_MLC` without an internet connection, it gets stuck at this line: ``` [2025-04-26 20:56:02] INFO auto_device.py:79: Found device: cuda:0 [2025-04-26 20:56:03] INFO auto_device.py:90:...
## 🐛 Bug I tried to build MLC from source, following the exact steps in the docs. I am working on an NVIDIA Jetson AGX Orin 64GB. When I try...
This PR supports tool function calls under strict format constraints. Specifically, it uses structural tags to constrain the calling format. It makes the following changes: - Add "tool_call_format" attribute in EngineConfig,...
## 🐛 Bug I tried to run Qwen2.5-Math-72B-Instruct-q4f16_1-MLC using Qwen2.5-Math-1.5B-Instruct-q4f16_1-MLC as the draft model, but ran into a problem: it outputs the first word and then stops. As I...