Universal LLM Deployment Engine with ML Compilation

Results 578 mlc-llm issues

Hi Web-LLM team, can't say I have had this much fun in years. My demo is here https://hpssjellis.github.io/my-examples-of-ai-agents/public/web-llm/deepseek-r1-00.html I want to stop and restart a stream mid chat. I can't...

question

## ❓ General Questions **Steps to reproduce the behavior:** mlc_llm serve --model-lib /mnt/data/ehdd1/home/models/mlc/libs/Llama-2-7b-chat-hf-q0f16-O0-cuda.so /mnt/data/ehdd1/home/models/mlc/Llama-2-7b-chat-hf-q0f16-MLC/ python test.py **test.py as following:** ``` import requests import json MLC_SERVER_URL = "http://127.0.0.1:8000/v1/completions" request_payload = {...

question

## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): (https://huggingface.co/microsoft/Phi-4-mini-instruct) - Is this model architecture supported by MLC-LLM? (the list of [supported models](https://llm.mlc.ai/docs/prebuilt_models.html)) ## Additional...

new-models

## ❓ General Questions When using MLC LLM with ROCm on a Radeon 7900 XTX I am noticing a very long time to first token. With context lengths around 4k I'm...

question

Hi, I've been exploring the flashinfer implementation and noticed some constraints in dispatch_kv_cache_creation.py: https://github.com/mlc-ai/mlc-llm/blob/b636b2ac5e0c8bac6cf2a5427c3380fff856447e/python/mlc_llm/compiler_pass/dispatch_kv_cache_creation.py#L200-L221 Could you help me understand: - The technical rationale behind these limitations (head_dim, group size) ?...

I installed the mlc app by compiling from scratch according to the [documentation](https://llm.mlc.ai/docs/deploy/android.html). Now after I have downloaded the llama model, when I enter the chat UI, it does not...

bug

## ❓ General Questions Please add the ability to load models other than the default ones, selected from local storage. Is it possible to somehow...

question

## ❓ General Questions While waiting for the model's response on an Android phone, performing other operations may cause the phone to become unresponsive or reboot. For example, if I...

question

## 🐛 Bug I ran the following script (from https://llm.mlc.ai/docs/install/mlc_llm.html#option-2-build-from-source) to build MLC LLM from source, but it fails. ## To Reproduce ``` # clone from GitHub git...

bug

## ❓ General Questions Hello, I tried the official function-calling demo and tool_calls was returned normally, but tool_calls was None when I replaced mlc-ai/gorilla-openfunctions-v1-q4f16_1-MLC with mlc-ai/Llama-3.1-8B-Instruct-q4f16_1-MLC. Why,...

question