mlc-llm issues

How to stop a stream?

8

Hi Web-LLM team, can't say I have had this much fun in years. My demo is here https://hpssjellis.github.io/my-examples-of-ai-agents/public/web-llm/deepseek-r1-00.html I want to stop and restart a stream mid chat. I can't...

hpssjellis

question

[Question] mlc-llm server cannot return correct logprobs

12

## ❓ General Questions **Steps to reproduce the behavior:** mlc_llm serve --model-lib /mnt/data/ehdd1/home/models/mlc/libs/Llama-2-7b-chat-hf-q0f16-O0-cuda.so /mnt/data/ehdd1/home/models/mlc/Llama-2-7b-chat-hf-q0f16-MLC/ python test.py **test.py as following:** ``` import requests import json MLC_SERVER_URL = "http://127.0.0.1:8000/v1/completions" request_payload = {...

kunxiongzhu

question

[Model Request] phi-4-mini-instruct

## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): (https://huggingface.co/microsoft/Phi-4-mini-instruct) - Is this model architecture supported by MLC-LLM? (the list of [supported models](https://llm.mlc.ai/docs/prebuilt_models.html)) ## Additional...

j0h0k0i0m

new-models

Very slow time to first token on ROCM

5

## ❓ General Questions When using MLC LLM with ROCM on a Radeon 7900xtx I am noticing a very large time to first token. With context lengths around 4k I'm...

Jyers

question

Question about flashinfer constraints

Hi, I've been exploring the flashinfer implementation and noticed some constraints in dispatch_kv_cache_creation.py: https://github.com/mlc-ai/mlc-llm/blob/b636b2ac5e0c8bac6cf2a5427c3380fff856447e/python/mlc_llm/compiler_pass/dispatch_kv_cache_creation.py#L200-L221 Could you help me understand: - The technical rationale behind these limitations (head_dim, group size) ?...

mayakolad

[Bug] Android app does not take input; 'user 'role' is not defined' error

5

I installed the mlc app by compiling from scratch according to the [documentation](https://llm.mlc.ai/docs/deploy/android.html). Now after I have downloaded the llama model, when I enter the chat UI, it does not...

afsara-ben

bug

[Question]

3

## ❓ General Questions Add the ability to load models other than the default ones, selected from local storage. Is it possible to somehow...

alexdsh

question

[Question] While waiting for the model's response on an Android phone, performing other operations may cause the phone to become unresponsive or reboot.

2

## ❓ General Questions While waiting for the model's response on an Android phone, performing other operations may cause the phone to become unresponsive or reboot. For example, if I...

yangshgetui

question

[Bug] Compiling the MLC from source is failed (cuda_fp8.h)

4

## 🐛 Bug I executed the following scripts (from https://llm.mlc.ai/docs/install/mlc_llm.html#option-2-build-from-source) to build MLC-LLM from source, but the build fails. ## To Reproduce ``` # clone from GitHub git...

wwt02

bug

[Question] how to use function call

## ❓ General Questions Hello, I tried the official function call demo, and tool_calls were returned normally. However, tool_calls was None when I replaced mlc-ai/gorilla-openfunctions-v1-q4f16_1-MLC with mlc-ai/Llama-3.1-8B-Instruct-q4f16_1-MLC. Why,...

tebie6

question
