mlc-llm
Universal LLM Deployment Engine with ML Compilation
## ❓ General Questions Hello, I encountered an issue while deploying with mlc_llm in C++. The model is Qwen2.5-0.5B. The kv_cache is created using "create_tir_paged_kv_cache". When performing a prefill, it...
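For comparison, a minimal sketch of the Python `MLCEngine` path for the same model, adapted from the mlc-llm README, can serve as a known-good baseline before debugging the C++ prefill path; the prebuilt model id below is an assumption, substitute your own build.

```python
# Minimal Python baseline using the MLCEngine API from the mlc-llm README.
# The prebuilt model id is an assumption; a locally compiled Qwen2.5-0.5B works too.
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Qwen2.5-0.5B-Instruct-q4f16_1-MLC"  # assumed prebuilt id
engine = MLCEngine(model)

# Stream a chat completion through the OpenAI-compatible API.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()

engine.terminate()
```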
## ❓ General Questions I am curious whether there is a difference between this engine's quantization methods, such as `q4f16_0` and `q4f32_0`, and the `q4_0` quantization used by other...
## ❓ General Questions Hi, I'd love to know about trends in the different quantization methods supported by MLC. For example (I made this up): ``` slowest-fastest: q0f32, q3f16_0, q4f16_0,...
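In MLC's naming, the quantization code encodes the weight bit width and the compute dtype (e.g. `q4f16_1` is 4-bit weights with float16 compute). One rough way to compare variants end to end is to time the same streamed prompt against differently quantized builds of the same model; a sketch below, where the prebuilt model ids are assumptions.

```python
# Rough client-side comparison of two quantization variants of the same model.
# The prebuilt model ids are assumptions; any locally compiled builds work too.
import time
from mlc_llm import MLCEngine

variants = [
    "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",  # 4-bit weights, fp16 compute (assumed id)
    "HF://mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC",  # 4-bit weights, fp32 compute (assumed id)
]

for model in variants:
    engine = MLCEngine(model)
    start, chunks = time.time(), 0
    for response in engine.chat.completions.create(
        messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
        model=model,
        stream=True,
    ):
        chunks += 1  # each streamed chunk corresponds to roughly one decoded token
    print(f"{model}: ~{chunks / (time.time() - start):.1f} tok/s")
    engine.terminate()
```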
## 🐛 Bug Trying to replicate the LLaMA example described in the introduction documentation gives errors related to relax.build, despite a properly configured pipeline. Vulkan drivers are installed...
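A quick way to rule out a driver or target problem before digging into relax.build is to check whether the TVM runtime actually sees a Vulkan device; a minimal diagnostic sketch:

```python
# Quick diagnostic: verify the TVM runtime can see a Vulkan device before compiling.
# If `exist` is False, relax.build failures likely stem from the driver/target setup
# rather than from mlc-llm itself.
import tvm

dev = tvm.vulkan(0)
print("Vulkan device found:", dev.exist)
if dev.exist:
    print("Device name:", dev.device_name)
```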
## 🐛 Bug When using the `mlc-llm` Swift package to chat with vision language models, specifically the `Phi-3-vision-instruct` model, errors occur when attempting to input an image for the second...
## 🚀 Feature I'm wondering if you plan to provide a C++ (or C#) API. ## Motivation Many programs that run on-device use C++ or C#. This would enable...
## 🐛 Bug I'm trying to use `mlc-llm` to run Cohere's `aya` 8B models. The model compiles and runs normally, but it seems to generate weird answers: especially 1. it seems...
## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/blog/falcon - Is this model architecture supported by MLC-LLM? - A: I cannot determine if the...
## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): [Huggingface](https://huggingface.co/openbmb/MiniCPM3-4B) - Is this model architecture supported by MLC-LLM? (the list of [supported models](https://llm.mlc.ai/docs/prebuilt_models.html)) No ##...
## ❓ General Questions How can I see the generation speed in serve mode? Can it display relevant information in the console, as chat mode does?
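Since `mlc_llm serve` exposes an OpenAI-compatible endpoint, a rough decode speed can also be measured client-side by timing a streaming request; a sketch below, assuming the server runs locally on the default port 8000 and that the model name matches what the server was launched with (both assumptions).

```python
# Rough client-side measurement of generation speed against `mlc_llm serve`.
# Assumes the server is running locally on the default port 8000 and that the
# model name below matches the one the server was launched with (assumptions).
import time
import requests

url = "http://127.0.0.1:8000/v1/chat/completions"
payload = {
    "model": "Qwen2.5-0.5B-Instruct-q4f16_1-MLC",  # assumed model name
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "stream": True,
}

start, chunks = time.time(), 0
with requests.post(url, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if line and line.startswith(b"data:") and b"[DONE]" not in line:
            chunks += 1  # each SSE chunk carries roughly one decoded token

elapsed = time.time() - start
print(f"~{chunks / elapsed:.1f} tokens/s (wall-clock, including prefill)")
```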