
Universal LLM Deployment Engine with ML Compilation

Results: 578 mlc-llm issues

## ❓ General Questions Hello, I encountered an issue while deploying with mlc_llm in C++. The model is Qwen2.5-0.5B, and the KV cache is created with `create_tir_paged_kv_cache`. When performing a prefill, it...

question
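
For anyone hitting this, a minimal sketch of driving the same Qwen2.5-0.5B model through the Python `MLCEngine` API may help isolate whether the problem is in the C++ glue or the model build itself; the engine creates and manages the paged KV cache internally. The model ID below is an assumption (a prebuilt q4f16_1 conversion):

```python
from mlc_llm import MLCEngine

# Assumed prebuilt weights; swap in your own converted model path if needed.
model = "HF://mlc-ai/Qwen2.5-0.5B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Prefill of the prompt happens inside the engine when the request is submitted.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```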

## ❓ General Questions I am curious whether there is a difference between this engine's quantization methods, such as `q4f16_0` and `q4f32_0`, and the `q4_0` quantization of other...

question
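
As a rough mnemonic for the naming scheme (my reading of the MLC docs, not an authoritative spec): `qAfB_C` means A-bit quantized weights, fB float activations/compute, and packing-layout variant C, whereas llama.cpp-style `q4_0` names only the weight format. A tiny sketch of that reading:

```python
import re

def parse_mlc_quant(name: str) -> dict:
    """Decode an MLC quantization code such as 'q4f16_1' or 'q0f32'.

    Informal reading: q<weight_bits> f<compute_float_bits> [_<layout_variant>].
    Not an official parser -- just a mnemonic for the naming scheme.
    """
    m = re.fullmatch(r"q(\d+)f(\d+)(?:_(\d+))?", name)
    if m is None:
        raise ValueError(f"unrecognized quantization code: {name}")
    return {
        "weight_bits": int(m.group(1)),          # 0 means unquantized weights
        "compute_float_bits": int(m.group(2)),   # float16 vs. float32 compute
        "layout_variant": int(m.group(3) or 0),  # packing/layout revision
    }

print(parse_mlc_quant("q4f16_0"))
print(parse_mlc_quant("q4f32_0"))
print(parse_mlc_quant("q0f32"))
```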

## ❓ General Questions Hi, I'd love to know about the trends across the different quantization methods supported by MLC. For example (I made this up): `slowest-fastest: q0f32, q3f16_0, q4f16_0, ...`

question
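
One way to turn such guesses into numbers is to time the same prompt against builds that differ only in quantization. A rough client-side sketch, assuming the Python `MLCEngine` API and placeholder model IDs; wall time includes prefill, so the figure is a lower bound on decode speed:

```python
import time
from mlc_llm import MLCEngine

# Placeholder model IDs: the same base model compiled at different quantizations.
MODELS = [
    "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
    "HF://mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC",
]
PROMPT = [{"role": "user", "content": "Explain KV caches in one paragraph."}]

for model in MODELS:
    engine = MLCEngine(model)
    start = time.perf_counter()
    response = engine.chat.completions.create(messages=PROMPT, model=model)
    elapsed = time.perf_counter() - start
    # Assumes the response carries an OpenAI-style usage field.
    tokens = response.usage.completion_tokens
    print(f"{model}: {tokens / elapsed:.1f} tok/s ({tokens} tokens in {elapsed:.1f}s)")
    engine.terminate()
```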

## 🐛 Bug I'm trying to replicate the LLaMA example as described in the introduction documentation, but it gives errors related to `relax.build` despite a properly configured pipeline. Vulkan drivers are installed...

bug
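
A quick first check when `relax.build` fails on Vulkan is whether TVM's runtime can see the device at all. A small sketch using TVM's device API (assumes a TVM build compiled with Vulkan support):

```python
import tvm

# Probe the first Vulkan device through the TVM runtime.
dev = tvm.vulkan(0)
print("Vulkan device exists:", dev.exist)
if dev.exist:
    print("Device name:", dev.device_name)
    print("Max threads per block:", dev.max_threads_per_block)
```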

## 🐛 Bug When using the `mlc-llm` Swift package to chat with vision language models, specifically the `Phi-3-vision-instruct` model, errors occur when attempting to input an image for the second...

bug

## 🚀 Feature I'm wondering if you plan on providing a C++ (or C#) API. ## Motivation Many programs that run on-device are written in C++ or C#. This would enable...

feature request

## 🐛 Bug I'm trying to use `mlc-llm` to run Cohere's `aya` 8B models. The model compiles and runs normally, but it seems to generate weird answers: especially 1. it seems...

bug

## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/blog/falcon - Is this model architecture supported by MLC-LLM? - A: I cannot determine if the...

new-models

## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): [Huggingface](https://huggingface.co/openbmb/MiniCPM3-4B) - Is this model architecture supported by MLC-LLM? (the list of [supported models](https://llm.mlc.ai/docs/prebuilt_models.html)) No ##...

new-models

## ❓ General Questions How can I see the generation speed in serve mode? Can it display the relevant information in the console the way chat mode does?

question
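
As a workaround until the server reports this directly, the speed can be measured client-side: OpenAI-compatible responses include a `usage` field, so tokens per second follow from wall time. A sketch assuming `mlc_llm serve` is running on its default local address (host, port, and model name here are assumptions):

```python
import time
import requests

# Assumed default address for `mlc_llm serve`; adjust host/port as needed.
URL = "http://127.0.0.1:8000/v1/chat/completions"

payload = {
    "model": "default",  # placeholder; use the model the server was started with
    "messages": [{"role": "user", "content": "Write a haiku about compilers."}],
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=120).json()
elapsed = time.perf_counter() - start

tokens = resp["usage"]["completion_tokens"]
print(resp["choices"][0]["message"]["content"])
print(f"~{tokens / elapsed:.1f} tok/s ({tokens} tokens in {elapsed:.2f}s, incl. prefill)")
```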