mlc-llm
Universal LLM Deployment Engine with ML Compilation
## ❓ General Questions Hello, I encountered an issue while deploying with mlc_llm in C++. The model is Qwen2.5-0.5B. The kv_cache is created using "create_tir_paged_kv_cache". When performing a prefill, it...
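For comparison, a minimal sketch of the Python `MLCEngine` path for the same model, adapted from the mlc-llm README, can serve as a known-good baseline before debugging the C++ prefill path; the prebuilt model id below is an assumption, substitute your own build.

```python
# Minimal Python baseline using the MLCEngine API from the mlc-llm README.
# The prebuilt model id is an assumption; a locally compiled Qwen2.5-0.5B works too.
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Qwen2.5-0.5B-Instruct-q4f16_1-MLC"  # assumed prebuilt id
engine = MLCEngine(model)

# Stream a chat completion through the OpenAI-compatible API.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()

engine.terminate()
```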
## ❓ General Questions I am curious whether there is a difference between this engine's quantization methods, such as `q4f16_0` and `q4f32_0`, and the `q4_0` quantization used by other...
## ❓ General Questions Hi, I'd love to know about trends in the different quantization methods supported by MLC. For example (I made this up): ``` slowest-fastest: q0f32, q3f16_0, q4f16_0,...
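In MLC's naming, the quantization code encodes the weight bit width and the compute dtype (e.g. `q4f16_1` is 4-bit weights with float16 compute). One rough way to compare variants end to end is to time the same streamed prompt against differently quantized builds of the same model; a sketch below, where the prebuilt model ids are assumptions.

```python
# Rough client-side comparison of two quantization variants of the same model.
# The prebuilt model ids are assumptions; any locally compiled builds work too.
import time
from mlc_llm import MLCEngine

variants = [
    "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",  # 4-bit weights, fp16 compute (assumed id)
    "HF://mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC",  # 4-bit weights, fp32 compute (assumed id)
]

for model in variants:
    engine = MLCEngine(model)
    start, chunks = time.time(), 0
    for response in engine.chat.completions.create(
        messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
        model=model,
        stream=True,
    ):
        chunks += 1  # each streamed chunk corresponds to roughly one decoded token
    print(f"{model}: ~{chunks / (time.time() - start):.1f} tok/s")
    engine.terminate()
```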
## 🐛 Bug Trying to replicate the LLaMA example described in the introduction documentation gives errors related to relax.build, despite a properly configured pipeline. Vulkan drivers are installed...
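A quick way to rule out a driver or target problem before digging into relax.build is to check whether the TVM runtime actually sees a Vulkan device; a minimal diagnostic sketch:

```python
# Quick diagnostic: verify the TVM runtime can see a Vulkan device before compiling.
# If `exist` is False, relax.build failures likely stem from the driver/target setup
# rather than from mlc-llm itself.
import tvm

dev = tvm.vulkan(0)
print("Vulkan device found:", dev.exist)
if dev.exist:
    print("Device name:", dev.device_name)
```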
## 🐛 Bug When using the `mlc-llm` Swift package to chat with vision language models, specifically the `Phi-3-vision-instruct` model, errors occur when attempting to input an image for the second...
## 🚀 Feature I'm wondering if you plan to provide a C++ (or C#) API. ## Motivation Many programs that run on-device use C++ or C#. This would enable...
## 🐛 Bug I'm trying to use `mlc-llm` to run Cohere's `aya` 8B models. The model compiles and runs normally, but it seems to generate weird answers: especially 1. it seems...
## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/blog/falcon - Is this model architecture supported by MLC-LLM? - A: I cannot determine if the...
## ⚙️ Request New Models - Link to an existing implementation (e.g. Hugging Face/Github): [Huggingface](https://huggingface.co/openbmb/MiniCPM3-4B) - Is this model architecture supported by MLC-LLM? (the list of [supported models](https://llm.mlc.ai/docs/prebuilt_models.html)) No ##...
## ❓ General Questions How can I see the generation speed in serve mode? Can it display relevant information in the console, as chat mode does?
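Since `mlc_llm serve` exposes an OpenAI-compatible endpoint, a rough decode speed can also be measured client-side by timing a streaming request; a sketch below, assuming the server runs locally on the default port 8000 and that the model name matches what the server was launched with (both assumptions).

```python
# Rough client-side measurement of generation speed against `mlc_llm serve`.
# Assumes the server is running locally on the default port 8000 and that the
# model name below matches the one the server was launched with (assumptions).
import time
import requests

url = "http://127.0.0.1:8000/v1/chat/completions"
payload = {
    "model": "Qwen2.5-0.5B-Instruct-q4f16_1-MLC",  # assumed model name
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "stream": True,
}

start, chunks = time.time(), 0
with requests.post(url, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if line and line.startswith(b"data:") and b"[DONE]" not in line:
            chunks += 1  # each SSE chunk carries roughly one decoded token

elapsed = time.time() - start
print(f"~{chunks / elapsed:.1f} tokens/s (wall-clock, including prefill)")
```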