
Universal LLM Deployment Engine with ML Compilation

578 mlc-llm issues, sorted by most recently updated

![Screenshot_MLCChat_20230510_162517](https://github.com/mlc-ai/mlc-llm/assets/102452767/bcc7ce17-c7c0-4fa8-bb3b-6c7478394619) This is from a Snapdragon 8 Gen 2 device. The GPU is fully utilized at runtime; by the way, is it running int8 or some other precision? Could it run on the NPU? If it could, int4 quantization might be even faster (this is the first processor whose NPU supports int4).

When I run mlc_chat_cli on a Mac M1, this error occurs: /Users/jshao/Projects/mlc-ai-utils/tvm/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=Apple M1 Initializing the chat module... [15:54:34] /Users/jshao/Projects/mlc-ai-utils/tvm/src/runtime/relax_vm/vm.cc:768: --------------------------------------------------------------- An error occurred during the execution...

bug

When I try to build on my computer, I run ```python3 build.py --model vicuna213123-v1-7b --dtype float16 --target iphone --quantization-mode int3 --quantization-sym --quantization-storage-nbit 16 --max-seq-len 768```. Environment: CentOS Linux release 8.2.2004, NVIDIA-SMI...

documentation

My process: 1. Install TVM Unity and compile it successfully. 2. Get the model weights. 3. Build the model into a library: python3 build.py --model vicuna-v1-7b --dtype float16 --target iphone --quantization-mode int3 --quantization-sym...

documentation
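
For readers following the same three steps, a minimal command-line sketch is below. The wheel index and directory layout are assumptions drawn from these issues and the project's public docs, not a verified recipe for any specific mlc-llm revision:

```
# 1. Install TVM Unity (assumed: the mlc-ai nightly wheel; check the
#    official docs for the current package name and index).
pip install --pre mlc-ai-nightly -f https://mlc.ai/wheels

# 2. Get the model weights (hypothetical path; vicuna-v1-7b weights must
#    be obtained separately under their own license).
mkdir -p dist/models/vicuna-v1-7b

# 3. Build the model into a library for the iPhone target, using the
#    flags quoted in the issues above.
python3 build.py --model vicuna-v1-7b --dtype float16 --target iphone \
  --quantization-mode int3 --quantization-sym \
  --quantization-storage-nbit 16 --max-seq-len 768
```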

I've run Vicuna-7B successfully on an Android device. I'm now trying to run the https://huggingface.co/bigscience/bloomz model on my device. Could you provide some tips on adding support for BloomZ? Are there any videos about...

documentation

Recently we refactored [`conversation.py`](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) in FastChat to make it easier to register new conversation templates without long if/else chains. Could you also consider following our updates? Then we can...

enhancement
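
The refactor referenced above replaces if/else dispatch with a name-to-template registry. Here is a minimal Python sketch of that pattern; the names and fields are illustrative simplifications, see the linked conversation.py for FastChat's actual definitions:

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    """Illustrative stand-in for FastChat's Conversation; fields simplified."""
    name: str
    system: str
    roles: tuple = ("USER", "ASSISTANT")
    sep: str = " "
    messages: list = field(default_factory=list)

# A name -> template registry replaces the old if/else dispatch.
conv_templates: dict[str, Conversation] = {}

def register_conv_template(template: Conversation) -> None:
    # Adding a new model's template is one call, not another if-branch.
    conv_templates[template.name] = template

def get_conv_template(name: str) -> Conversation:
    return conv_templates[name]

register_conv_template(Conversation(
    name="vicuna_v1.1",
    system="A chat between a curious user and an AI assistant.",
))
```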

Hello, I am trying to reproduce your Vulkan compilation process for the `build.py` script on Linux. The steps I've taken: - Install all the base requirements (MKL, Vulkan headers, SPIRV,...

troubleshooting
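
For anyone reproducing the Vulkan setup above, a rough sketch of building TVM Unity with the Vulkan runtime enabled follows. USE_VULKAN is a standard TVM cmake option, while the checkout URL is an assumption:

```
# Build TVM Unity with the Vulkan runtime enabled (repository URL is an
# assumption; use the checkout the mlc-llm docs point to).
git clone --recursive https://github.com/mlc-ai/relax.git tvm-unity
cd tvm-unity && mkdir -p build && cd build
cp ../cmake/config.cmake .
echo 'set(USE_VULKAN ON)' >> config.cmake
cmake .. && make -j"$(nproc)"
```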

Dear maintainers, I am trying to build the model for Android according to the guide. When I run `python3 build.py --model vicuna-v1-7b --quantization q4f16_0 --target android --max-seq-len 768`, I get the error below: ```...

Which models are currently supported for deployment? Or how can other models be deployed through this project?

documentation