mlc-llm
Universal LLM Deployment Engine with ML Compilation
Based on experimenting with GPTQ-for-LLaMa, int4 quantization seems to introduce a 3-5% degradation in perplexity, while int8 is almost identical to fp16. Would it be possible to use int8 quantization with...
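The quality gap the issue describes comes down to how few integer levels int4 leaves for the weights. A minimal sketch of symmetric per-tensor quantization (a simplified stand-in for GPTQ's per-group scheme, not MLC's actual code) makes the int4-vs-int8 reconstruction error concrete:

```python
import random

def quantize(weights, bits):
    # Symmetric per-tensor quantization: one scale maps the largest |w|
    # to the largest representable positive integer; each weight is then
    # rounded to the nearest level and clamped to the signed range.
    qmax = 2 ** (bits - 1) - 1          # 7 for int4, 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

random.seed(0)
# Synthetic "weights" roughly shaped like a trained layer's distribution.
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]

mses = {}
for bits in (4, 8):
    q, scale = quantize(weights, bits)
    recon = dequantize(q, scale)
    mses[bits] = sum((w - r) ** 2 for w, r in zip(weights, recon)) / len(weights)
    print(f"int{bits}: mean squared reconstruction error = {mses[bits]:.3e}")
```

With 16x fewer levels, int4's rounding error is much larger than int8's, which is consistent with int8 tracking fp16 closely while int4 measurably hurts perplexity.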
Will there ever be an option to just run a normal .exe file, without having to run all these commands? If there is one, could it...
Hello there, I ran into this problem while executing the sample code given for installation on a MacBook M1. How should I resolve it? ``` An error occurred during the execution of...
Incredible project: I managed to run the model at good speed on my AMD hardware, thanks. I have a question: do you have any plans to offload the weights and...
Hey there, congratulations on a great release! The app works great on a Mac, and the installation was very straightforward. Do you have plans for growing `mlc_chat_cli` into a...
The 4090 can't run the 65B model. Can I run it on a MacBook with this?
I can't understand why this follows ChatGPT's path of restricting what the model can or cannot say. How can I disable the usual "As an AI language model, I...
This PR adds support for compiling and deploying the MOSS model, in particular moss-moon-003-sft.
I get this error when I run `git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b`: > Error downloading object: float16/params_shard_1.bin (0fb70c2): Smudge error: Error downloading float16/params_shard_1.bin (0fb70c297b47ce4ecade5f7875c4c90f518069bab49f359a1644766b2279e8e2): batch response: Post "https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3.git/info/lfs/objects/batch": dial tcp: lookup...
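The "dial tcp: lookup" in that smudge error suggests a DNS/network failure reaching the Git LFS batch endpoint rather than a problem with the repository itself. Assuming that is the cause, one common workaround is to clone without materializing LFS objects and then fetch them in a separate, retryable step (the `GIT_LFS_SKIP_SMUDGE` variable and `git lfs pull` are standard git-lfs features; the paths match the issue's command):

```shell
# Clone only the repo metadata and LFS pointer files; no large downloads
# happen during the clone, so a flaky connection cannot abort it mid-way.
GIT_LFS_SKIP_SMUDGE=1 git clone \
  https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b

# Fetch the actual weight shards; if this fails, re-running it resumes
# downloading only the objects that are still missing.
cd dist/vicuna-v1-7b
git lfs pull
```

If the DNS lookup keeps failing, checking the network proxy settings (or the `https_proxy` environment variable) for access to huggingface.co would be the next step.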