mlc-llm
Universal LLM Deployment Engine with ML Compilation
This PR enables weight compression on the GPU. Previously, weight compression ran on the CPU because the uncompressed weights were too large to fit in GPU memory, and running on CPU...
Tried this on a Mac with the spec below, and the response is very slow. Is there any way to speed it up? Spec: 2.6 GHz 6-Core Intel Core i7 Intel...
https://mlc.ai/mlc-llm/ I got those instructions working and can talk to vicuna-v1-7b, but I'd like to try other models.
git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/lib
Am I correct in...
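For anyone in the same spot: swapping in another model follows the same pattern as the vicuna steps above — clone its quantized weights into dist/, keep the prebuilt libraries in dist/lib, and pass the directory name to the CLI. A minimal sketch, where the weight repo name is a hypothetical placeholder (only the vicuna repo above is confirmed to exist):
```
# Clone quantized weights for a second model into dist/ (repo name is a placeholder).
git clone https://huggingface.co/mlc-ai/demo-some-other-model-int3 dist/some-other-model

# The prebuilt model libraries were already cloned into dist/lib above.
# Point the CLI at the new model by its directory name.
mlc_chat_cli --model some-other-model
```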
Hi, I tried to run the Python build (generated myself) and got some errors:
```
Check failed: (it != self_->idx_sub_.end()) is false:
```
I built the TVM unity branch and it imports correctly, but the runtime...
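When a self-built TVM imports fine but fails with an internal check like this at runtime, a quick first diagnostic is to confirm which TVM installation the script is actually loading; this is plain Python introspection, nothing MLC-specific:
```
# Print the path of the tvm package being imported; it should point at
# your unity-branch build, not another TVM installed elsewhere on the system.
python -c "import tvm; print(tvm.__file__)"
```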
mlc_chat_cli --model dolly-v2-12b_int3 --dtype float32
Use lib /root/mlcai/dist/dolly-v2-12b_int3/float32/dolly-v2-12b_int3_cuda_float32.so
Initializing the chat module...
Finish loading
You can use the following special commands:
  /help  print the special commands
  /exit  quit the cli...
It would be great if you could share the model-conversion code, so that everyone can convert any model by following the code and a guide. :)
mlc_chat_cli
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what(): [03:45:52] /home/runner/work/utils/utils/tvm/src/runtime/vulkan/vulkan_instance.cc:144:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed:...
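A crash thrown from vulkan_instance.cc before the model even loads usually points at the Vulkan loader or driver rather than the model itself. One way to check the Vulkan stack independently of TVM, assuming the standard vulkan-tools package is installed:
```
# Verify that the Vulkan loader finds at least one physical device.
# vulkaninfo ships with the vulkan-tools package on most Linux distros.
vulkaninfo | head -n 40
```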
https://huggingface.co/wshhyh/mlc_llm-dolly-v2-int4 I have tried to convert Dolly, but its environment is very hard to configure. Can you supply your converted models on Hugging Face for users to download?
