mlc-llm
Universal LLM Deployment Engine with ML Compilation
This PR enables weight compression on the GPU. Previously, weight compression ran on the CPU because the uncompressed weights were too large to fit in GPU memory, and running on CPU...
Tried this on a Mac with the spec below, and the response is very slow. Is there any way to speed it up? Spec: 2.6 GHz 6-Core Intel Core i7 Intel...
https://mlc.ai/mlc-llm/ I got those instructions working and can talk to vicuna-v1-7b, but I'd like to try other models.
git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/lib
Am I correct in...
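For anyone in the same spot: swapping in another model follows the same pattern as the vicuna steps above — clone its quantized weights into dist/, keep the prebuilt libraries in dist/lib, and pass the directory name to the CLI. A minimal sketch, where the weight repo name is a hypothetical placeholder (only the vicuna repo above is confirmed to exist):
```
# Clone quantized weights for a second model into dist/ (repo name is a placeholder).
git clone https://huggingface.co/mlc-ai/demo-some-other-model-int3 dist/some-other-model

# The prebuilt model libraries were already cloned into dist/lib above.
# Point the CLI at the new model by its directory name.
mlc_chat_cli --model some-other-model
```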
Hi, I tried to run the Python build (generated myself) and got some errors:
```
Check failed: (it != self_->idx_sub_.end()) is false:
```
I built the TVM unity branch and it imports correctly, but the runtime...
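When a self-built TVM imports fine but fails with an internal check like this at runtime, a quick first diagnostic is to confirm which TVM installation the script is actually loading; this is plain Python introspection, nothing MLC-specific:
```
# Print the path of the tvm package being imported; it should point at
# your unity-branch build, not another TVM installed elsewhere on the system.
python -c "import tvm; print(tvm.__file__)"
```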
mlc_chat_cli --model dolly-v2-12b_int3 --dtype float32
Use lib /root/mlcai/dist/dolly-v2-12b_int3/float32/dolly-v2-12b_int3_cuda_float32.so
Initializing the chat module...
Finish loading
You can use the following special commands:
  /help  print the special commands
  /exit  quit the cli...
It would be great if you could share the model-conversion code, so that everyone can convert any model by following the code and a guide. :)
mlc_chat_cli
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what(): [03:45:52] /home/runner/work/utils/utils/tvm/src/runtime/vulkan/vulkan_instance.cc:144:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed:...
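A crash thrown from vulkan_instance.cc before the model even loads usually points at the Vulkan loader or driver rather than the model itself. One way to check the Vulkan stack independently of TVM, assuming the standard vulkan-tools package is installed:
```
# Verify that the Vulkan loader finds at least one physical device.
# vulkaninfo ships with the vulkan-tools package on most Linux distros.
vulkaninfo | head -n 40
```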
https://huggingface.co/wshhyh/mlc_llm-dolly-v2-int4 I have tried to convert Dolly, but its environment is very hard to configure. Can you supply your converted models on Hugging Face for users to download?
