Junru Shao
Quantization plays an important role in reducing memory usage if you want to run a larger model on consumer-grade GPUs, so please turn it on :-)
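For a rough sense of the saving, here is a back-of-the-envelope sketch; the 7B parameter count and bit widths are illustrative assumptions, and activations/KV cache are ignored:

```python
# Back-of-the-envelope estimate of weight memory for a ~7B-parameter model.
# The figures are illustrative assumptions, not measurements; activation
# and KV-cache memory are ignored.
PARAMS = 7_000_000_000

def weight_memory_gib(bits_per_param: int) -> float:
    """GiB needed to hold the weights alone at the given precision."""
    return PARAMS * bits_per_param / 8 / 1024**3

print(f"fp16 weights : {weight_memory_gib(16):.1f} GiB")  # ~13.0 GiB
print(f"4-bit weights: {weight_memory_gib(4):.1f} GiB")   # ~3.3 GiB
```

That is roughly the difference between not fitting and fitting comfortably on an 8 GB consumer card.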
I believe Vulkan is supported according to #15. On the other hand, the A100 is an extremely powerful GPU, so why not simply run Hugging Face's PyTorch models directly?
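On an A100, the plain Hugging Face route is only a few lines. A minimal sketch with the `transformers` API; the model id below is a placeholder, and `device_map="auto"` assumes the `accelerate` package is installed:

```python
# Minimal sketch: run a Hugging Face PyTorch checkpoint directly on a
# large GPU such as an A100. The model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.5"  # placeholder; substitute your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 fits easily in 40/80 GB of HBM
    device_map="auto",          # requires the `accelerate` package
)

inputs = tokenizer("Hello! How are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```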
The technical path we are taking is quite different from Llama.cpp's. MLC LLM primarily uses a compiler to generate efficient code targeting multiple CPU/GPU vendors, while Llama.cpp focuses on handcrafting...
We will have tutorials on making use of the MLC LLM APIs in Python/JavaScript/Java/Swift, etc.
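To give a flavor of what the Python side could look like, here is a purely hypothetical sketch; the package, class, and parameter names (`mlc_chat`, `ChatModule`, the model id format) are my assumptions, and the actual tutorials may differ:

```python
# Hypothetical sketch of a Python chat API; the names below are
# assumptions, not the documented MLC LLM interface.
from mlc_chat import ChatModule  # assumed package/module name

cm = ChatModule(model="vicuna-v1-7b-q4f16_1")  # assumed model id format
print(cm.generate(prompt="What is the capital of France?"))
```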
I actually think this interface is a nice wrapper on top of `mlc_chat_cli`.
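In the same spirit, a thin wrapper can be built with nothing but the standard library. A minimal sketch; the `--model` flag and model name are assumptions, so check `mlc_chat_cli --help` for the real interface:

```python
# Minimal sketch of driving `mlc_chat_cli` from Python via subprocess.
# The flag name and model name are assumptions.
import subprocess

def chat_cli(model: str, extra_args: list[str] | None = None) -> None:
    """Launch an interactive mlc_chat_cli session for the given model."""
    cmd = ["mlc_chat_cli", "--model", model, *(extra_args or [])]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    chat_cli("vicuna-v1-7b")  # placeholder model name
```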
Yep, please use this repo: https://github.com/mlc-ai/relax
Windows should work; it does in my experiments.
To be clear, TVM Unity has both ROCm and Vulkan backends, which means we do not necessarily have to depend on ROCm the way Triton does. At the moment, I believe Vicuna-7b...
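The multi-backend point is just standard TVM usage. A small sketch of constructing both targets, assuming a TVM build with these backends enabled:

```python
# Sketch: TVM exposes ROCm and Vulkan as ordinary compilation targets,
# so the same model can be lowered to either. Assumes a TVM build with
# both backends enabled.
import tvm

rocm_target = tvm.target.Target("rocm")
vulkan_target = tvm.target.Target("vulkan")

print(rocm_target.kind.name)    # -> "rocm"
print(vulkan_target.kind.name)  # -> "vulkan"
```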
Please use `mlc_chat_cli` instead
Closing as the issue seems resolved :-)