mlc-llm
Does mlc-llm support NVIDIA GPUs using CUDA instead of Vulkan?
I notice that mlc-llm already supports NVIDIA GPUs via Vulkan. Does mlc-llm also support NVIDIA GPUs using CUDA instead of Vulkan? I guess NVIDIA favors CUDA over Vulkan, so CUDA should be faster than Vulkan?
Of course MLC-LLM supports CUDA. The reason we release Vulkan versions is that Vulkan is more general and supported by more backends than just NVIDIA GPUs. You can compile the CUDA version manually by running
python build.py --model MODEL_NAME --target cuda ...
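For example, a concrete invocation might look like the following (the model name is taken from the artifact mentioned later in this thread; other flags vary between mlc-llm revisions, so check python build.py --help in your checkout):

```shell
# Sketch: compile the Vicuna 7B model for the CUDA target instead of Vulkan
python build.py --model vicuna-v1-7b --target cuda
```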
Thanks, I will have a try.
btw, you need to compile (with USE_CUDA set to ON) and install the relax project before running this build.py script.
@yzh119 does CUDA get faster speed than Vulkan after both are tuned?
btw, you need to compile (with USE_CUDA set to ON) and install the relax project before running this build.py script.
What is the difference between relax project and tvm unity branch?
btw, you need to compile (with USE_CUDA set to ON) and install the relax project before running this build.py script.
What is the difference between relax project and tvm unity branch?
Hi, according to the documentation:
"Install TVM Unity. We have some local changes to TVM Unity, so please try out the mlc/relax repo for now. We will migrate change back to TVM Unity soon."
mlc-ai/relax has some commits ahead of upstream (https://github.com/apache/tvm/tree/unity), so please use the repo https://github.com/mlc-ai/relax.git for your own build.
The first three steps of this build instruction are applicable to your situation.
some notes:
- [in step 1] change config: set USE_CUDA to ON in cmake/config.cmake
- [in step 3] change --target=iphone to --target=cuda for your situation
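Putting those notes together, the flow can be sketched roughly as below. The cmake commands follow the usual TVM source-build convention and are an assumption on my part; defer to the linked build instructions for the authoritative steps.

```shell
# Step 1: build the mlc-ai/relax fork (TVM Unity plus local changes) with CUDA enabled
git clone --recursive https://github.com/mlc-ai/relax.git
cd relax && mkdir build && cd build
cp ../cmake/config.cmake .   # then edit this copy: set(USE_CUDA ON)
cmake .. && make -j"$(nproc)"

# Step 3: compile the model for CUDA instead of iPhone
python build.py --model MODEL_NAME --target cuda
```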
Strangely, my laptop does not have the Vulkan driver installed, only the CUDA driver, yet MLC Chat runs without any problem. Did I do something wrong? Could it be that when I run MLC Chat it is running on the CPU rather than the NVIDIA GPU?
most recent GPU drivers usually come with Vulkan support automatically
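If you want to verify which device MLC Chat is actually using, one way (my own suggestion, not an official mlc-llm check) is to watch the GPU while the chat is generating:

```shell
# Confirm a Vulkan driver is present (it ships with recent NVIDIA drivers)
vulkaninfo --summary

# Watch GPU utilization and memory while MLC Chat runs;
# non-zero utilization means the model is on the GPU, not the CPU
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```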
cuda got a faster speed to vulkan after both tuned?

We haven't particularly focused a lot on CUDA optimization yet, as torch.compile() should already work for huggingface models out of the box. Happy to bring in more optimizations later.
Thanks a lot. I have got the vicuna-v1-7b_cuda_float16.so successfully. But error happened when building mlc-llm-cli. Could you please have a look? @yzh119 Detail infor can be found in #119
Closing, as the discussion continues in #119.