
Does mlc-llm support NVIDIA GPUs using CUDA instead of Vulkan?

Open zhaoyang-star opened this issue 2 years ago • 10 comments

I notice that mlc-llm supports NVIDIA GPUs via Vulkan. Does mlc-llm also support NVIDIA GPUs using CUDA instead of Vulkan? I would guess NVIDIA favors CUDA over Vulkan, so CUDA should be faster than Vulkan?

zhaoyang-star avatar May 08 '23 04:05 zhaoyang-star

Of course MLC-LLM supports CUDA. The reason we release Vulkan versions is that Vulkan is more general and supported by more backends beyond NVIDIA GPUs. You can compile the CUDA version manually by running

python build.py --model MODEL_NAME --target cuda ...
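A concrete invocation might look like the following (the model name is illustrative, taken from later in this thread; only the `--model` and `--target` flags are from the command above):

```shell
# Build the model weights and kernels for the CUDA target
# (substitute your own model name for vicuna-v1-7b).
python build.py --model vicuna-v1-7b --target cuda
```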

yzh119 avatar May 08 '23 04:05 yzh119

Thanks, I will have a try.

zhaoyang-star avatar May 08 '23 04:05 zhaoyang-star

btw, you need to compile (with USE_CUDA set to ON) and install the relax project before running this build.py script.

yzh119 avatar May 08 '23 04:05 yzh119

@yzh119 does CUDA get a faster speed than Vulkan after both are tuned?

lucasjinreal avatar May 08 '23 05:05 lucasjinreal

btw, you need to compile (with USE_CUDA set to ON) and install the relax project before running this build.py script.

What is the difference between relax project and tvm unity branch?

zhaoyang-star avatar May 08 '23 08:05 zhaoyang-star

btw, you need to compile (with USE_CUDA set to ON) and install the relax project before running this build.py script.

What is the difference between relax project and tvm unity branch?

Hi, according to the documentation:

"Install TVM Unity. We have some local changes to TVM Unity, so please try out the mlc/relax repo for now. We will migrate change back to TVM Unity soon."

mlc-ai/relax has some commits ahead of upstream (https://github.com/apache/tvm/tree/unity). Please use the repo https://github.com/mlc-ai/relax.git for your own build.

The first three steps of this build instruction are applicable to your situation.

Some notes:

  1. [in step 1] set the config USE_CUDA to ON in cmake/config.cmake
  2. [in step 3] change --target=iphone to --target=cuda for your situation
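The steps above can be sketched as a shell session (a rough outline under assumptions: the build-directory layout and environment variables follow the usual TVM source build; they are not spelled out in this thread):

```shell
# Step 1: get the mlc-ai/relax fork and enable CUDA in the build config
git clone --recursive https://github.com/mlc-ai/relax.git
cd relax
mkdir build && cd build
cp ../cmake/config.cmake .   # then edit this copy: set(USE_CUDA ON)
cmake ..
make -j"$(nproc)"

# Make the freshly built relax/TVM visible to Python
export TVM_HOME="$(pwd)/.."
export PYTHONPATH="$TVM_HOME/python:$PYTHONPATH"

# Step 3: build the model for the CUDA target instead of iphone
python build.py --model MODEL_NAME --target cuda
```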

shiqimei avatar May 08 '23 12:05 shiqimei

Strangely, my laptop does not have the Vulkan driver installed, only the CUDA driver, yet MLC Chat runs without problems. Did I do something wrong? Could it be that when I run MLC Chat it is not running on the NVIDIA GPU but on the CPU?

liyizhe3975 avatar May 08 '23 14:05 liyizhe3975

Most recent GPU drivers usually come with Vulkan support automatically.

tqchen avatar May 08 '23 15:05 tqchen
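One way to check which backend is actually usable on a machine is to probe TVM's device API (a minimal sketch, assuming the tvm Python package is importable; it returns an empty dict if tvm is not installed):

```python
# Sketch: report which TVM device backends have a usable device on this machine.
def available_backends():
    """Return a dict mapping backend name -> whether device 0 exists."""
    try:
        import tvm
    except ImportError:
        # tvm not installed; nothing to probe
        return {}
    return {
        "cuda": bool(tvm.cuda(0).exist),
        "vulkan": bool(tvm.vulkan(0).exist),
    }

if __name__ == "__main__":
    backends = available_backends()
    if not backends:
        print("tvm is not installed")
    for name, ok in backends.items():
        print(f"{name}: {'available' if ok else 'not available'}")
```

If `vulkan` shows as available here, the Vulkan support bundled with the GPU driver explains why MLC Chat runs even without a separately installed Vulkan SDK.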

We haven't focused much on CUDA optimization yet, as torch.compile() should already work for HuggingFace models out of the box. Happy to bring in more optimizations later.

junrushao avatar May 08 '23 23:05 junrushao

Thanks a lot. I have successfully built vicuna-v1-7b_cuda_float16.so, but an error occurred when building mlc-llm-cli. Could you please have a look? @yzh119 Details can be found in #119

zhaoyang-star avatar May 11 '23 06:05 zhaoyang-star

Closing this as the discussion continues in #119.

yzh119 avatar May 23 '23 18:05 yzh119