mlc-llm
Does mlc-llm support NVIDIA GPUs using CUDA instead of Vulkan?
I notice that mlc-llm already supports NVIDIA GPUs via Vulkan. Does mlc-llm also support NVIDIA GPUs using CUDA instead of Vulkan? I guess NVIDIA favors CUDA over Vulkan, so CUDA should be faster than Vulkan?
Of course MLC-LLM supports CUDA. The reason we release Vulkan versions is that Vulkan is more general and supported by more backends than just NVIDIA GPUs. You can compile the CUDA version manually by running
python build.py --model MODEL_NAME --target cuda ...
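For example, a concrete invocation might look like the following (the model name is taken from the artifact mentioned later in this thread; other flags vary between mlc-llm revisions, so check python build.py --help in your checkout):

```shell
# Sketch: compile the Vicuna 7B model for the CUDA target instead of Vulkan
python build.py --model vicuna-v1-7b --target cuda
```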
Thanks, I will have a try.
btw, you need to compile (with USE_CUDA set to ON) and install the relax project before running this build.py script.
@yzh119 does CUDA get faster speed than Vulkan after both are tuned?
btw, you need to compile (with USE_CUDA set to ON) and install the relax project before running this build.py script.
What is the difference between relax project and tvm unity branch?
btw, you need to compile (with USE_CUDA set to ON) and install the relax project before running this build.py script.
What is the difference between relax project and tvm unity branch?
Hi, according to the documentation:
"Install TVM Unity. We have some local changes to TVM Unity, so please try out the mlc/relax repo for now. We will migrate change back to TVM Unity soon."
mlc-ai/relax has some commits ahead of upstream (https://github.com/apache/tvm/tree/unity), so please use the repo https://github.com/mlc-ai/relax.git for your own build.
The first three steps of this build instruction are applicable to your situation.
some notes:
- [in step 1] change config: set USE_CUDA to ON in cmake/config.cmake
- [in step 3] change --target=iphone to --target=cuda for your situation
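Putting those notes together, the flow can be sketched roughly as below. The cmake commands follow the usual TVM source-build convention and are an assumption on my part; defer to the linked build instructions for the authoritative steps.

```shell
# Step 1: build the mlc-ai/relax fork (TVM Unity plus local changes) with CUDA enabled
git clone --recursive https://github.com/mlc-ai/relax.git
cd relax && mkdir build && cd build
cp ../cmake/config.cmake .   # then edit this copy: set(USE_CUDA ON)
cmake .. && make -j"$(nproc)"

# Step 3: compile the model for CUDA instead of iPhone
python build.py --model MODEL_NAME --target cuda
```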
Strangely, my laptop does not have the Vulkan driver installed, only the CUDA driver, yet MLC Chat runs without any problem. Did I do something wrong? Could it be that when I run MLC Chat it is running on the CPU rather than the NVIDIA GPU?
most recent GPU drivers usually come with Vulkan support automatically
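If you want to verify which device MLC Chat is actually using, one way (my own suggestion, not an official mlc-llm check) is to watch the GPU while the chat is generating:

```shell
# Confirm a Vulkan driver is present (it ships with recent NVIDIA drivers)
vulkaninfo --summary

# Watch GPU utilization and memory while MLC Chat runs;
# non-zero utilization means the model is on the GPU, not the CPU
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```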
cuda got a faster speed to vulkan after both tuned?

We haven't particularly focused a lot on CUDA optimization yet, as torch.compile() should already work for huggingface models out of the box. Happy to bring in more optimizations later.
Thanks a lot. I have got the vicuna-v1-7b_cuda_float16.so successfully. But error happened when building mlc-llm-cli. Could you please have a look? @yzh119 Detail infor can be found in #119
Closing, as the discussion continues in #119.