minigpt4.cpp
How to accelerate inference?
Hi,
I enabled the cuBLAS compilation option.
The problem is that it does not load or process everything in GPU memory (VRAM).
What is the best command line to build and run each model as fast as possible on an RTX 3090 with 24 GB of VRAM?
Take a look at #15. The MiniGPT-4 model is composed of two models (vision and text). The vision model does not support GPU usage, but the text model (Vicuna) does.
Try enabling LLAMA_CUBLAS and see if you can run part of the model on the GPU. I haven't tested these flags before, but I would assume that they would work.
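For example, a build along these lines might work (a sketch only; the flag names come from this thread, and I haven't verified them against the build scripts):

```sh
# Untested sketch: enable cuBLAS in both the project option and the
# bundled llama.cpp option, then build the Release configuration.
cmake -B build -DMINIGPT4_CUBLAS=ON -DLLAMA_CUBLAS=ON
cmake --build build --config Release
```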
@Maknee
I tried setting the MINIGPT4_CUBLAS option to ON in CMakeLists.txt:
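```cmake
# The line I changed in CMakeLists.txt:
option(MINIGPT4_CUBLAS "minigpt4: use cuBLAS" ON)
```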
But when I run cmake --build . --config Release, I get the error below, unfortunately.
Any advice on how to deal with this is highly appreciated.
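One thing that may be worth ruling out (an assumption on my part, since the error output isn't shown): CMake caches option values, so changing the default in CMakeLists.txt has no effect on an already-configured build directory. Reconfiguring from a clean build directory, or passing the flag explicitly at configure time, sidesteps the cache:

```sh
# Sketch: start from a fresh build directory so a cached OFF value of
# MINIGPT4_CUBLAS cannot mask the edited default, and pass the flag
# explicitly at configure time to be safe.
rm -rf build
cmake -B build -DMINIGPT4_CUBLAS=ON
cmake --build build --config Release
```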