minigpt4.cpp
How to accelerate inference?
Hi,
I enabled the cuBLAS compilation option.
The problem is that it does not load or process everything in GPU memory (VRAM).
What is the best command line to build and run each model as fast as possible on an RTX 3090 with 24 GB of VRAM?
Take a look at #15. The MiniGPT-4 model is composed of two models (vision and text). The vision model does not support GPU usage, but the text model (Vicuna) does.
Try enabling LLAMA_CUBLAS and see if you can run part of the model on the GPU. I haven't tested these flags before, but I would assume that they would work.
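For example, a build along these lines might work (a sketch only; the flag names come from this thread, and I haven't verified them against the build scripts):

```sh
# Untested sketch: enable cuBLAS in both the project option and the
# bundled llama.cpp option, then build the Release configuration.
cmake -B build -DMINIGPT4_CUBLAS=ON -DLLAMA_CUBLAS=ON
cmake --build build --config Release
```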
@Maknee
I tried setting the MINIGPT4_CUBLAS option to ON in CMakeLists.txt:
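```cmake
# The line I changed in CMakeLists.txt:
option(MINIGPT4_CUBLAS "minigpt4: use cuBLAS" ON)
```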
But when I run cmake --build . --config Release, I get the error below, unfortunately.
Any advice on how to deal with this is highly appreciated.
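One thing that may be worth ruling out (an assumption on my part, since the error output isn't shown): CMake caches option values, so changing the default in CMakeLists.txt has no effect on an already-configured build directory. Reconfiguring from a clean build directory, or passing the flag explicitly at configure time, sidesteps the cache:

```sh
# Sketch: start from a fresh build directory so a cached OFF value of
# MINIGPT4_CUBLAS cannot mask the edited default, and pass the flag
# explicitly at configure time to be safe.
rm -rf build
cmake -B build -DMINIGPT4_CUBLAS=ON
cmake --build build --config Release
```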