
Any idea if this will work on CPU?

spikespiegel opened this issue 2 years ago · 4 comments

First of all, thanks for this great project! The output quality seems very good, and the idea of running a multimodal model locally is awesome. It seems we already have a GPT-4-like multimodal model in our hands, which is very exciting. I was wondering if it is possible to run it with llama.cpp on CPU? I am currently running Vicuna-13B on CPU (the 4-bit quantized version), and around 8 GB of RAM is enough. It works just fine, and the inference speed is about 1.5 tokens per second on my machine. (It also seems to work on mobile phones with enough memory; I have not tried it myself, but I have seen a few examples.) llama.cpp has its own file format (ggml) and provides a way to convert the original weights to it. It would be great if people with low VRAM or no VRAM could make this work on CPU. Any thoughts?
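For reference, the 8 GB figure is plausible: a 13B model quantized to 4 bits needs roughly 13e9 × 0.5 bytes ≈ 6.5 GB for the weights alone, plus some overhead for activations and the KV cache. As a minimal sketch of what "Vicuna on CPU" looks like in practice, here is the llama-cpp-python binding for llama.cpp (the model path is a placeholder, and the ggml file must have been converted with a compatible llama.cpp version):

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# The model path is hypothetical; the ggml file format must match the
# llama.cpp version the bindings were built against.
from llama_cpp import Llama

llm = Llama(model_path="./models/vicuna-13b/ggml-model-q4_0.bin")
output = llm(
    "Q: What is the capital of France? A:",
    max_tokens=32,      # cap the generation length
    stop=["Q:", "\n"],  # stop before the model starts a new question
)
print(output["choices"][0]["text"])
```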

spikespiegel avatar Apr 18 '23 13:04 spikespiegel

Yep, I need help to run it on CPU too.

I am downloading and merging the models now.

kenneth104 avatar Apr 20 '23 03:04 kenneth104

Hi! Has there been any progress on running it on a CPU? I'm really interested in this as well, since I don't have a powerful GPU. Any updates or workarounds you've discovered would be greatly appreciated. Thanks!

webnizam avatar Apr 20 '23 11:04 webnizam

> Yep, I need help to run it on CPU too.
>
> I am downloading and merging the models now.

@kenneth104 Have you made any progress running on CPU? Can you share?

rdkmaster avatar Apr 21 '23 09:04 rdkmaster

> Yep, I need help to run it on CPU too. I am downloading and merging the models now.
>
> @kenneth104 Have you made any progress running on CPU? Can you share?

No, I can't run it on CPU; something still requires CUDA and it throws an error.

kenneth104 avatar Apr 21 '23 14:04 kenneth104

If you want to run the demo on CPU, you need to use float32 and initialize all parameters on the CPU.

1. In `demo.py`, load the model and the chat on the CPU:

```python
# model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
model = model_cls.from_config(model_config).to('cpu')
# chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))
chat = Chat(model, vis_processor, device='cpu')
```

2. In `minigpt4.yaml`, switch the ViT precision to fp32:

```yaml
# vit_precision: "fp16"
vit_precision: "fp32"
```

3. In `minigpt4_eval.yaml`, disable low-resource mode:

```yaml
# low_resource: True
low_resource: False
```

4. In `mini_gpt4.py`, around line 90, load the language model weights in float32:

```python
# torch_dtype=torch.float16,
torch_dtype=torch.float32,
```

That is all you need to do to run `demo.py` on CPU, but the speed is very slow.
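If you'd rather not hand-edit the files each time you switch machines, a small helper (hypothetical, not part of the MiniGPT-4 repo) could pick the device and dtype in one place, falling back to CPU + fp32 exactly as in the steps above:

```python
import torch

def pick_device_and_dtype():
    """Return (device, dtype): CUDA + fp16 when a GPU is available,
    otherwise CPU + fp32, matching the manual edits above."""
    if torch.cuda.is_available():
        return torch.device('cuda:0'), torch.float16
    return torch.device('cpu'), torch.float32

device, dtype = pick_device_and_dtype()
# The demo would then do, e.g.:
#   model = model_cls.from_config(model_config).to(device)
#   chat = Chat(model, vis_processor, device=str(device))
print(device, dtype)  # prints "cpu torch.float32" on a machine without a GPU
```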

liyaozong1991 avatar Jun 29 '23 02:06 liyaozong1991

@liyaozong1991 thanks, I'll try this.

rdkmaster avatar Jun 29 '23 02:06 rdkmaster