
Is it possible to run on CPU?

Open · plmsuper8 opened this issue 1 year ago · 1 comment

Thanks for the great work.

I have a server with plenty of RAM but no GPU, and a local PC with 16 GB of VRAM. Unfortunately, both of them seem insufficient.

Sorry, but I'm a newbie at this. When I try to modify the code (removing .cuda() and setting device=cpu), it crashes. I also tried the CLI (CPU only); it works, but not in multimodal mode.

Besides, does it support load_in_8bit or quantization to 4-bit like other LLaMA-based models? Thanks again!

plmsuper8 avatar Apr 20 '23 00:04 plmsuper8
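For illustration, here is a minimal sketch of the kind of CPU-only change being described. It assumes the Hugging Face transformers loading path and a placeholder `model_path`; the actual LLaVA checkpoints are loaded through the repo's own CLI/worker scripts, so the exact model class and arguments may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path, for illustration only; LLaVA uses its own loading scripts.
model_path = "path/to/llava-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float32,   # FP32: some layers lack half-precision CPU kernels
    low_cpu_mem_usage=True,      # reduce peak RAM while the weights are loaded
)
model = model.to("cpu")          # instead of .cuda()
model.eval()

prompt = "Describe the image."
inputs = tokenizer(prompt, return_tensors="pt")  # tensors stay on CPU by default
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

As for the quantization question: load_in_8bit relies on bitsandbytes, which requires a CUDA-capable GPU, so 8-bit (or 4-bit) loading would not help on a CPU-only server.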

It's possible to run on CPU; you'll need >=60 GB of RAM. I tried FP16 but it didn't work, since some layers don't have a half-precision implementation on CPU, so I had to use FP32.

satyajitghana avatar Apr 20 '23 06:04 satyajitghana
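For context, a small illustration (not from the thread) of the half-precision limitation mentioned above: on many PyTorch builds, FP16 matrix multiplication has no CPU kernel, which is why the model has to be kept in FP32 when running on CPU.

```python
import torch

a = torch.randn(2, 2, dtype=torch.float16)  # half-precision tensors on CPU
b = torch.randn(2, 2, dtype=torch.float16)

try:
    # On many PyTorch builds this raises something like:
    # RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
    print(a @ b)
except RuntimeError as err:
    print("FP16 op unsupported on CPU:", err)
    print(a.float() @ b.float())  # fall back to FP32
```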

Hi, I am closing this issue due to inactivity. I hope your problem has been resolved. If you have further concerns, please feel free to re-open it or open a new issue. Thanks!

haotian-liu avatar May 01 '23 04:05 haotian-liu

@haotian-liu I think inference by default expects a GPU via .device(cuda). Can we add a CPU option (something like: if a GPU is absent, use CPU) and mention in the README how much RAM CPU inference would take? A CPU flag would be really helpful for wider usage.

copperwiring avatar May 11 '23 08:05 copperwiring
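A minimal sketch of the device fallback being requested, assuming plain PyTorch; the flag name and where it would hook into LLaVA's scripts are left open.

```python
import torch

def pick_device() -> torch.device:
    """Use CUDA when a GPU is available, otherwise fall back to CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

device = pick_device()
# FP16 is fine on GPU; CPU half-precision kernels are incomplete, so use FP32 there.
dtype = torch.float16 if device.type == "cuda" else torch.float32

x = torch.randn(3, 3, device=device, dtype=dtype)
print(x.device, x.dtype)
```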