Is it possible to run on CPU?
Thanks for the great work.
I have a server with large RAM but no GPU, and a local PC with 16 GB of VRAM. Unfortunately, both of them seem insufficient.
Sorry, I'm a newbie at this. When I try to modify the code (removing .cuda() and setting device=cpu), it crashes. I also tried the CLI (CPU only); it works, but it isn't multimodal.
Also, does it support load_in_8bit or 4-bit quantization like other LLaMA-based models? Thanks again!
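For context, the pattern I mean is the usual Hugging Face Transformers + bitsandbytes one used by other LLaMA-based models; this is just a hypothetical sketch (placeholder model path, not tested with LLaVA), and note that the bitsandbytes 8-bit/4-bit kernels require a CUDA GPU, so this wouldn't help the CPU-only server:

```python
# Hypothetical sketch of 8-bit loading via Transformers + bitsandbytes.
# The model path is a placeholder; bitsandbytes needs a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/llava-checkpoint"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,        # 8-bit weights via bitsandbytes
    device_map="auto",        # place layers on the available GPU(s)
    torch_dtype=torch.float16,
)
```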
It's possible to run on CPU, but you'll need >=60 GB of RAM. I tried FP16 and it didn't work, since some layers don't have a half-precision implementation on CPU, so I had to use FP32.
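As a rough illustration, CPU/FP32 loading would look something like the sketch below, assuming a Hugging Face-style checkpoint (the model path is a placeholder); expect on the order of 4 bytes of RAM per parameter in FP32:

```python
# Minimal sketch of CPU-only loading in FP32 (FP16 fails on CPU because
# some ops lack half-precision kernels). Model path is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/llava-checkpoint"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,   # full precision; FP16 is incomplete on CPU
    low_cpu_mem_usage=True,      # stream weights instead of a second in-memory copy
)
model = model.to("cpu")
model.eval()
```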
Hi, I am closing this issue due to inactivity. I hope your problem has been resolved. If you have further concerns, please feel free to re-open this issue or open a new one. Thanks!
@haotian-liu I think the inference code by default expects a GPU (.device("cuda")). Can we add a CPU option (something like: if a GPU is absent, fall back to CPU, as in the sketch below) and mention in the README how much RAM CPU inference would take? A CPU flag would be really helpful for wider usage.
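A minimal sketch of the kind of fallback I mean (the loader call is hypothetical, and FP32 is forced on CPU since FP16 kernels are incomplete there):

```python
# Pick CUDA when available, otherwise fall back to CPU + FP32.
import torch

if torch.cuda.is_available():
    device, dtype = "cuda", torch.float16
else:
    device, dtype = "cpu", torch.float32

# model = load_model(...).to(device=device, dtype=dtype)  # hypothetical loader call
```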