InternVL can't run on CPU

I am trying to run the model on a CPU in offline mode, but it depends on the flash_attn package, which has to be compiled with nvcc and requires a GPU. Is it possible to run this model on a CPU?

May 10 '24 10:05 shanzhou2186

This model has 26B parameters in total; wouldn't it be difficult to run on a CPU? As for flash_attn, you can turn it off by editing config.json in the model repository:

Set L20 to "attn_implementation": "eager"

https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5/blob/main/config.json#L20

Set L200 to "use_flash_attn": false

https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5/blob/main/config.json#L200
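
If you would rather script the change than edit the file by hand, here is a minimal sketch. It assumes the model has already been downloaded to a local directory; the path in the usage comment is hypothetical, and the helper names are made up for illustration. Instead of relying on line numbers, it walks the whole config and sets every "attn_implementation" key to "eager" and every "use_flash_attn" key to false, wherever they happen to be nested.

```python
import json
from pathlib import Path


def disable_flash_attn(node):
    """Recursively switch off flash-attention settings in a parsed config.

    Every "attn_implementation" key is set to "eager" and every
    "use_flash_attn" key is set to False, at any nesting depth.
    The config is modified in place and also returned.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "attn_implementation":
                node[key] = "eager"
            elif key == "use_flash_attn":
                node[key] = False
            else:
                disable_flash_attn(value)
    elif isinstance(node, list):
        for item in node:
            disable_flash_attn(item)
    return node


def patch_config_file(path):
    """Load a config.json, disable flash attention, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    disable_flash_attn(config)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


# Usage (hypothetical local path to the downloaded model):
# patch_config_file("InternVL-Chat-V1-5/config.json")
```

Keep a backup of the original config.json before running this, since the file is overwritten in place.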

May 16 '24 05:05 czczup

I will try it, thanks. Once the dependency problem is fixed, it should run on a CPU successfully, although it may be slow.

May 22 '24 03:05 shanzhou2186

Hi, have you solved this? I have modified the config.json file, but now I get: ValueError: At least one of the model submodule will be offloaded to disk, please pass along an offload_folder.

This is all I'm trying to run:

    from lmdeploy import pipeline
    from lmdeploy.vl import load_image

    pipe = pipeline('InternVL-Chat-V1-5')

Jun 20 '24 11:06 irene-crepax