Can't run on CPU
I am trying to run the model on CPU in offline mode, but it depends on the flash_attn package, which has to be compiled with nvcc on a GPU machine.
Is it possible to run this model on CPU?
This model has 26B parameters in total, so won't it be difficult to run on CPU? As for flash_attn, you can turn it off by modifying config.json in the model directory:
Set L20 to "attn_implementation": "eager"
https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5/blob/main/config.json#L20
Set L200 to "use_flash_attn": false
https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5/blob/main/config.json#L200
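For anyone who would rather script the two config.json changes than edit by hand, here is a minimal sketch. It does not rely on the line numbers or on where the keys are nested: it walks the whole JSON tree and overrides every "attn_implementation" and "use_flash_attn" key it finds. The function names and the local path in the usage note are hypothetical, not part of the model repo.

```python
import json
from pathlib import Path

# Keys named in the reply above, with the values it suggests.
OVERRIDES = {"attn_implementation": "eager", "use_flash_attn": False}


def disable_flash_attn(node):
    """Recursively apply OVERRIDES to matching keys; return how many values changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in OVERRIDES:
                if value != OVERRIDES[key]:
                    node[key] = OVERRIDES[key]
                    changed += 1
            else:
                changed += disable_flash_attn(value)
    elif isinstance(node, list):
        for item in node:
            changed += disable_flash_attn(item)
    return changed


def patch_config(config_path):
    """Rewrite config.json in place and return the number of values changed."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = disable_flash_attn(config)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

Assuming the model was downloaded to a local folder, calling patch_config("InternVL-Chat-V1-5/config.json") should be enough. Keep a backup of the original file, since the script overwrites it.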
I will try it, thanks. If the dependency problem is fixed, it should run on CPU successfully, although it may be slow.
Hi, have you solved this? I have modified the config.json file, but it now says:
ValueError: At least one of the model submodule will be offloaded to disk, please pass along an offload_folder.
This is all I'm trying to run:
    from lmdeploy import pipeline
    from lmdeploy.vl import load_image
    pipe = pipeline('InternVL-Chat-V1-5')
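The error text says some submodules would be offloaded to disk, which typically happens when the weights do not fit in the memory available to the loader. As a rough sanity check, here is a back-of-the-envelope estimate of the weight size alone. It assumes "26B" means 26 billion parameters and ignores activations, the KV cache, and framework overhead, so the real requirement is higher.

```python
# "26B" from the thread, read as 26 billion parameters (an assumption).
PARAMS = 26e9

# Storage size per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_gib(num_params, dtype):
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_gib(PARAMS, dtype):6.1f} GiB")
```

In 16-bit precision this comes to around 48 GiB for the weights, so a machine with less free RAM than that would be expected to hit some form of disk offloading.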