
Loading the model on multiple GPUs

Open aamir-gmail opened this issue 2 years ago • 18 comments

I have two 4090s (24 GB each). If possible, please add an extra argument to demo.py to load the model either on the CPU or on two or more GPUs, and another argument to run in 16-bit and take advantage of the extra GPU RAM, instead of having to edit config files.
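For context, the kind of flags being asked for might look roughly like the hypothetical argparse sketch below; --device, --num-gpus, and --fp16 are illustrative names only, not existing demo.py options.

import argparse

parser = argparse.ArgumentParser(description="MiniGPT-4 demo (hypothetical flags)")
# Hypothetical options illustrating the request; demo.py does not currently expose these.
parser.add_argument("--device", default="cuda", choices=["cpu", "cuda"],
                    help="run the model on the CPU or on GPU(s)")
parser.add_argument("--num-gpus", type=int, default=1,
                    help="spread the LLaMA weights across this many GPUs")
parser.add_argument("--fp16", action="store_true",
                    help="load weights in 16-bit to use the extra GPU RAM")
args = parser.parse_args()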

aamir-gmail avatar Apr 19 '23 08:04 aamir-gmail

I would also like to know how to do this. I have 2x 3060 12 GB, so I could load the 13B model, but multi-GPU loading doesn't seem to be implemented.

CyberTimon avatar Apr 21 '23 13:04 CyberTimon

I have the same request.

taomanwai avatar Apr 27 '23 08:04 taomanwai

I have the same request too.

wJc-cn avatar May 06 '23 00:05 wJc-cn

  1. Set the parameter device_map='auto' when calling LlamaForCausalLM.from_pretrained().

  2. Change the line in demo.py to: chat = Chat(model, vis_processor, device='cuda')

It runs on two RTX 2080 Ti cards on my machine.

thcheung avatar Jun 06 '23 15:06 thcheung

> Set the parameter device_map='auto' when calling LlamaForCausalLM.from_pretrained(). Change the line in demo.py to chat = Chat(model, vis_processor, device='cuda'). It runs on two RTX 2080 Ti cards on my machine.

It seems the model is split across two devices, but during inference the tensors flow between the two devices and it throws a device-mismatch error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

sinsauzero avatar Jun 07 '23 06:06 sinsauzero

> Set the parameter device_map='auto' when calling LlamaForCausalLM.from_pretrained(). Change the line in demo.py to chat = Chat(model, vis_processor, device='cuda'). It runs on two RTX 2080 Ti cards on my machine.

> It seems the model is split across two devices, but during inference the tensors flow between the two devices and it throws a device-mismatch error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

(1) Load LLaMA with device_map set to 'auto':

https://github.com/Vision-CAIR/MiniGPT-4/blob/22d8888ca2cf0aac862f537e7d22ef5830036808/minigpt4/models/mini_gpt4.py#L94

device_map = 'auto'
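Concretely, the from_pretrained call at that line becomes something like the sketch below; the other keyword arguments are assumptions about the 8-bit / low-resource path, not an exact copy of the repo's code.

self.llama_model = LlamaForCausalLM.from_pretrained(
    llama_model,                # model path/name from the config
    torch_dtype=torch.float16,
    load_in_8bit=True,          # keep this if you are on the low-resource path
    device_map='auto',          # let accelerate shard the layers across available GPUs
)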

(2) Modify the line below from 'cuda:{}'.format(args.gpu_id) to 'cuda'. Tensors will then automatically be assigned to device 0 or device 1 if you have two devices:

https://github.com/Vision-CAIR/MiniGPT-4/blob/22d8888ca2cf0aac862f537e7d22ef5830036808/demo.py#L64

chat = Chat(model, vis_processor, device='cuda')
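In other words, the change in demo.py is roughly:

# before
chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))

# after
chat = Chat(model, vis_processor, device='cuda')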

(3) The "to device" can be removed from the line below because llama has been loaded to GPUs automatically:

https://github.com/Vision-CAIR/MiniGPT-4/blob/22d8888ca2cf0aac862f537e7d22ef5830036808/demo.py#L60

model = model_cls.from_config(model_config)
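That is, the trailing .to(...) is dropped, roughly:

# before (moves the whole model to a single GPU)
model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))

# after (LLaMA is already dispatched across GPUs by device_map='auto')
model = model_cls.from_config(model_config)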

(4) When encoding the image, we can encode the image on the CPU and then move the image embedding to the GPU:

https://github.com/Vision-CAIR/MiniGPT-4/blob/22d8888ca2cf0aac862f537e7d22ef5830036808/minigpt4/conversation/conversation.py#L185
https://github.com/Vision-CAIR/MiniGPT-4/blob/22d8888ca2cf0aac862f537e7d22ef5830036808/minigpt4/conversation/conversation.py#L186

image_emb, _ = self.model.encode_img(image.to('cpu'))
img_list.append(image_emb.to('cuda'))

The model should now work if you have multiple GPUs with limited memory.
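As a quick sanity check (assuming LLaMA was loaded with device_map='auto'), you can print the layer-to-device mapping that accelerate produced; hf_device_map is set by transformers on models loaded this way.

print(model.llama_model.hf_device_map)
# e.g. {'model.embed_tokens': 0, 'model.layers.0': 0, ..., 'model.layers.39': 1, 'lm_head': 1}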


thcheung avatar Jun 07 '23 06:06 thcheung

Traceback (most recent call last):
  File "/home2/jainit/MiniGPT-4/demo.py", line 61, in <module>
    model = model_cls.from_config(model_config)
  File "/home2/jainit/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 243, in from_config
    model = cls(
  File "/home2/jainit/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 90, in __init__
    self.llama_model = LlamaForCausalLM.from_pretrained(
  File "/home2/jainit/torchy/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2722, in from_pretrained
    max_memory = get_balanced_memory(
  File "/home2/jainit/torchy/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 731, in get_balanced_memory
    max_memory = get_max_memory(max_memory)
  File "/home2/jainit/torchy/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 624, in get_max_memory
    _ = torch.tensor([0], device=i)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I did all of these steps, but I still get the error above.

JainitBITW avatar Aug 08 '23 14:08 JainitBITW

@JainitBITW Is it working now for you?

sushilkhadkaanon avatar Sep 19 '23 09:09 sushilkhadkaanon

Yes, I just restarted CUDA.

JainitBITW avatar Sep 19 '23 10:09 JainitBITW

@JainitBITW Did you do anything apart from @thcheung's instructions? Thanks anyway!

sushilkhadkaanon avatar Sep 19 '23 10:09 sushilkhadkaanon

Nope, exactly the same.

JainitBITW avatar Sep 19 '23 10:09 JainitBITW

What error are you getting?

JainitBITW avatar Sep 19 '23 10:09 JainitBITW

I'm trying to run the 13B model on multiple GPUs. The authors have said they currently don't support multi-GPU inference, so I want to be sure that inference across multiple GPUs is possible before provisioning the EC2 instance.

sushilkhadkaanon avatar Sep 19 '23 10:09 sushilkhadkaanon

I think you can go ahead.

JainitBITW avatar Sep 19 '23 10:09 JainitBITW

@JainitBITW @thcheung Thanks, it worked for me (8-bit). Any idea how to do it for 16-bit (low_resource = False)? It throws this error: RuntimeError: Input type (float) and bias type (c10::Half) should be the same

sushilkhadkaanon avatar Sep 19 '23 12:09 sushilkhadkaanon

> RuntimeError: Input type (float) and bias type (c10::Half) should be the same

I got past this error by setting vit_precision: "fp32" in minigpt_v2.yaml, but I didn't figure out what would need to be done to make the new input fp16 (half precision) as well, instead of making everything fp32.
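For reference, that workaround is a one-key change in the eval config; the exact nesting in minigpt_v2.yaml may differ from this sketch.

model:
  vit_precision: "fp32"   # run the ViT in full precision to avoid the float/half mismatch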

daniellandau avatar Oct 16 '23 17:10 daniellandau

My solution is: CUDA_VISIBLE_DEVICES=1 python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0

uiyo avatar Nov 13 '23 09:11 uiyo