
ERROR in running 'run_scripts/pope_eval.py'

Open anotherbricki opened this issue 1 year ago • 6 comments

Thanks for your good work! But I got confused when I ran evaluation of POPE.

At first, I run: python run_scripts/pope_eval.py --model llava-1.5 --data_path /home/duyuetian/COCO/val2014 -d vcd --pope_type random --num_images 100 --seed 0 --gpu_id 0 --output_dir ./generated_captions/ --noise_step 100

but it didn't work because of the error below: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

So I roughly solved it by adding the "CUDA_VISIBLE_DEVICES" param: CUDA_VISIBLE_DEVICES=0 python run_scripts/pope_eval.py --model llava-1.5 --data_path /home/duyuetian/COCO/val2014 -d vcd --pope_type random --num_images 100 --seed 0 --gpu_id 0 --output_dir ./generated_captions/ --noise_step 100

But this workaround doesn't generalize to multi-GPU setups, so I wonder whether there is a way to solve the problem at the root? Appreciate it.
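For reference, the shell prefix can also be applied inside the script itself, as long as it runs before torch is imported — a minimal sketch of the same workaround in code:

```python
import os

# Must be set before `import torch`: once torch initializes CUDA,
# changing this variable has no effect. With only one GPU visible,
# every tensor lands on cuda:0 and mixed-device matmuls cannot occur.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # noqa: E402  (intentionally imported after setting the env var)
```

This only hides the other GPUs; it doesn't fix the underlying mis-placed tensors in the code base.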

anotherbricki avatar May 29 '24 05:05 anotherbricki

Thanks for your question! I also encountered this problem with LLaVA-1.5 on two devices. It might be due to some intermediate variables being accidentally cast to the other device when we developed the code base. However, there is no such issue with the other three VLMs, so you can try running them as well. I will check the code and let you know of any updates. Thanks!
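Until a fix lands, a common mitigation for this class of bug is to cast the input to the device of the layer that consumes it, so the matmul never sees mixed devices. A minimal sketch (the helper name is hypothetical, not part of HALC):

```python
import torch
import torch.nn as nn

def call_on_module_device(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Move the input to whichever device the module's parameters live on,
    # so the matmul inside never mixes cuda:0 and cuda:3 tensors.
    device = next(module.parameters()).device
    return module(x.to(device))

# Works on CPU too; on a multi-GPU box it would rescue an intermediate
# tensor that was accidentally created on the wrong device.
head = nn.Linear(3, 4)
out = call_on_module_device(head, torch.randn(2, 3))
```

The same `.to(device)` pattern can be applied at whichever call site in the repo raises the `wrapper_CUDA_addmm` error.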

BillChan226 avatar May 31 '24 09:05 BillChan226

Thanks a lot!

anotherbricki avatar Jun 04 '24 03:06 anotherbricki

Hello! I have the same problem. Is there a solution that would allow me to run LLaVA-1.5 normally?

Shenshen7 avatar Jun 26 '24 08:06 Shenshen7

I encountered the same issue. When I use the LLaVA-1.5 model and specify the GPU with CUDA_VISIBLE_DEVICES=0, I get a memory error: torch.cuda.OutOfMemoryError: CUDA out of memory. However, when I don't specify the GPU, I get the following error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument mat1 in method wrapper_CUDA_addmm).

darwann avatar Sep 06 '24 08:09 darwann

Hi, the OOM error likely just means a single GPU doesn't have enough memory for the model. I will try to fix the CUDA device issue this week.
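If the model only barely doesn't fit on one GPU, loading it in half precision roughly halves the parameter memory. A sketch of the idea (whether the repo's model builder exposes a dtype option is an assumption; adapt to the actual loading code):

```python
import torch
import torch.nn as nn

def halve_param_memory(model: nn.Module) -> nn.Module:
    # Converting float32 parameters and buffers to float16 halves their
    # memory footprint, a common first step when a single GPU hits
    # CUDA out-of-memory.
    return model.half()

model = halve_param_memory(nn.Linear(8, 8))
```

Reducing `--num_images` or batch size are other easy levers before resorting to multi-GPU sharding.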

BillChan226 avatar Sep 06 '24 08:09 BillChan226

Thanks! :)

darwann avatar Sep 20 '24 08:09 darwann