
Other GPU ids throw error

Open sandeepjangir07 opened this issue 4 years ago • 4 comments

When using any GPU device ID other than 0, the code throws an error:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    opt = TestOptions().parse()
  File "/home/jang_sa/phd/AI/domain_adaptation/TSIT/options/base_options.py", line 178, in parse
    torch.cuda.set_device(opt.gpu_ids[0])
  File "/home/jang_sa/Software/anaconda3/envs/tsit/lib/python3.7/site-packages/torch/cuda/__init__.py", line 263, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal

The GPUs are available and the device IDs are valid, but I still get this error. Is there a solution for this problem?

sandeepjangir07, Jul 05 '21 13:07
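The traceback above fails inside `torch.cuda.set_device(opt.gpu_ids[0])`, which gives little context about which IDs the process can actually use. A minimal sketch of a pre-check that could run just before that call is shown here. The helper name `check_gpu_ids` and its `device_count` parameter are hypothetical and not part of TSIT; the count is passed in as a plain integer so the snippet runs without a GPU.

```python
def check_gpu_ids(gpu_ids, device_count):
    """Return the first requested id, or raise if any id is not visible.

    gpu_ids:      list of integer ids, e.g. parsed from --gpu_ids
    device_count: number of CUDA devices this process can see
    """
    valid = list(range(device_count))
    bad = [i for i in gpu_ids if i not in valid]
    if bad:
        raise ValueError(
            f"gpu_ids {bad} are not visible to this process; "
            f"valid ids are {valid}"
        )
    # Mirror the indexing used in the traceback: the first id is the main device.
    return gpu_ids[0] if gpu_ids else None


if __name__ == "__main__":
    print(check_gpu_ids([0], device_count=1))
    try:
        check_gpu_ids([1], device_count=1)
    except ValueError as err:
        print(err)
```

In a script where PyTorch is installed, `device_count` would typically come from `torch.cuda.device_count()`, which reports how many devices the current process sees rather than how many are physically installed.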

It seems to work on my side. How many GPUs do you have?

EndlessSora, Jul 06 '21 03:07

> It seems to work on my side. How many GPUs do you have?

I have two GPU clusters, one with 8 GPUs and one with 5, but whenever I use CUDA_VISIBLE_DEVICES=(anything other than 0) and gpu_id=(anything other than 0), I get this error. I will try to sit down and debug it today, but if you have any hint about what is causing this, it would be very helpful.

Thanks

sandeepjangir07, Jul 06 '21 09:07
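One possible explanation for the combination described above, not confirmed anywhere in this thread: `CUDA_VISIBLE_DEVICES` renumbers the devices a process sees, so with `CUDA_VISIBLE_DEVICES=1` the single visible GPU becomes ordinal 0, and asking for ordinal 1 is then an invalid device ordinal. The sketch below simulates that renumbering in plain Python. The function names are hypothetical, and it handles only integer entries (the real variable also accepts device UUIDs).

```python
def logical_to_physical(cuda_visible_devices, physical_count):
    """Map logical CUDA ordinals to physical GPU indices.

    cuda_visible_devices: value of the environment variable, or None if unset
    physical_count:       number of GPUs installed in the machine
    """
    if cuda_visible_devices is None:
        # Variable unset: every physical GPU is visible, in order.
        return list(range(physical_count))
    mapping = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            # Entries after the first invalid one are ignored.
            break
        mapping.append(int(token))
    return mapping


def is_valid_ordinal(gpu_id, cuda_visible_devices, physical_count):
    """True if gpu_id is a usable ordinal under the given visibility setting."""
    visible = logical_to_physical(cuda_visible_devices, physical_count)
    return 0 <= gpu_id < len(visible)


if __name__ == "__main__":
    # Physical GPU 1 is exposed as logical ordinal 0.
    print(logical_to_physical("1", 8))
    print(is_valid_ordinal(1, "1", 8))
    print(is_valid_ordinal(0, "1", 8))
```

Under this assumption, either of two setups would select physical GPU 1: leave `CUDA_VISIBLE_DEVICES` unset and pass `--gpu_ids 1`, or set `CUDA_VISIBLE_DEVICES=1` and pass `--gpu_ids 0`. Setting both to 1 would request a second visible device that does not exist.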

For example, if you change --gpu_ids 0 here to --gpu_ids 1, does it cause an error?

EndlessSora, Jul 06 '21 10:07

> For example, if you change --gpu_ids 0 here to --gpu_ids 1, does it cause an error?

Hi, yes. I cannot run inference on other GPUs either.

sandeepjangir07, Jul 06 '21 13:07