DSNeRF

CUDA / CPU error~!

Open dedoogong opened this issue 4 years ago • 21 comments

Traceback (most recent call last):
  File "run_nerf.py", line 1019, in <module>
    train()
  File "run_nerf.py", line 858, in train
    batch = next(raysRGB_iter).to(device)
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 560, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 226, in __iter__
    for idx in self.sampler:
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 124, in __iter__
    yield from torch.randperm(n, generator=generator).tolist()
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

Maybe the RGB or depth inputs need to be transferred to the GPU first, right?

dedoogong avatar Jul 09 '21 11:07 dedoogong

I solved the error above, but now I get another one:

TRAIN views are [ 1 2 3 4 5 6 7 8 9 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 31 32 33 34 35 36 37 38 39 41 42 43 44 45 46 47 48 49 51 52 53 54 55 56 57 58 59 61 62 63 64 65 66 67 68 69 71 72 73 74 75 76 77 78 79 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96 97 98 99 101 102 103 104 105 106 107 108 109 111 112 113 114]
TEST views are [ 0 10 20 30 40 50 60 70 80 90 100 110 115]
VAL views are [ 0 10 20 30 40 50 60 70 80 90 100 110 115]
0%| | 0/20000 [00:00<?, ?it/s
Traceback (most recent call last):
  File "run_nerf.py", line 1019, in <module>
    train()
  File "run_nerf.py", line 922, in train
    **render_kwargs_train)
  File "run_nerf.py", line 137, in render
    all_ret = batchify_rays(rays, chunk, **kwargs)
  File "run_nerf.py", line 68, in batchify_rays
    ret = render_rays(rays_flat[i:i+chunk], **kwargs)
  File "run_nerf.py", line 419, in render_rays
    rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)
  File "/workspace/DSNeRF/run_nerf_helpers.py", line 356, in raw2outputs
    dists = torch.cat([dists, torch.Tensor([1e10]).expand(dists[...,:1].shape)], -1)  # [N_rays, N_samples]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument tensors in method wrapper__cat)

dedoogong avatar Jul 09 '21 12:07 dedoogong
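
The second traceback points at `torch.Tensor([1e10])`, which is allocated on the default tensor device rather than on the device of `dists`. Below is a minimal sketch of one way to avoid that mismatch, assuming only the `torch.cat` line quoted in the traceback. The helper name `append_far_distance` is hypothetical and is not part of DSNeRF.

```python
import torch


def append_far_distance(dists: torch.Tensor, far: float = 1e10) -> torch.Tensor:
    """Append a large 'far' distance as the last sample of every ray.

    Hypothetical helper: it mirrors the torch.cat line from the traceback,
    but builds the padding tensor on the same device and dtype as `dists`.
    """
    pad = torch.full_like(dists[..., :1], far)  # inherits device and dtype
    return torch.cat([dists, pad], dim=-1)      # [N_rays, N_samples]


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dists = torch.rand(4, 7, device=device)
    out = append_far_distance(dists)
    print(out.shape, out.device)
```

Because the padding is derived from `dists` itself, the result should not depend on whether `torch.set_default_tensor_type` has been called.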

I got the same two errors. Don't remove the line torch.set_default_tensor_type('torch.cuda.FloatTensor'). Instead, remove the line batch = next(raysRGB_iter).to(device) (and the corresponding line for depth), and load the batches the same way the nerf-pytorch implementation does.

nishant34 avatar Jul 17 '21 05:07 nishant34

@nishant34 I still have problems after deleting these two lines. Can you share your code? (screenshot attached)

SSground avatar Jul 23 '21 05:07 SSground

@nishant34 I have the same issue. Could you please elaborate on how to modify it? Thanks!

Haodong-Alex-Yao avatar Jul 26 '21 04:07 Haodong-Alex-Yao

Could anyone provide the PyTorch version info? I cannot reproduce these errors. FYI, I'm using torch 1.7.1 on my machine.

dunbar12138 avatar Jul 26 '21 13:07 dunbar12138

I used PyTorch 1.9 and encountered this problem. @nishant34 Yes, but does that still conform to the author's algorithm? @SSground That is a simple Python syntax error.

BinuxLiu avatar Jul 27 '21 03:07 BinuxLiu

@BinuxLiu Sorry, the problem was with my screenshot. My error is the same as the ones reported above.

SSground avatar Jul 27 '21 03:07 SSground

@dunbar12138 Yes, there is no problem with torch 1.7.1. @SSground If you want to use 1.9, you may need to uncomment a lot of code to fix this error.

BinuxLiu avatar Jul 27 '21 03:07 BinuxLiu

@dunbar12138 Same issue here, with torch version 1.9.0.

heikog avatar Jul 27 '21 16:07 heikog

The issue is caused by a bug in PyTorch. You can find more information here. Unfortunately, there is no perfect solution yet. The workaround is to downgrade torch from 1.9 to 1.8 or 1.7.

dunbar12138 avatar Aug 12 '21 16:08 dunbar12138

Specifying the generator type fixes this on newer PyTorch. I opened a pull request and tested it on pytorch==1.10.2; training started without the error.

hirak99 avatar Mar 04 '22 11:03 hirak99
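
The idea of specifying the generator can be sketched as follows. This is a minimal illustration and not the contents of the pull request: it assumes the sampler's `torch.randperm` call allocates on the default tensor device, so the generator is created on that same device. The helper name `make_shuffled_loader` is hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_shuffled_loader(dataset, batch_size: int) -> DataLoader:
    """Build a shuffling DataLoader whose generator matches the default device.

    Hypothetical helper. torch.empty(0) is allocated on the default tensor
    device, so it reports 'cuda' when the default tensor type has been set to
    torch.cuda.FloatTensor and 'cpu' otherwise.
    """
    default_device = torch.empty(0).device
    generator = torch.Generator(device=default_device)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      generator=generator)


if __name__ == "__main__":
    rays = TensorDataset(torch.arange(10, dtype=torch.float32))
    loader = make_shuffled_loader(rays, batch_size=4)
    for (batch,) in loader:
        print(batch)
```

Deriving the device from `torch.empty(0)` rather than hard-coding `'cuda'` keeps the sketch usable on CPU-only machines as well.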

I tested it on pytorch == 1.10.2, but the issue still exists.

Traceback (most recent call last):
  File "run_nerf.py", line 1114, in <module>
    train()
  File "run_nerf.py", line 866, in train
    batch = next(raysRGB_iter).to(device)
  File "/Extra/yankai/software/miniconda3/envs/pytorch1.10/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/Extra/yankai/software/miniconda3/envs/pytorch1.10/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 560, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/Extra/yankai/software/miniconda3/envs/pytorch1.10/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/Extra/yankai/software/miniconda3/envs/pytorch1.10/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 229, in __iter__
    for idx in self.sampler:
  File "/Extra/yankai/software/miniconda3/envs/pytorch1.10/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 126, in __iter__
    yield from torch.randperm(n, generator=generator).tolist()
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

changer-zxy avatar Mar 25 '22 03:03 changer-zxy

That's because my patch hasn't been pulled yet. Please try with my fork.

hirak99 avatar Mar 25 '22 07:03 hirak99

@hirak99 Thank you! Using your fork solved the problem.

changer-zxy avatar Mar 25 '22 09:03 changer-zxy

@hirak99 Thanks! It works for me!

wujun-cse avatar Apr 05 '22 03:04 wujun-cse

@hirak99 Absolute legend, thanks so much!

ernlavr avatar Apr 23 '22 12:04 ernlavr

Hi, did you encounter this problem when training reaches 4999 iterations?

Traceback (most recent call last):
  File "run_nerf.py", line 1134, in <module>
    train()
  File "run_nerf.py", line 1075, in train
    rgbs, disps = render_path(torch.Tensor(poses[i_test]).to(device), hwf, args.chunk, render_kwargs_test, gt_imgs=images[i_test], savedir=testsavedir)
TypeError: expected TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)) (got TensorOptions(dtype=float, device=cuda:0, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))

Seagullflysxt avatar Sep 04 '22 14:09 Seagullflysxt

Adding generator=torch.Generator(device='cuda') to the random_split() and DataLoader() calls in the train.py file solves this problem.

PyTorch 1.10.0 + cu113 :)

RyanSKJ avatar Sep 15 '22 05:09 RyanSKJ
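
A minimal sketch of the same idea applied to `random_split`, assuming a training script that splits a dataset before building its loaders. The dataset, split sizes, and variable names are illustrative and not taken from DSNeRF. The generators are placed on the default tensor device, which is 'cuda' in the setup described in this thread.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Device that newly created tensors land on by default.
default_device = torch.empty(0).device

# Illustrative dataset; a real script would use its own.
dataset = TensorDataset(torch.arange(20, dtype=torch.float32))

# The split indices come from torch.randperm, so the generator device matters.
train_set, val_set = random_split(
    dataset, [16, 4], generator=torch.Generator(device=default_device)
)

# Shuffling draws a permutation too, so the loader gets its own generator.
train_loader = DataLoader(
    train_set, batch_size=8, shuffle=True,
    generator=torch.Generator(device=default_device),
)
val_loader = DataLoader(val_set, batch_size=4)

print(len(train_set), len(val_set))
```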

Hi, @Seagullflysxt. Have you fixed this error? I ran into the same issue.

MaxChanger avatar Sep 22 '22 06:09 MaxChanger

Thanks for your work! I solved the issue by changing the argument torch.Tensor(poses[i_test]).to(device) to poses[i_test].

yyzzyy78 avatar Oct 02 '22 15:10 yyzzyy78

I solved this problem by changing test_loss = img2mse(torch.Tensor(rgbs), images[i_test]) to test_loss = img2mse(torch.Tensor(rgbs).to(device), images[i_test]).

Marquess98 avatar Jul 01 '23 09:07 Marquess98
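
The last two fixes both come down to which device a tensor ends up on when it is built from existing data. Below is a minimal sketch of a conversion that accepts either NumPy arrays or tensors, assuming the inputs are float arrays such as `rgbs` or `poses[i_test]`. The helper name `to_device` is hypothetical and is not part of DSNeRF.

```python
import numpy as np
import torch


def to_device(x, device) -> torch.Tensor:
    """Return `x` as a float32 tensor on `device`.

    Hypothetical helper. torch.as_tensor accepts NumPy arrays as well as
    tensors that already live on some device, so the caller does not need to
    know which of the two it is holding.
    """
    return torch.as_tensor(x, dtype=torch.float32, device=device)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    rgbs = np.random.rand(2, 4, 4, 3)          # e.g. rendered images as NumPy
    poses = torch.eye(4, device=device)[None]  # e.g. poses already on device
    print(to_device(rgbs, device).device, to_device(poses, device).device)
```

With such a helper, a call like `img2mse(to_device(rgbs, device), images[i_test])` would compare two tensors on the same device, assuming `images[i_test]` already lives there.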