manhattan_sdf

Facing an issue while training

Open: richas46 opened this issue (3 comments)

Hi, thank you for sharing your wonderful work.

I've followed the setup steps. When I move on to the training step and run

python train_net.py --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050

I get the following error: RuntimeError: Function 'MmBackward' returned nan values in its 1th output.

How should I solve this error?

richas46 · Jul 09 '22 00:07

Here is the full traceback:

Traceback (most recent call last):
  File "train_net.py", line 127, in <module>
    main()
  File "train_net.py", line 123, in main
    train(cfg, network)
  File "train_net.py", line 57, in train
    trainer.train(epoch, train_loader, optimizer, recorder)
  File "C:\Users\Local_Admin\Desktop\manhattan_sdf-main\lib\train\trainers\trainer.py", line 86, in train
    output, loss, loss_stats, image_stats = self.network(batch)
  File "C:\Users\Local_Admin\anaconda3\envs\manhattan\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "lib/train/trainers/manhattan_sdf.py", line 14, in forward
    output = self.net(batch)
  File "C:\Users\Local_Admin\anaconda3\envs\manhattan\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "lib/networks/network.py", line 209, in forward
    perturb=pertube
  File "lib/networks/network.py", line 166, in volume_render
    ret_i = render_rayschunk(rays_o[:, i:i+rayschunk], rays_d[:, i:i+rayschunk])
  File "lib/networks/network.py", line 114, in render_rayschunk
    radiances, semantics, sdf, nablas = batchify_query(model.forward, pts, view_dirs.unsqueeze(-2).expand_as(pts))
  File "C:\Users\Local_Admin\Desktop\manhattan_sdf-main\lib\utils\net_utils.py", line 153, in batchify_query
    raw_ret_i = query_fn(*args_i)
  File "lib/networks/network.py", line 44, in forward
    sdf, nablas, geometry_feature = self.forward_surface_with_nablas(x)
  File "lib/networks/network.py", line 40, in forward_surface_with_nablas
    sdf, nablas, h = self.sdf_net.forward_with_nablas(x)
  File "C:\Users\Local_Admin\Desktop\manhattan_sdf-main\lib\networks\base.py", line 192, in forward_with_nablas
    only_inputs=True
  File "C:\Users\Local_Admin\anaconda3\envs\manhattan\lib\site-packages\torch\autograd\__init__.py", line 192, in grad
    inputs, allow_unused)
RuntimeError: Function 'MmBackward' returned nan values in its 1th output.
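The traceback shows that the NaN appears inside the torch.autograd.grad call in forward_with_nablas (lib/networks/base.py, line 192). In other words, it shows up while the SDF gradients (nablas) are being computed during the network's forward pass, not in the final loss backward. The message itself comes from PyTorch's anomaly detection mode, which checks each backward node's outputs for NaN. A backward node's outputs are indexed by the forward inputs they belong to, so for C = A @ B the "1th output" is the gradient with respect to B, which equals A^T @ dL/dC. The pure-Python sketch below illustrates that math only and is not PyTorch's code; `mm_backward` and `check_outputs` are hypothetical names. It shows how a single NaN in the first operand (for example an input point or hidden activation) lands in output 1 while output 0 stays finite.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product on nested lists."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def mm_backward(a: Matrix, b: Matrix, grad_out: Matrix) -> List[Matrix]:
    """Hypothetical helper: gradients of C = A @ B, ordered like the forward inputs.
    Output 0 is dL/dA = grad_out @ B^T, output 1 is dL/dB = A^T @ grad_out."""
    return [matmul(grad_out, transpose(b)), matmul(transpose(a), grad_out)]


def check_outputs(name: str, outputs: List[Matrix]) -> None:
    """Hypothetical helper mimicking an anomaly-style check:
    raise if any gradient output contains a NaN."""
    for idx, out in enumerate(outputs):
        if any(math.isnan(v) for row in out for v in row):
            raise RuntimeError(
                f"Function '{name}' returned nan values in its {idx}th output."
            )


if __name__ == "__main__":
    a = [[float("nan"), 1.0]]  # one bad value in the first operand (1 x 2)
    b = [[2.0], [3.0]]         # finite second operand (2 x 1)
    grad_out = [[1.0]]         # incoming gradient dL/dC (1 x 1)
    grads = mm_backward(a, b, grad_out)
    print(grads[0])            # [[2.0, 3.0]]: dL/dA stays finite
    try:
        check_outputs("MmBackward", grads)
    except RuntimeError as err:
        print(err)             # reports output 1, i.e. dL/dB = A^T @ grad_out
```

Note that Inf multiplied by a zero gradient is also NaN, so an Inf anywhere upstream of the matmul can produce the same message even if no NaN is present in the inputs.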

richas46 · Jul 09 '22 00:07

Hi, thanks for your interest! I'm not sure what is causing this. Are you using the conda environment file we provide? And is the bug reproducible?

ghy0324 · Jul 16 '22 06:07

Yes, I'm using the provided conda environment.

richas46 · Aug 02 '22 10:08
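With the environment ruled out, a reasonable next check is whether the batch fed to the network already contains NaN or Inf values (for example in ray directions or depth), since such values propagate into the matmul gradients computed inside forward_with_nablas. The helper below is a minimal stdlib sketch, not part of manhattan_sdf: `find_nonfinite` and the batch keys in the usage example are hypothetical. With real torch tensors you would either convert them with `.tolist()` first or test them directly with `torch.isfinite`.

```python
import math
from typing import Any, Iterator, Tuple


def find_nonfinite(obj: Any, path: str = "batch") -> Iterator[Tuple[str, float]]:
    """Hypothetical helper: yield (path, value) for every NaN or +/-Inf float
    nested inside dicts, lists, and tuples. Other types are ignored."""
    if isinstance(obj, float):
        if not math.isfinite(obj):
            yield path, obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_nonfinite(value, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            yield from find_nonfinite(value, f"{path}[{i}]")


if __name__ == "__main__":
    # Hypothetical batch layout; the real manhattan_sdf batch keys may differ.
    batch = {"rays": [[0.0, 1.0], [float("nan"), 2.0]], "depth": (1.5, float("inf"))}
    for where, value in find_nonfinite(batch):
        print(where, value)
```

Reporting the full path rather than a single True/False makes it easier to tell whether the bad values come from one specific input (such as a corrupted depth map) or appear everywhere.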