
Error when running train.py

Open hmax233 opened this issue 3 years ago • 2 comments

Thanks for your work! When I try to run train.py, I get the error "RuntimeError: running_mean should contain 96 elements not 192". It seems to be caused by a channel mismatch between layers. Have you tested the script, or am I the only one with this problem?

hmax233, Aug 08 '22 07:08
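PyTorch's wording here is easy to read backwards. In its batch-norm check, the first number is the channel count of the incoming features (`input.size(1)`) and the second is the size of the layer's `running_mean` buffer. So this error means a BatchNorm layer constructed for 192 channels was given 96-channel features. The helper below is hypothetical and not part of dimr or PyTorch. It only makes that reading explicit by pulling the two numbers out of the message:

```python
import re
from typing import NamedTuple


class BatchNormMismatch(NamedTuple):
    input_channels: int  # channels in the features actually fed to the layer (input.size(1))
    layer_channels: int  # num_features the BatchNorm layer was built with (running_mean.numel())


# PyTorch's wording: "running_mean should contain <input channels> elements not <buffer size>"
_PATTERN = re.compile(r"running_mean should contain (\d+) elements not (\d+)")


def parse_bn_mismatch(message: str) -> BatchNormMismatch:
    """Hypothetical diagnostic helper (not part of dimr or PyTorch)."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError(f"not a BatchNorm channel-mismatch message: {message!r}")
    return BatchNormMismatch(int(match.group(1)), int(match.group(2)))


info = parse_bn_mismatch("RuntimeError: running_mean should contain 96 elements not 192")
print(info)  # BatchNormMismatch(input_channels=96, layer_channels=192)
if info.layer_channels == 2 * info.input_channels:
    print("BatchNorm was built for twice as many channels as it receives")
```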

Can you provide the full error log and the script you used to train? The output should look like this:

[2022-08-08 22:12:29,924  INFO  log.py  line 40  24979]  ************************ Start Logging ************************
[2022-08-08 22:12:29,957  INFO  train.py  line 23  24979]  Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=0.09, batch_size=8, bg_thresh=0.25, block_reps=2, block_residual=True, classes=25, cluster_meanActive=50, cluster_npoint_thre=50, cluster_radius=0.03, cluster_shift_meanActive=300, config='config/rfs_phase1_scannet.yaml', data_root='../dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', epochs=256, eval=False, eval_voxel_size=0.047, exp_path='exp/scannetv2/rfs/rfs_phase1_scannet', fg_thresh=0.75, filename_suffix='_inst_nostuff.pth', fix_module=[], full_scale=[128, 512], ignore_label=-100, loss_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0], lr=0.001, m=16, manual_seed=123, max_npoint=250000, mesh_iou_thresh=[0.25, 0.5], mode=4, model_dir='model/rfs.py', model_name='rfs', momentum=0.9, multiplier=0.1, optim='Adam', prepare_epochs=256, prepare_epochs_2=256, pretrain='', pretrain_module=[], pretrain_path=None, sample=False, save_freq=16, save_instance=False, save_mesh=False, save_pt_angles=True, save_pt_offsets=False, save_semantic=True, scale=50, score_fullscale=14, score_mode=4, score_scale=50, split='val', step_epoch=256, task='train', test_epoch=256, test_seed=567, test_workers=4, train_workers=4, use_coords=True, use_rgb=False, weight_decay=0.0001)
[2022-08-08 22:12:29,963  INFO  train.py  line 172  24979]  => creating model ...
Loaded pretrained BSP-Net: #params = 37321472
[2022-08-08 22:12:44,925  INFO  train.py  line 180  24979]  cuda available: True
[2022-08-08 22:12:45,023  INFO  train.py  line 185  24979]  #classifier parameters: 53511475
[2022-08-08 22:12:45,043  INFO  scannetv2_inst.py  line 55  24979]  Training samples: 1181
[2022-08-08 22:12:45,043  INFO  scannetv2_inst.py  line 65  24979]  Validation samples: 311
1 / 1 | L: 7.8248 lr:0.001000 | sem: 3.2595 off: 1.9187/-0.0098 ang:2.4948/0.1615 | T: 144:38:34
1 / 2 | L: 7.7170 lr:0.001000 | sem: 3.2242 off: 1.8480/-0.0130 ang:2.4915/0.1706 | T: 89:49:45
......

ashawkey, Aug 08 '22 14:08

Sorry for the late reply. The error message is below. It is caused by a channel mismatch in the network. I traced the error to the following code; training runs normally if `nPlanes[0]` is not multiplied by 2:

In `model/rfs.py`, line 130:

    for i in range(block_reps):
        if i == 0:
            blocks_tail['block{}'.format(i)] = block(nPlanes[0] * 2, nPlanes[0], norm_fn, indice_key='subm{}'.format(indice_key_id))
        else:
            blocks_tail['block{}'.format(i)] = block(nPlanes[0], nPlanes[0], norm_fn, indice_key='subm{}'.format(indice_key_id))

```
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)
  File "/media/hmax/Elements/Github_respo/dimr-main/train.py", line 212, in <module>
    train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
  File "/media/hmax/Elements/Github_respo/dimr-main/train.py", line 51, in train_epoch
    loss, _, visual_dict, meter_dict = model_fn(batch, model, epoch)
  File "/media/hmax/Elements/Github_respo/dimr-main/model/rfs.py", line 861, in model_fn
    ret = model(model_inp)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hmax/Elements/Github_respo/dimr-main/model/rfs.py", line 418, in forward
    output = self.unet(output)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hmax/Elements/Github_respo/dimr-main/model/rfs.py", line 143, in forward
    output_decoder = self.u(output_decoder)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hmax/Elements/Github_respo/dimr-main/model/rfs.py", line 143, in forward
    output_decoder = self.u(output_decoder)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hmax/Elements/Github_respo/dimr-main/model/rfs.py", line 143, in forward
    output_decoder = self.u(output_decoder)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hmax/Elements/Github_respo/dimr-main/model/rfs.py", line 143, in forward
    output_decoder = self.u(output_decoder)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hmax/Elements/Github_respo/dimr-main/model/rfs.py", line 143, in forward
    output_decoder = self.u(output_decoder)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hmax/Elements/Github_respo/dimr-main/model/rfs.py", line 148, in forward
    output = self.blocks_tail(output)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/spconv/pytorch/modules.py", line 137, in forward
    input = module(input)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/hmax/Elements/Github_respo/dimr-main/model/rfs.py", line 80, in forward
    output = self.conv_branch(input)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/spconv/pytorch/modules.py", line 141, in forward
    input = input.replace_feature(module(input.features))
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 179, in forward
    self.eps,
  File "/home/hmax/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/functional.py", line 2422, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: running_mean should contain 96 elements not 192
```

hmax233, Sep 08 '22 09:09
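The numbers in this traceback can be checked by hand. The traceback enters `forward` at `rfs.py` line 143 five times before failing in `blocks_tail` at line 148. If the top-level U-Net block is counted as level 0, the failing tail is therefore at level 5. The sketch below relies on two assumptions, because the width schedule is not shown in this thread. First, it assumes a PointGroup-style schedule `nPlanes = [m, 2m, ..., 7m]`. Second, it assumes each nested `self.u` is built from `nPlanes[1:]`. It also uses `m=16` from the logged config.

Under those assumptions, the level-5 tail block is built for `nPlanes[0] * 2 = 192` input channels. Features that were never concatenated with the skip connection would have only 96 channels, which matches the error. The helper names are hypothetical.

```python
from typing import List


def level_widths(m: int, num_levels: int = 7) -> List[List[int]]:
    """nPlanes seen by each recursive U-Net level.

    Assumes a hypothetical PointGroup-style schedule [m, 2m, ..., num_levels*m];
    each nested self.u is assumed to be built from nPlanes[1:], so level k sees n_planes[k:].
    """
    n_planes = [m * (i + 1) for i in range(num_levels)]
    return [n_planes[k:] for k in range(num_levels)]


def tail_in_channels(n_planes0: int, concat_skip: bool) -> int:
    """Width of the features entering blocks_tail['block0'].

    Concatenating the skip connection with the decoder output doubles the
    width to nPlanes[0] * 2; without the concatenation it stays nPlanes[0].
    """
    return n_planes0 * 2 if concat_skip else n_planes0


m = 16  # value from the logged Namespace(..., m=16, ...)
widths = level_widths(m)
# Print deepest level first: the recursion runs the innermost blocks_tail before the outer ones.
for depth in reversed(range(len(widths))):
    n_planes = widths[depth]
    if len(n_planes) == 1:
        continue  # innermost level: assumed to have no self.u and no blocks_tail
    built_for = tail_in_channels(n_planes[0], concat_skip=True)   # block(nPlanes[0] * 2, ...)
    received = tail_in_channels(n_planes[0], concat_skip=False)   # if the concatenation never happens
    print(f"level {depth}: nPlanes={n_planes}, tail BatchNorm built for {built_for}, receives {received}")
```

Because the recursion finishes the deepest level first, the level-5 tail would be the first BatchNorm to see mismatched features. This would explain why the error reports 96 and 192 rather than a smaller pair from an outer level.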