DROID-SLAM
Error when using default weights "droid.pth" as pretrained weights
Hi @zachteed @xhangHU, I couldn't use your weights "droid.pth" as the pretrained starting point for training. I got this error:
Traceback (most recent call last):
File "train.py", line 189, in <module>
mp.spawn(train, nprocs=args.gpus, args=(args,))
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/trainer/droidslam/train.py", line 60, in train
model.load_state_dict(torch.load(args.ckpt))
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
size mismatch for module.update.weight.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
size mismatch for module.update.weight.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for module.update.delta.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
size mismatch for module.update.delta.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).
I am trying to train the model on KITTI. These are the parameters I am using:
clip=2.5, edges=24, fmax=96.0, fmin=8.0, gpus=4, iters=15, lr=5e-05, n_frames=7, noise=False, restart_prob=0.2, scale=False, steps=250000, w1=10.0, w2=0.01, w3=0.05, world_size=4
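For context, a quick way to see exactly which parameters disagree is to diff the checkpoint against a freshly constructed model. This is only a minimal sketch (the DroidNet import path and the checkpoint file name are assumptions about the repo layout):

import torch
from droid_net import DroidNet  # assumed location of the network class

# Compare parameter shapes between the released checkpoint and a fresh model.
ckpt = torch.load("droid.pth", map_location="cpu")
model_state = DroidNet().state_dict()

for key, tensor in ckpt.items():
    key = key.replace("module.", "")  # checkpoint keys carry a DataParallel prefix
    if key in model_state and model_state[key].shape != tensor.shape:
        print(key, tuple(tensor.shape), "vs", tuple(model_state[key].shape))

The shape differences it reports line up with the four size mismatches in the traceback above.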
I worked around it by changing the final conv layers in class UpdateModule(nn.Module) from 2 to 3 output channels, to match the checkpoint shapes:
self.weight = nn.Sequential(
    nn.Conv2d(128, 128, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 3, 3, padding=1),  # output channels changed from 2 to 3 to match droid.pth
    GradientClip(),
    nn.Sigmoid())
and similarly for self.delta:
self.delta = nn.Sequential(
    nn.Conv2d(128, 128, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 3, 3, padding=1),  # likewise changed from 2 to 3 output channels
    GradientClip())
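For reference, an alternative to editing the architecture would be to keep the 2-channel heads and simply skip the mismatched parameters when loading, so the rest of droid.pth still initializes the network. A rough sketch of what the loading step in train.py could look like (it assumes model is the DDP-wrapped model, so keys keep their module. prefix; not tested against the repo):

# Load only the parameters whose shapes match the current model.
state = torch.load(args.ckpt, map_location="cpu")
model_state = model.state_dict()

filtered = {k: v for k, v in state.items()
            if k in model_state and model_state[k].shape == v.shape}
skipped = [k for k in state if k not in filtered]
print("skipped (re-initialized) parameters:", skipped)

model.load_state_dict(filtered, strict=False)

With that, the update.weight.2 and update.delta.2 layers would train from scratch while everything else starts from the pretrained weights.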
If you have any advice about training the model on KITTI or about the training config, it would be appreciated!
Why do we need to change the model shape for training vs inference?