Error increases dramatically and the program fails
Hi there,
I managed to run the project. It performs well at first, but at some point the error increases dramatically and the program fails with torch._C._LinAlgError: linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.
The exception trace is:
Traceback (most recent call last):
File "scripts_pose_tracking/pose_tracking.py", line 80, in <module>
main()
File "/home/bo/miniconda3/envs/pytorch/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/bo/miniconda3/envs/pytorch/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/bo/miniconda3/envs/pytorch/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/bo/miniconda3/envs/pytorch/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "scripts_pose_tracking/pose_tracking.py", line 57, in main
est, gt, _ = tracker.register_next()
File "/home/bo/projects/LocNDF/src/loc_ndf/utils/registration.py", line 88, in register_next
current_pose = self.register_scan(
File "/home/bo/projects/LocNDF/src/loc_ndf/utils/registration.py", line 150, in register_scan
DT = self.registration_step(points_t, GM_k=self.GM_k).detach()
File "/home/bo/projects/LocNDF/src/loc_ndf/utils/registration.py", line 169, in registration_step
T = df_icp(points[..., :3], gradients, distances, GM_k=GM_k)
File "/home/bo/projects/LocNDF/src/loc_ndf/utils/registration.py", line 196, in df_icp
t = torch.linalg.inv(N.to(dtype=torch.float64)
torch._C._LinAlgError: linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.
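For reference, the singular-matrix failure can be reproduced and guarded against in isolation. The sketch below is only an illustration under assumptions: the 6x6 matrix N and right-hand side g are synthetic, safe_solve is a hypothetical helper that is not part of LocNDF, and it assumes the inverse in df_icp is only used to solve a linear system N t = g. It tries torch.linalg.solve first and falls back to the pseudo-inverse when the matrix is singular.

```python
import torch


def safe_solve(N: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """Solve N @ t = g without calling torch.linalg.inv directly.

    Hypothetical helper (not part of LocNDF). Falls back to the
    minimum-norm least-squares solution when N is singular.
    """
    N = N.to(dtype=torch.float64)
    g = g.to(dtype=torch.float64)

    # Non-finite inputs mean the registration has already diverged;
    # no linear solver can recover from that.
    if not (torch.isfinite(N).all() and torch.isfinite(g).all()):
        raise ValueError("non-finite entries in N or g")

    try:
        t = torch.linalg.solve(N, g)
        if torch.isfinite(t).all():
            return t
    except RuntimeError:
        # torch linear-algebra errors derive from RuntimeError.
        pass

    # The pseudo-inverse is defined even for rank-deficient matrices.
    return torch.linalg.pinv(N) @ g


if __name__ == "__main__":
    # Synthetic rank-deficient matrix: the second diagonal entry is zero.
    N = torch.diag(torch.tensor([2.0, 0.0, 1.0, 1.0, 1.0, 1.0]))
    g = torch.ones(6, 1)
    print(safe_solve(N, g).flatten())
```

A guard like this only avoids the crash. With dt already above 100 m in the log, the tracker has diverged well before the inversion fails, so the underlying cause still needs to be found.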
I am using the pretrained checkpoint best-v11.ckpt on the test set of ColumbiaPark. Other checkpoints have the same issue. More specifically, the first 9 checkpoints fail right at the beginning, checkpoints 11 and 12 fail at ~20% of the sequence, and checkpoints 10 and 13 fail before reaching 5%, with errors of several meters.
Here is the output for checkpoint best-v11.ckpt just before the program fails:
avg icp time: 0.3935760729749438
dt: 0.251m, dr: 0.243deg
20%|███████████████████████████████████████████████▋ | 142/700 [00:56<05:41, 1.63it/s]avg icp time: 0.3955643527157657
dt: 0.423m, dr: 0.589deg
20%|████████████████████████████████████████████████ | 143/700 [00:57<05:52, 1.58it/s]avg icp time: 0.39743862052758533
dt: 0.669m, dr: 0.786deg
21%|████████████████████████████████████████████████▎ | 144/700 [00:57<05:57, 1.55it/s]avg icp time: 0.39928395337071915
dt: 0.349m, dr: 0.709deg
21%|████████████████████████████████████████████████▋ | 145/700 [00:58<06:01, 1.54it/s]avg icp time: 0.40121664739634894
dt: 0.501m, dr: 0.538deg
21%|█████████████████████████████████████████████████ | 146/700 [00:59<06:06, 1.51it/s]avg icp time: 0.40289818835096297
dt: 0.163m, dr: 0.319deg
21%|█████████████████████████████████████████████████▎ | 147/700 [00:59<06:03, 1.52it/s]avg icp time: 0.4041551560968966
dt: 0.83m, dr: 0.847deg
21%|█████████████████████████████████████████████████▋ | 148/700 [01:00<05:52, 1.57it/s]avg icp time: 0.40543151381831843
dt: 0.712m, dr: 1.52deg
21%|██████████████████████████████████████████████████ | 149/700 [01:00<05:44, 1.60it/s]avg icp time: 0.4065725088119507
dt: 0.62m, dr: 1.86deg
21%|██████████████████████████████████████████████████▎ | 150/700 [01:01<05:36, 1.63it/s]avg icp time: 0.40670135005420405
dt: 0.65m, dr: 2.31deg
22%|██████████████████████████████████████████████████▋ | 151/700 [01:01<05:05, 1.79it/s]avg icp time: 0.4078362380203448
dt: 1.2m, dr: 1.29deg
22%|███████████████████████████████████████████████████ | 152/700 [01:02<05:09, 1.77it/s]avg icp time: 0.40898071083368037
dt: 1.96m, dr: 1.73deg
22%|███████████████████████████████████████████████████▎ | 153/700 [01:03<05:12, 1.75it/s]avg icp time: 0.41008173026047745
dt: 4.92m, dr: 4.48deg
22%|███████████████████████████████████████████████████▋ | 154/700 [01:03<05:13, 1.74it/s]avg icp time: 0.41110943825014173
dt: 10.3m, dr: 10.2deg
22%|████████████████████████████████████████████████████ | 155/700 [01:04<05:12, 1.74it/s]avg icp time: 0.4121675170384921
dt: 22.9m, dr: 23.4deg
22%|████████████████████████████████████████████████████▎ | 156/700 [01:04<05:12, 1.74it/s]avg icp time: 0.41316076145050634
dt: 34.4m, dr: 27.7deg
22%|████████████████████████████████████████████████████▋ | 157/700 [01:05<05:11, 1.74it/s]avg icp time: 0.4141745280615891
dt: 65.7m, dr: 59.4deg
23%|█████████████████████████████████████████████████████ | 158/700 [01:05<05:11, 1.74it/s]avg icp time: 0.4151591444915196
dt: 93.7m, dr: 87.2deg
23%|█████████████████████████████████████████████████████▍ | 159/700 [01:06<05:10, 1.74it/s]avg icp time: 0.4161439910531044
dt: 1.17e+02m, dr: 1.15e+02deg
23%|█████████████████████████████████████████████████████▋ | 160/700 [01:07<03:47, 2.37it/s]
Do you have any idea what might be causing this problem? Thank you very much!
Best, Bo Yang