PSMNet
training loss = nan on custom dataset
A slice of the training log looks like the following:
Iter 496 training loss = 65.174 , time = 2.39 epoch 41 total training loss = nan
Iter 497 training loss = 124.104 , time = 2.38 epoch 41 total training loss = nan
Iter 498 training loss = 143.243 , time = 2.34 epoch 41 total training loss = nan
Iter 499 training loss = 102.472 , time = 2.36 epoch 41 total training loss = nan
Iter 500 training loss = 54.147 , time = 2.34 epoch 41 total training loss = nan
Iter 501 training loss = 76.837 , time = 2.38 epoch 41 total training loss = nan
Iter 502 training loss = 67.174 , time = 2.36 epoch 41 total training loss = nan
Iter 503 training loss = 58.369 , time = 2.31 epoch 41 total training loss = nan
Iter 504 training loss = 76.735 , time = 2.37 epoch 41 total training loss = nan
Iter 505 training loss = 150.376 , time = 2.33 epoch 41 total training loss = nan
Iter 506 training loss = 76.206 , time = 2.27 epoch 41 total training loss = nan
Iter 0 3-px error in val = 94.037
Why does this happen? Could you give some advice?
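Before blaming the data, it can help to verify whether the ground-truth disparity itself contains NaN/Inf pixels or values outside the valid range, since any invalid pixel that survives the training mask will poison the loss. This is a hypothetical sanity check, not code from the repo; the function name and the `max_disp=192` default are assumptions:

```python
import torch

def check_disparity(disp, max_disp=192):
    """Report statistics that commonly explain a NaN loss:
    NaN pixels, Inf pixels, and the fraction of pixels in (0, max_disp)."""
    n_nan = torch.isnan(disp).sum().item()
    n_inf = torch.isinf(disp).sum().item()
    valid = (disp > 0) & (disp < max_disp)
    frac_valid = valid.float().mean().item()
    return n_nan, n_inf, frac_valid

# Example: a tiny disparity map with one NaN and one out-of-range value
disp = torch.tensor([[10.0, float('nan')], [300.0, 50.0]])
print(check_disparity(disp))  # (1, 0, 0.5)
```

Running this over every ground-truth map before training would quickly show whether the NaN originates in the labels rather than in the network.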
@passion3394 Could you give more information about this training run, such as the dataset, learning rate, etc.?
@JiaRenChang Hi, I use the Apollo depth dataset to train the model; the learning rate is computed by the formula 'lr = 0.01 * 0.1 ** (epoch // 30)'. Recently I found that the Apollo depth dataset does not enforce epipolar geometry between the left and right images, and I think that is the main reason the nan appears. Is that right?
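For reference, the step-decay schedule quoted above can be written out as a small helper. This is only a sketch of the formula; `step_decay_lr` is a made-up name, not a function in the repo:

```python
def step_decay_lr(epoch, base_lr=0.01, gamma=0.1, step=30):
    """Learning rate decayed by a factor of `gamma` every `step` epochs,
    i.e. lr = base_lr * gamma ** (epoch // step)."""
    return base_lr * gamma ** (epoch // step)

print(step_decay_lr(0))    # 0.01
print(step_decay_lr(29))   # still 0.01 (first decay happens at epoch 30)
print(step_decay_lr(30))   # ~0.001
print(step_decay_lr(60))   # ~0.0001
```

So the schedule only decays twice in a typical 90-epoch run, and at epoch 41 (where the log above shows the nan) the rate is 0.001, which is small enough that the nan is unlikely to be a plain learning-rate explosion.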
@passion3394 Yes, I thought about it too. It seems that the Apollo dataset contains only background depth maps.
@JiaRenChang Yes, the depth of movable objects has been eliminated. Dear author, I have three questions about PSMNet and hope to get your answers: (1) If we train on depth maps that contain only background and then apply the trained model to images with movable objects, will the precision be terrible? (2) If the left and right images have different contrast ratios, will the test result be worse than with two images of the same contrast? (3) After rectification (epipolar geometry), my left and right images still differ by two pixels in the vertical direction; will that be a very bad factor for testing?
@passion3394 (1) I think the precision will be pretty bad, because movable objects usually have large disparities while background depth maps usually have small disparities. This strong imbalance may cause a generalization problem.
(2) and (3): We actually tried testing PSMNet on "real world" image pairs (acquired with web cameras, weak camera calibration, outdoors). We can still achieve pretty good results.
@JiaRenChang Thanks for your reply. I have communicated with the Apollo dataset maintainers. They will release a disparity dataset similar to KITTI on top of the Apollo dataset, which contains more image pairs and disparity images, and the disparity is much denser.
@passion3394 @JiaRenChang Hello, I have also used the Apollo depth dataset to train, but I get the error "IndexError: too many indices for tensor of dimension 3". Can you help me?
Thank you very much!
@hnsywangxin Sorry, could you give more error info?
@passion3394 Thank you for your reply. My error looks like this:
Traceback (most recent call last):
File "finetune.py", line 266, in <module>
main()
File "finetune.py", line 231, in main
loss = train(imgL_crop,imgR_crop, disp_crop_L)
File "finetune.py", line 169, in train
loss = 0.5*F.smooth_l1_loss(output1[mask], disp_true[mask], size_average=True) + 0.7*F.smooth_l1_loss(output2[mask], disp_true[mask], size_average=True) + F.smooth_l1_loss(output3[mask], disp_true[mask], size_average=True)
IndexError: too many indices for tensor of dimension 3
My input is the depth map of the Apollo dataset, my batch_size is 4, and the other parameters are the same as in the original program.
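A plausible explanation, offered as an assumption since the data-loading code isn't shown here: Apollo depth maps may load as multi-channel images, so `disp_true` (and hence the boolean mask built from it) gains an extra dimension relative to the 3-D network output of shape [B, H, W], and indexing the output with that mask raises exactly this IndexError. A minimal reproduction and fix:

```python
import torch
import torch.nn.functional as F

# Assumed shapes: the network outputs [B, H, W], but a 3-channel depth
# image loaded without conversion gives disp_true shape [B, C, H, W].
output = torch.rand(2, 4, 5)               # [B, H, W]
disp_true = torch.rand(2, 3, 4, 5) * 191   # [B, C, H, W] -- one dim too many

mask = disp_true < 192                     # 4-D boolean mask
try:
    output[mask]                           # 4-D mask on a 3-D tensor fails
except IndexError as e:
    print(e)                               # too many indices for tensor of dimension 3

# Fix: reduce the label to a single channel before building the mask,
# so mask, output, and disp_true all share the [B, H, W] shape.
disp_true = disp_true[:, 0, :, :]          # [B, H, W]
mask = disp_true < 192
loss = F.smooth_l1_loss(output[mask], disp_true[mask])
```

Note also that Apollo provides depth, not disparity, so besides dropping the extra channel you would still need to convert depth to disparity (via the camera focal length and baseline) before it matches what PSMNet regresses.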
Hi @hnsywangxin, have you found a solution? I have the same error.