PSMNet
Strange result with pretrained module on SceneFlow
First, congratulations on the nice work and the open-source repo. A really strange phenomenon occurs when I use the stackhourglass model pretrained on SceneFlow. Although it produces really accurate disparity maps, with excellent localization and shaping of objects, the absolute values of the disparity map are a little out of range. In most cases, if I multiply the disparity map elementwise by a constant (usually about 1.15), the predictions match the ground truth and the 3-px error and mean absolute error (MAE) approach the values reported in the paper. Without this "post-processing" step the MAE is about 10 px. Has anybody faced the same problem, or has any idea why it happens? (A sketch of the check I'm doing is at the end of this comment.)
ps1. The same doesn't happen when I use the weights pre-trained on KITTI_2015 and evaluate on KITTI_2015.
ps2. For comparison purposes, it would be really helpful to also provide pre-trained weights for the basic network.
ps. I am posting an example below to visualize what I mean.
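For concreteness, this is roughly the comparison I am doing. It is only a sketch; the tensor names and the 1.15 factor are mine, not anything taken from the repo:

```python
import torch

def disparity_errors(pred, gt, max_disp=192):
    """End-point error and 3-px error rate over valid ground-truth pixels."""
    mask = (gt > 0) & (gt < max_disp)            # ignore invalid / out-of-range pixels
    diff = (pred[mask] - gt[mask]).abs()
    return diff.mean().item(), (diff > 3).float().mean().item() * 100.0

# pred_disp, gt_disp: H x W float tensors for one SceneFlow test pair
# epe_raw, err_raw       = disparity_errors(pred_disp, gt_disp)
# epe_scaled, err_scaled = disparity_errors(pred_disp * 1.15, gt_disp)  # the "post-processing" step
```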
@givasile Oh... that is really strange. We did not run into this problem before. You may check the upsampling function, i.e. the difference between align_corners=True and align_corners=False.
Yes, really strange. I tested align_corners=True/False, but the effect remains in either case. I also toggled True/False on the KITTI pre-trained network, where there is no such problem, and confirmed that True produces much more accurate results. So I conclude that align_corners=True is the correct option.
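To make the comparison concrete, this is how I toggled the flag on a toy cost volume (the sizes are arbitrary; as far as I can tell, the actual call in PSMNet upsamples the cost volume back to full resolution before the softmax):

```python
import torch
import torch.nn.functional as F

# Toy 5-D cost volume: (batch, channel, disparity/4, H/4, W/4); sizes are arbitrary.
cost = torch.randn(1, 1, 48, 64, 128)

# F.interpolate is the newer name for F.upsample; both accept align_corners.
up_true  = F.interpolate(cost, size=(192, 256, 512), mode='trilinear', align_corners=True)
up_false = F.interpolate(cost, size=(192, 256, 512), mode='trilinear', align_corners=False)

# The two settings do differ, but not by anything like a global 1.15x scale.
print((up_true - up_false).abs().max())
```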
Logically the error must be located in the softmax module, but I cannot pinpoint exactly where. In any case, if I find the source of the problem I will report it.
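For reference, the piece I mean is the soft-argmin style disparity regression; a standalone sketch (not the repo's exact module) looks like this:

```python
import torch
import torch.nn.functional as F

def soft_argmin_disparity(cost, max_disp=192):
    """Soft-argmin disparity regression over a (B, D, H, W) cost volume."""
    prob = F.softmax(-cost, dim=1)                                  # probability per disparity level
    disp = torch.arange(max_disp, dtype=prob.dtype, device=prob.device).view(1, max_disp, 1, 1)
    return (prob * disp).sum(dim=1)                                 # expected disparity, (B, H, W)

# A systematic ~1.15x under-estimate would have to come from the cost/probability side,
# since the disparity indices 0..max_disp-1 are fixed.
```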
@givasile Thank you!
@givasile @JiaRenChang I can confirm the same issue with the pre-trained SceneFlow model. I'm seeing a scaling between 1.12 and 1.17.
If I retrain on the SceneFlow data, am I likely to reproduce the reported results? Or do you think there was some instability with this model specifically?
Again, thanks for a great piece of software!
Thanks,
@jtressle Unfortunately my GPU's memory is not big enough to fit the model during training (only at inference), so I don't know whether the documented results can be reached by retraining from scratch or finetuning. If you can try it, it would be really helpful if you report the results.
@givasile I'll see about spinning up a server to test the training. I did more tests and the 1.15 average adjustment seems consistent. I'm hoping the offset is something like the square root of 255.0 / 192.0 (~1.15), which would suggest a hard-coded constant that has since changed.
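Just to show where that guess comes from, a quick back-of-the-envelope check:

```python
import math
print(math.sqrt(255.0 / 192.0))   # ~1.1524, suspiciously close to the observed ~1.15 factor
```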
Will report back if I get results!
@jtressle Maybe. If you confirm it experimentally, please report back the results; it would be really helpful to know what is happening.
@JiaRenChang @givasile I got the same problem here. Using the pre-trained model on SceneFlow, the final testing loss is around 6. If the output is multiplied by 1.15, the testing loss becomes lower (even lower with 1.17). But there is no such problem with a model retrained from scratch.
@DengYon So, by retraining the model you reach the results reported in the paper?
@givasile I got a total test loss of 1.119, which matches the last row of Table 2. However, I changed the network a little bit. The dataset I used is in WebP format.
@DengYon Same problem here: even if I test a model I trained myself later (not directly after the training process), I get this kind of problem.
@zhFuECL You mean if you test the model later (by loading model), you will get a testing loss which is different from the one you get when you finish the training? I haven't try this before. How did you do this?
@DengYon set --epochs 0 and --loadmodel /path/of/your/model.
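In other words, skip training and only run the evaluation loop on a loaded checkpoint. Roughly, the loading part looks like the sketch below; the 'state_dict' key and the stackhourglass constructor are my assumptions about how the repo saves and builds the model:

```python
import torch
from models import stackhourglass   # PSMNet's stacked-hourglass variant

model = stackhourglass(192)                       # maxdisp
model = torch.nn.DataParallel(model).cuda()

ckpt = torch.load('/path/of/your/model')          # the path passed to --loadmodel
model.load_state_dict(ckpt['state_dict'])         # assumes the checkpoint stores a 'state_dict' entry
model.eval()
# With --epochs 0 the training loop never runs, so only the test loop is executed on this model.
```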
@zhFuECL I have re-tested the model I trained and I get the same result as the one I got after training. Did you get a different testing loss? It shouldn't differ if you re-test the same model in the same environment.
@DengYon For me, I got a different (worse) result compared with the one I got from training.
It only happens on the SceneFlow dataset; on KITTI 2015 it's OK.
Could you show me your Python environment?
If you use conda, run conda env export > environment.yaml; otherwise run pip freeze > environment.txt.
@zhFuECL The list is rather long, so I'll only copy the useful part (Python 2.7.12): numpy==1.15.0, Pillow==5.2.0, torch==0.4.1, torchvision==0.2.0, CUDA compilation tools release 9.1, V9.1.85. Did you change your code after the training? Or did you use a computer with a different setup?
@DengYon My PyTorch version is 0.4.0. I only changed the mentioned upsample function. I will check again later. By the way, how many GPUs did you use to train? For me, I use two GPUs. I can get the reported result on SceneFlow, but fine-tuning on KITTI 2015 gives much worse results. I want to figure out whether the smaller batch size causes it.
@zhFuECL I use 4 GPUs. Did you figure out why the SceneFlow result gets worse in the retest?
@DengYon Not yet. So how is your fine-tuning result on KITTI 2015?
@zhFuECL As reported by the program, "epoch 300 total 3-px error in val = 1.831". I haven't used the official MATLAB code from KITTI to examine the result yet.
@DengYon You can send me the model if you want; I can test it for you.
@givasile @DengYon I think it may be a problem caused by numpy: after I updated numpy from 1.14 to 1.15.0, everything works fine. I got an error rate of 1.078% on SceneFlow with 2 GPUs (bs=6). But I still can't get the reported accuracy on KITTI 2015 (2.774% from the test function, 2.34% from the KITTI devkit).
@givasile Hello, have you solved this problem or found the reason? I also ran into it. My GPU is a single Titan XP and I set the test batch size to 2.
@zhFuECL Sorry for the late reply. My network is changed a little bit, so my model cannot be loaded into PSMNet directly. But I tend to believe my modification is unimportant, since the result is quite close to the reported one. I checked the code log and found that, apart from changing the upsample function (adding align_corners=True), I also fixed an error in "listflowfile.py" (changing one of the '35mm_focallength' entries to '15mm_focallength'). If you didn't fix this error, it could lead to worse results on KITTI because part of the Driving dataset was unused.
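For anyone hitting the same thing, here is the gist of that fix, written as a standalone sketch rather than the repo's actual listing code (the Driving directory layout assumed here may differ from yours):

```python
import os
from glob import glob

def list_driving_left_images(driving_root):
    """Collect left frames from BOTH focal-length subsets of the Driving data.

    The original listing effectively used '35mm_focallength' twice, so the
    15mm images were silently left out of the training set.
    """
    lefts = []
    for focal in ('35mm_focallength', '15mm_focallength'):   # both, not 35mm twice
        pattern = os.path.join(driving_root, 'frames_cleanpass', focal,
                               '*', '*', 'left', '*.png')
        lefts += sorted(glob(pattern))
    return lefts
```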
@JiaRenChang I wonder whether it's a mistake, or you only wanted to use one of them? @DengYon By the way, I can get the reported KITTI 2015 result by finetuning from the pretrained SceneFlow model (2 GPUs, bs=3).
That was a mistake. I will correct it soon.
Thanks! By the way, I found that training takes the same time whether I use one or two GPUs. Do you have any strategies to speed it up?
Is this problem solved now?
I don't think anyone has solved this problem, because a factor of 1.17 or 1.25 makes no sense. The released code and the pretrained SceneFlow model should reproduce the results reported in the paper. I downloaded the code and the pretrained SceneFlow model and used the same packages (python 2.7 + pytorch 0.4.0 + torchvision 0.2.0), but the final SceneFlow test EPE (loss) is 6.273. If I train from scratch, it is 1.3244, not the paper's 1.09 or 1.12.
Did anyone figure out why? Even after multiplying by 1.17 with the official SceneFlow pretrained model, it is still around 1.37.
Did you solve this? I saw you said you got around 6 EPE, which is the same as mine. @Deng-Y
@JiaRenChang Is there any way to obtain the reported EPE on KITTI 2015? It seems that even after the x1.17 multiplication the EPE remains high...