DORN
Evaluation on the Eigen split
Hi,
I was wondering if you could share your evaluation code, or tell me which code you used for evaluation. Was it the official KITTI evaluation? Also, did you use the raw LiDAR data or the post-processed ground truth provided by KITTI?
@a-jahani I've been using Monodepth's evaluation file. However, I'm not sure if I'm using the correct predicted images, since their model predicts disparities instead of depths.
https://github.com/mrharicot/monodepth/blob/master/utils/evaluate_kitti.py
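For what it's worth, this is roughly how I understand the conversion from monodepth's predicted disparities to depth in meters. It is only a sketch: the width_to_focal values are copied from monodepth's evaluation utilities, and I'm assuming the predicted disparity is normalized by the image width, as in their evaluation script.

```python
import numpy as np

# Focal lengths for the common KITTI image widths, as listed in
# monodepth's evaluation utilities.
width_to_focal = {1242: 721.5377, 1241: 718.856, 1224: 707.0493, 1238: 718.3351}

def pred_disp_to_depth(pred_disp):
    """Convert a predicted disparity map (normalized by image width) to
    depth in meters, assuming the 0.54 m KITTI stereo baseline."""
    height, width = pred_disp.shape
    disp_pixels = np.maximum(pred_disp * width, 1e-6)   # back to pixels; avoid division by zero
    return width_to_focal[width] * 0.54 / disp_pixels   # depth = f * B / d
```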
Hi, @nicolasrosa. I have used the evaluation code from this: https://github.com/danxuhk/StructuredAttentionDepthEstimation/blob/master/StructuredAttentionDepthEstimation/utils/evaluation_depth.py However, my results are a little different from the paper's.
@Sunyx93 Are you predicting depth (meters, log(meters)) or disparities? Which dataset are you evaluating on: KITTI Stereo 2015 (200 test images) or the KITTI raw data using the Eigen split (697 test images)?
@nicolasrosa I'm predicting depth in meters under the Eigen split, following this link to generate the depth maps: https://github.com/danxuhk/StructuredAttentionDepthEstimation/blob/master/StructuredAttentionDepthEstimation/utils/evaluation_utils.py
Since you're predicting depth in meters like me, I recommend you review the evaluation code and check that you're not using disparity information as ground truth. For instance, I had to change the convert_disps_to_depths_kitti() function to the following one:
import numpy as np

# width_to_focal maps image width -> focal length and comes from monodepth's
# evaluation utilities; 0.54 m is the KITTI stereo baseline.
def convert_gt_disps_to_depths_kitti(gt_disparities):
    gt_depths = []
    for i in range(len(gt_disparities)):
        gt_disp = gt_disparities[i]
        height, width = gt_disp.shape

        # depth = focal * baseline / disparity; the (1.0 - mask) term avoids
        # division by zero where there is no ground-truth disparity.
        mask = gt_disp > 0
        gt_depth = width_to_focal[width] * 0.54 / (gt_disp + (1.0 - mask))

        # Workaround by Nick: zero out pixels without valid ground truth.
        mask = np.logical_and(gt_disp > 0.0, gt_depth)
        gt_depth = gt_depth * mask

        gt_depths.append(gt_depth)
    return gt_depths
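For completeness, this is roughly how I use it afterwards. The variable names are just illustrative, and compute_errors() here stands for the metric function from monodepth's evaluation script:

```python
# Hypothetical usage: gt_disparities is a list of H x W float arrays (e.g. the
# KITTI stereo disparity PNGs divided by 256.0) and pred_depths is a list of
# predicted depth maps in meters.
gt_depths = convert_gt_disps_to_depths_kitti(gt_disparities)

for gt_depth, pred_depth in zip(gt_depths, pred_depths):
    valid = gt_depth > 0   # pixels without ground-truth disparity were zeroed out above
    abs_rel, sq_rel, rms, log_rms, a1, a2, a3 = compute_errors(gt_depth[valid],
                                                               pred_depth[valid])
```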
PS: There is a bug in the original monodepth code which was deliberately left in place for paper-reproducibility reasons. Here is the link to the discussion: monodepth inaccurate evaluation
@nicolasrosa I haven't re-evaluated the DORN results yet, but I think they are probably doing the conversion right. Since they submitted their results to the official KITTI depth evaluation website and ranked first, it's unlikely they made that mistake.
@a-jahani Thanks for the information. We should have had this conversation some weeks ago; I just submitted an article last week, haha. It's too late now. I agree with you guys: evaluating the depth estimation task is so confusing. Since everyone has been using the Eigen split for so long, it's hard to compare our methods to other state-of-the-art results.
@a-jahani Thanks for your mention in https://github.com/mrharicot/monodepth/issues/166#issuecomment-404993498. I have evaluated DORN with the official ground truth and I got an RMSE of 3.08. However, not every picture in the Eigen split has official ground truth, so I just computed the mean over the 652 pictures that do.
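In case it helps, here is a rough sketch of how I pick the overlapping frames. The file-list format and the folder layout of the official annotated depth maps are assumptions here, so adjust them to your setup:

```python
import os

def filter_eigen_with_official_gt(eigen_test_files, annotated_depth_root):
    """Keep only the Eigen test frames that have an official annotated depth map.

    eigen_test_files: relative paths like
        '2011_09_26/2011_09_26_drive_0002_sync/image_02/data/0000000069.png'
        (this list format is an assumption; adapt it to your own file list).
    annotated_depth_root: root folder of the official 'data_depth_annotated' download.
    """
    kept = []
    for rel_path in eigen_test_files:
        date, drive, cam, _, frame = rel_path.split('/')
        # The official ground truth is split into train/ and val/ subfolders.
        candidates = [os.path.join(annotated_depth_root, split, drive,
                                   'proj_depth', 'groundtruth', cam, frame)
                      for split in ('train', 'val')]
        if any(os.path.exists(c) for c in candidates):
            kept.append(rel_path)
    return kept   # around 652 of the 697 Eigen test frames remain
```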
@Sunyx93 Could you provide the full set of evaluation metrics (Abs Rel, RMSE, and the others) using monodepth's code? It would give us a fair comparison with other methods. Thank you!
@a-jahani @Sunyx93 Hi, thank you very much for providing your evaluation results. Could you please give some details about your evaluation, or provide your evaluation code?
Actually, I also tried to run the evaluation using the DORN pretrained model named 'cvpr_kitti.caffemodel'. I used the depth prediction code from DORN/demo_kitti.py to generate the depth predictions, and my evaluation is the same as monodepth's evaluation with the Garg crop (a sketch of the metric computation I use is at the end of this comment). I have tried both the 697-image raw KITTI ground truth and the 652-image official ground truth, and the cap-80m RMSE values I got are 4.2643 and 3.5179, respectively.
I wonder if the image format has a big impact here. The data I used are JPG RGB images, which monodepth also uses, but the original KITTI image format is PNG... I think this is the only difference.
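For reference, a minimal sketch of what I mean by "monodepth's evaluation with the Garg crop" and the 80 m cap. The crop coefficients and metric definitions are the ones from monodepth's evaluate_kitti.py; everything else (function names, shapes) is just illustrative:

```python
import numpy as np

MIN_DEPTH, MAX_DEPTH = 1e-3, 80.0   # "cap 80m" evaluation range

def compute_errors(gt, pred):
    # Metric definitions as in monodepth's evaluate_kitti.py.
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3

def evaluate_frame(gt_depth, pred_depth):
    """Garg crop + 80 m cap + metrics for one image (both maps in meters)."""
    h, w = gt_depth.shape
    # Garg et al. crop, with the coefficients used in monodepth's evaluate_kitti.py.
    crop = np.array([0.40810811 * h, 0.99189189 * h,
                     0.03594771 * w, 0.96405229 * w]).astype(np.int32)
    crop_mask = np.zeros(gt_depth.shape, dtype=bool)
    crop_mask[crop[0]:crop[1], crop[2]:crop[3]] = True

    valid = (gt_depth > MIN_DEPTH) & (gt_depth < MAX_DEPTH) & crop_mask
    pred = np.clip(pred_depth[valid], MIN_DEPTH, MAX_DEPTH)
    return compute_errors(gt_depth[valid], pred)
```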
@joseph-zhang, to be honest, I'm not sure how the data format affects it (it is possible). In all my experiments I never switched from PNG to JPG, precisely to avoid introducing doubts like this. I used PNG files, and I used the 652 Eigen files for which the official KITTI depth is available.
Below is my result if anyone is interested:
If anyone is interested in reproducing my result, you can download my npy file for the 697 files of the Eigen split. The results are saved in meters (monodepth's are in disparity!). You can check this comment to reproduce my result for DORN: https://github.com/a-jahani/semiDepth/issues/1#issuecomment-514438234
Thank you very much for providing the evaluation details! I think I'd better use the original PNG KITTI images.
@a-jahani Thank you once again for providing the npy DORN result!
I have tried your npy result with my dataloader on the 652 official KITTI ground-truth images (this time in PNG format), and I got the same evaluation values as you reported. Actually, whether the data format is PNG or not is not a big problem; it only has a small impact. Your npy file helped me verify that my dataloader is written correctly, and it is.
The reason I originally got a bad result was a bug in my evaluation code: I didn't copy the DORN inference code correctly. After fixing the evaluation code, I got an RMSE of 2.91, which is very close to yours. There is a small difference between them; I think the reason is that we use different image libraries. PIL is used in my dataloader (I have to add a conversion to match Caffe's data input format, sketched below), but OpenCV can also be used. You know, @Sunyx93 got an RMSE of 3.08... also a different result.
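The PIL-side conversion I mean looks roughly like this. It is only a sketch: the mean values below are the common ImageNet BGR means used as a placeholder, and the actual values and channel order should be taken from DORN's demo_kitti.py:

```python
import numpy as np
from PIL import Image

def pil_to_caffe_input(path, pixel_means=(103.939, 116.779, 123.68)):
    """Load an image with PIL and convert it to the layout a Caffe model expects.

    The default mean values are the common ImageNet BGR means, used here only
    as a placeholder; take the actual values from DORN's demo_kitti.py.
    """
    img = np.array(Image.open(path).convert('RGB'), dtype=np.float32)  # H x W x RGB
    img = img[:, :, ::-1].copy()          # RGB -> BGR, matching cv2.imread
    img -= np.array(pixel_means, dtype=np.float32)
    img = img.transpose(2, 0, 1)          # HWC -> CHW
    return img[np.newaxis, ...]           # add batch dimension: 1 x 3 x H x W
```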
By the way, there is still something that confuses me. In this issue a-jahani/semiDepth#1, you mentioned that you got an even lower RMSE than 2.88 on the raw LiDAR data. However, your npy files and my code both produce an RMSE of around 3.7 in this case. I'm not sure if I got a correct result on the raw data...
You are right. It was a long time ago. I just reran the evaluation for the raw LiDAR data using the same npy file I provided:
python2 utils/evaluate_kitti.py --split eigen --garg_crop --depth_provided --predicted_disp_path ../DORN/eigen697_depth.npy --gt_path /home/datasets/KITTI/KITTI_Raw/
Here is my result on the Eigen split for the 697 images, using the monodepth evaluation with a little bit of modification:
abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.1106, 0.6179, 3.658, 0.168, 0.000, 0.894, 0.964, 0.984
It makes sense that the difference of 0.778 comes from the raw LiDAR error itself. When evaluating against LiDAR, the error is usually higher due to LiDAR artifacts (occlusions, reflections, moving objects, etc.). So even though your network might produce good results, the measured error looks higher. That's why we cannot fully trust LiDAR-based evaluation anymore.
I quite agree with your opinion.
At least we should clarify what kind of data was used in the training and evaluation phases.
@a-jahani: I downloaded your npy file, thank you very much for sharing. My results are slightly different from yours; see the images below.
My current suspicion is that the checkpoint model has been changed; see https://github.com/hufu6371/DORN/issues/34
Can you confirm that you can reproduce "result/KITTI/demo_01_pred.png" exactly? If so, would you be able to upload your checkpoint version? If you have any other idea about where the difference could come from, I would appreciate it.
item 0 out of 697:
item 5 out of 697: