
Can't reproduce evaluation metrics on NYU

Open luisdecker opened this issue 3 years ago • 6 comments

Dear authors,

I'm trying to reproduce your results on the NYU Depth V2 dataset, but I'm running into problems with the evaluation results, both when retraining the network from scratch and when using your pretrained weights.

I'm running your 'eval on a single PIL image' script over all 654 images of the NYU test split and computing the metrics with your compute_errors script.
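
For reference, here is what I understand compute_errors to calculate; this is a minimal sketch written from the standard monocular-depth metric definitions, not copied from the repo, so the repo's implementation may differ in detail (gt and pred are assumed to be 1-D arrays of valid depths):

```python
import numpy as np

def compute_errors(gt, pred):
    """Standard monocular-depth metrics over valid pixels."""
    # Threshold accuracies: fraction of pixels with max(gt/pred, pred/gt) < 1.25^k
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    # Relative errors
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)

    # RMSE in linear, log, and log10 space
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    log_10 = np.mean(np.abs(np.log10(gt) - np.log10(pred)))

    # Scale-invariant log error (reported here scaled by 100)
    err = np.log(pred) - np.log(gt)
    silog = np.sqrt(np.mean(err ** 2) - np.mean(err) ** 2) * 100

    return dict(a1=a1, a2=a2, a3=a3, abs_rel=abs_rel, rmse=rmse,
                log_10=log_10, rmse_log=rmse_log, silog=silog, sq_rel=sq_rel)
```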

I'm obtaining the following results:

Retrained - BEST

| a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
|---|---|---|---|---|---|---|---|---|
| 0.782 | 0.961 | 0.991 | 0.146 | 0.666 | 0.065 | 0.204 | 19.751 | 0.111 |

Retrained - LAST

| a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
|---|---|---|---|---|---|---|---|---|
| 0.791 | 0.950 | 0.986 | 0.148 | 0.655 | 0.064 | 0.210 | 20.915 | 0.118 |

Trained by AUTHORS

| a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
|---|---|---|---|---|---|---|---|---|
| 0.861 | 0.973 | 0.994 | 0.118 | 0.562 | 0.053 | 0.174 | 16.925 | 0.082 |

This is far from the results reported in the paper. Could you please help me reproduce them?

luisdecker avatar Apr 16 '21 12:04 luisdecker

TL;DR: try `python evaluate.py args_nyu_test.txt`

You might not be applying the proper crop, and you may also be including invalid GT values. The results in the paper are computed with the Eigen crop, and invalid GT regions (where the depth is zero, negative, or out of range) need to be masked out.

Try `python evaluate.py args_nyu_test.txt` to evaluate the pretrained models directly, or use the `eval` function defined inside `evaluate.py`.
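
In case it helps, here is a rough sketch of the masking described above. This is my own illustration, not the exact code from `evaluate.py`; the crop bounds and depth range are the values commonly used for 480x640 NYU frames, so please verify them against the repo:

```python
import numpy as np

def valid_eval_mask(gt, min_depth=1e-3, max_depth=10.0):
    """Boolean mask: in-range GT depths intersected with the Eigen crop.

    Assumes 480x640 NYU frames; the crop bounds [45:471, 41:601] are the
    ones commonly used for NYU and should be checked against evaluate.py.
    """
    # Drop zero, negative, and out-of-range GT values
    valid = np.logical_and(gt > min_depth, gt < max_depth)
    # Restrict evaluation to the Eigen crop region
    crop = np.zeros_like(valid)
    crop[45:471, 41:601] = True
    return np.logical_and(valid, crop)

# Usage: compute metrics only over masked pixels, e.g.
# mask = valid_eval_mask(gt_depth)
# errors = compute_errors(gt_depth[mask], pred_depth[mask])
```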

shariqfarooq123 avatar Apr 16 '21 14:04 shariqfarooq123

@luisdecker Were you able to retrain the model and reach the accuracy reported in the paper?

Thank you, SK

shreyaskamathkm avatar Oct 20 '21 19:10 shreyaskamathkm

@shreyaskamathkm As far as I remember, I was able to reproduce the results (or at least sufficiently close ones) with the code provided by the authors.

luisdecker avatar Oct 29 '21 22:10 luisdecker

@luisdecker Interesting! I am trying to retrain the network, but I am unable to achieve the results posted by the authors. In contrast, the provided weights do reproduce the results from the paper, so I am not sure whether the training code is correct.

shreyaskamathkm avatar Oct 29 '21 22:10 shreyaskamathkm

@luisdecker

Did you train on NYU Depth V2 with the full 120k-image training set or with the 50k subset? I am trying the 50k subset from DenseDepth here; however, the results are not normal at all.

P.S. From which source did you download the NYU v2 dataset?

leoshine avatar Jan 07 '22 10:01 leoshine

@leoshine

Unlike ours, DenseDepth uses inpainted depth maps. Please refer to this for details.

shariqfarooq123 avatar Jan 17 '22 23:01 shariqfarooq123