
Can't reproduce evaluation metrics on NYU

Open luisdecker opened this issue 3 years ago • 6 comments

Dear authors,

I'm trying to reproduce your results on the NYU Depth V2 dataset, but I'm running into problems with the evaluation results, both when retraining the network from scratch and when using your pretrained weights.

I'm running your 'eval on a single PIL image' script over all 654 images of the NYU test split and computing the metrics with your compute_errors script.
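
For reference, here is what I understand compute_errors to calculate; this is a minimal sketch written from the standard monocular-depth metric definitions, not copied from the repo, so the repo's implementation may differ in detail (gt and pred are assumed to be 1-D arrays of valid depths):

```python
import numpy as np

def compute_errors(gt, pred):
    """Standard monocular-depth metrics over valid pixels."""
    # Threshold accuracies: fraction of pixels with max(gt/pred, pred/gt) < 1.25^k
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    # Relative errors
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)

    # RMSE in linear, log, and log10 space
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    log_10 = np.mean(np.abs(np.log10(gt) - np.log10(pred)))

    # Scale-invariant log error (reported here scaled by 100)
    err = np.log(pred) - np.log(gt)
    silog = np.sqrt(np.mean(err ** 2) - np.mean(err) ** 2) * 100

    return dict(a1=a1, a2=a2, a3=a3, abs_rel=abs_rel, rmse=rmse,
                log_10=log_10, rmse_log=rmse_log, silog=silog, sq_rel=sq_rel)
```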

I'm obtaining the following results:

Retrained - BEST

| a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
|---|---|---|---|---|---|---|---|---|
| 0.782 | 0.961 | 0.991 | 0.146 | 0.666 | 0.065 | 0.204 | 19.751 | 0.111 |

Retrained - LAST

| a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
|---|---|---|---|---|---|---|---|---|
| 0.791 | 0.950 | 0.986 | 0.148 | 0.655 | 0.064 | 0.210 | 20.915 | 0.118 |

Trained by AUTHORS

| a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
|---|---|---|---|---|---|---|---|---|
| 0.861 | 0.973 | 0.994 | 0.118 | 0.562 | 0.053 | 0.174 | 16.925 | 0.082 |

This is far from the results reported in the paper. Could you please help me reproduce them?

luisdecker avatar Apr 16 '21 12:04 luisdecker

TL;DR: try `python evaluate.py args_nyu_test.txt`

You might not be applying the proper crop, and you may also be including invalid GT values. The results in the paper are computed with the Eigen crop, and invalid GT regions (where the depth is zero, negative, or out of range) need to be masked out.

Try `python evaluate.py args_nyu_test.txt` to evaluate the pretrained models directly, or use the `eval` function defined inside `evaluate.py`.
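
In case it helps, here is a rough sketch of the masking described above. This is my own illustration, not the exact code from `evaluate.py`; the crop bounds and depth range are the values commonly used for 480x640 NYU frames, so please verify them against the repo:

```python
import numpy as np

def valid_eval_mask(gt, min_depth=1e-3, max_depth=10.0):
    """Boolean mask: in-range GT depths intersected with the Eigen crop.

    Assumes 480x640 NYU frames; the crop bounds [45:471, 41:601] are the
    ones commonly used for NYU and should be checked against evaluate.py.
    """
    # Drop zero, negative, and out-of-range GT values
    valid = np.logical_and(gt > min_depth, gt < max_depth)
    # Restrict evaluation to the Eigen crop region
    crop = np.zeros_like(valid)
    crop[45:471, 41:601] = True
    return np.logical_and(valid, crop)

# Usage: compute metrics only over masked pixels, e.g.
# mask = valid_eval_mask(gt_depth)
# errors = compute_errors(gt_depth[mask], pred_depth[mask])
```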

shariqfarooq123 avatar Apr 16 '21 14:04 shariqfarooq123

@luisdecker Were you able to retrain the model and reach the accuracy reported in the paper?

Thank you, SK

shreyaskamathkm avatar Oct 20 '21 19:10 shreyaskamathkm

@shreyaskamathkm As far as I remember, I was able to reproduce the results (or at least sufficiently close ones) with the code provided by the authors.

luisdecker avatar Oct 29 '21 22:10 luisdecker

@luisdecker Interesting! I am trying to retrain the network, but I am unable to achieve the results posted by the authors. In contrast, the provided weights do reproduce the results from the paper, so I am not sure whether the training code is correct.

shreyaskamathkm avatar Oct 29 '21 22:10 shreyaskamathkm

@luisdecker

Did you train on NYU Depth V2 with the full 120k-image training set or with the 50k subset? I am trying the 50k subset from DenseDepth here; however, the results are not normal at all.

P.S. From which source did you download the NYU v2 dataset?

leoshine avatar Jan 07 '22 10:01 leoshine

@leoshine

Unlike ours, DenseDepth uses inpainted depth maps. Please refer to this for details.

shariqfarooq123 avatar Jan 17 '22 23:01 shariqfarooq123