AdaBins
Can't reproduce evaluation metrics on NYU
Dear authors,
I'm trying to reproduce your results on the NYU Depth V2 dataset, but I'm running into problems with the evaluation results, both when retraining the network from scratch and when using your pretrained weights.
I'm running your 'eval on a single PIL image' script over all 654 images from the NYU test split and evaluating with your `compute_errors` script. I'm obtaining the following results:
Retrained - BEST
| a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
|---|---|---|---|---|---|---|---|---|
| 0.782 | 0.961 | 0.991 | 0.146 | 0.666 | 0.065 | 0.204 | 19.751 | 0.111 |
Retrained - LAST
| a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
|---|---|---|---|---|---|---|---|---|
| 0.791 | 0.950 | 0.986 | 0.148 | 0.655 | 0.064 | 0.210 | 20.915 | 0.118 |
Trained by AUTHORS
| a1 | a2 | a3 | abs_rel | rmse | log_10 | rmse_log | silog | sq_rel |
|---|---|---|---|---|---|---|---|---|
| 0.861 | 0.973 | 0.994 | 0.118 | 0.562 | 0.053 | 0.174 | 16.925 | 0.082 |
These numbers are far from the results reported in the paper. Could you please help me reproduce the paper results?
TL;DR: try `python evaluate.py args_nyu_test.txt`.

You might not be using the proper cropping, and you may also be including invalid GT values. The results in the paper are computed with the Eigen crop, and invalid GT regions (where depth is zero, negative, or out of range) need to be masked out.

Run `python evaluate.py args_nyu_test.txt` to directly evaluate the pretrained models, or use the `eval` function defined inside `evaluate.py`.
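For anyone comparing their own evaluation loop against `evaluate.py`, here is a minimal sketch of the two fixes described above: mask out invalid GT pixels and restrict metrics to the Eigen crop. The helper `valid_eval_mask` and the crop bounds (rows 45:471, cols 41:601 on a 480x640 NYU depth map) are my assumptions of the commonly used values, not code taken from this repo, so double-check them against `evaluate.py`.

```python
import numpy as np

def valid_eval_mask(gt, min_depth=1e-3, max_depth=10.0):
    """Combine a GT-validity mask with the NYU Eigen crop.

    Assumes a 480x640 depth map; the crop window (rows 45:471,
    cols 41:601) is the one commonly used for NYU Depth V2 evaluation.
    """
    # Drop zero, negative, and out-of-range GT depths
    valid = (gt > min_depth) & (gt < max_depth)
    # Keep only pixels inside the Eigen crop window
    eigen = np.zeros_like(valid)
    eigen[45:471, 41:601] = True
    return valid & eigen

# Example: compute abs_rel only over valid, cropped pixels
gt = np.full((480, 640), 2.0)
pred = np.full((480, 640), 2.2)
mask = valid_eval_mask(gt)
abs_rel = np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask])  # ~0.1 here
```

Skipping either step (the crop or the validity mask) inflates abs_rel and RMSE noticeably on NYU, which matches the kind of gap shown in the tables above.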
@luisdecker Were you able to retrain the model to achieve the accuracy as mentioned in the paper?
Thank you, SK
@shreyaskamathkm As far as I remember, I was able to reproduce the results (or at least sufficiently close ones) with the code provided by the authors.
@luisdecker Interesting! I am trying to re-train the network, but I am unable to achieve the results posted by the authors. In contrast, the provided weights do reflect the results from the paper, so I am not sure whether the training code is correct.
@luisdecker Did you try to reproduce the NYU Depth V2 results with the 120k-image training set or the 50k subset? I am trying the 50k subset from DenseDepth, but the results are not normal at all.
P.S. From which source did you download the NYU v2 dataset?
@leoshine Unlike ours, DenseDepth uses inpainted depth maps. Please refer to this for details.