calibrated-backprojection-network
NYU performance
Hi Alex,
Thanks for your great work. I'm reproducing the NYU v2 (generalization) experiments. I followed your instructions for preparing the NYU data. When I used your pre-trained model kbnet-void1500.pth to evaluate on NYU v2, I got these errors:
(kbnet) zlq@ivg-SYS-7048GR-TR:/home/disk2/code/calibrated-backprojection-network$ bash bash/void/run_knet_nyu_v2_test.sh
usage: run_kbnet.py [-h] --image_path IMAGE_PATH --sparse_depth_path
SPARSE_DEPTH_PATH --intrinsics_path INTRINSICS_PATH
[--ground_truth_path GROUND_TRUTH_PATH]
[--input_channels_image INPUT_CHANNELS_IMAGE]
[--input_channels_depth INPUT_CHANNELS_DEPTH]
[--normalized_image_range NORMALIZED_IMAGE_RANGE [NORMALIZED_IMAGE_RANGE ...]]
[--outlier_removal_kernel_size OUTLIER_REMOVAL_KERNEL_SIZE]
[--outlier_removal_threshold OUTLIER_REMOVAL_THRESHOLD]
[--min_pool_sizes_sparse_to_dense_pool MIN_POOL_SIZES_SPARSE_TO_DENSE_POOL [MIN_POOL_SIZES_SPARSE_TO_DENSE_POOL ...]]
[--max_pool_sizes_sparse_to_dense_pool MAX_POOL_SIZES_SPARSE_TO_DENSE_POOL [MAX_POOL_SIZES_SPARSE_TO_DENSE_POOL ...]]
[--n_convolution_sparse_to_dense_pool N_CONVOLUTION_SPARSE_TO_DENSE_POOL]
[--n_filter_sparse_to_dense_pool N_FILTER_SPARSE_TO_DENSE_POOL]
[--n_filters_encoder_image N_FILTERS_ENCODER_IMAGE [N_FILTERS_ENCODER_IMAGE ...]]
[--n_filters_encoder_depth N_FILTERS_ENCODER_DEPTH [N_FILTERS_ENCODER_DEPTH ...]]
[--resolutions_backprojection RESOLUTIONS_BACKPROJECTION [RESOLUTIONS_BACKPROJECTION ...]]
[--n_filters_decoder N_FILTERS_DECODER [N_FILTERS_DECODER ...]]
[--deconv_type DECONV_TYPE]
[--min_predict_depth MIN_PREDICT_DEPTH]
[--max_predict_depth MAX_PREDICT_DEPTH]
[--weight_initializer WEIGHT_INITIALIZER]
[--activation_func ACTIVATION_FUNC]
[--min_evaluate_depth MIN_EVALUATE_DEPTH]
[--max_evaluate_depth MAX_EVALUATE_DEPTH]
[--output_path OUTPUT_PATH] [--save_outputs]
[--keep_input_filenames]
[--depth_model_restore_path DEPTH_MODEL_RESTORE_PATH]
[--device DEVICE]
run_kbnet.py: error: unrecognized arguments: --avg_pool_sizes_sparse_to_dense_pool 0 --encoder_type knet_v1 fusion_conv_previous sparse_to_dense_pool_v1 --input_type sparse_depth validity_map 3 3 3 0 --n_resolutions_encoder_intrinsics 0 1 2 3 --skip_types image depth --decoder_type multi-scale --output_kernel_size 3 --outlier_removal_method remove
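For context, argparse aborts with exactly this kind of error when a parser no longer defines a flag that the run script still passes. A minimal sketch of the behavior (the flag names are borrowed from the log above for illustration; this is not the repo's actual parser):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--image_path', required=True)

# parse_args() would raise SystemExit with "unrecognized arguments"
# on unknown flags; parse_known_args() returns them instead.
args, unknown = parser.parse_known_args(
    ['--image_path', 'x.txt', '--encoder_type', 'knet_v1'])

print(args.image_path)  # x.txt
print(unknown)          # ['--encoder_type', 'knet_v1']
```

So deleting the stale flags from the script is equivalent to what `parse_known_args` would silently ignore anyway.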
So I deleted the unrecognized arguments and ran it again; this time I got these numbers:
Evaluation results:
MAE RMSE iMAE iRMSE
122.836 228.426 24.147 50.003
+/- +/- +/- +/-
71.550 130.133 16.920 36.531
I know the numbers are close to the results reported in this repo, but I suspect I could reproduce them exactly if the unrecognized arguments were taken into account. My question is: how can I get results closer to your reported ones? Is there something wrong on my end?
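For reference when comparing runs, the four reported metrics are typically computed as follows in depth-completion work (a sketch of the standard definitions, with depth errors in millimeters and inverse-depth errors in 1/km; this is my reading of the common convention, not the repo's exact evaluation code):

```python
import numpy as np

def depth_metrics(pred_m, gt_m):
    """MAE/RMSE in mm, iMAE/iRMSE in 1/km, over valid ground-truth pixels."""
    mask = gt_m > 0  # evaluate only where ground truth is valid
    err_mm = 1000.0 * (pred_m[mask] - gt_m[mask])
    mae = np.mean(np.abs(err_mm))
    rmse = np.sqrt(np.mean(err_mm ** 2))
    # inverse depth in 1/km: 1000 / depth_in_meters
    inv_err = 1000.0 / pred_m[mask] - 1000.0 / gt_m[mask]
    imae = np.mean(np.abs(inv_err))
    irmse = np.sqrt(np.mean(inv_err ** 2))
    return mae, rmse, imae, irmse
```

Under these definitions, identical predictions and ground truth give all four metrics as exactly zero, so any nonzero gap between runs traces back to the inputs or the checkpoint.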
My environment is as follows:
torch 1.3.0
torchvision 0.4.1
Python 3.7.12
CUDA 10.2
Thank you in advance.
Hi, sorry, that was an old run script that survived the code cleanup. Can you try using
https://github.com/alexklwong/calibrated-backprojection-network/blob/master/bash/void/run_kbnet_nyu_v2.sh
instead?
I will delete
https://github.com/alexklwong/calibrated-backprojection-network/blob/master/bash/void/run_knet_nyu_v2_test.sh
Hi, thanks for your quick reply. I used https://github.com/alexklwong/calibrated-backprojection-network/blob/master/bash/void/run_kbnet_nyu_v2.sh instead.
However, the results seem to be the same.
(kbnet) zlq@ivg-SYS-7048GR-TR:/home/disk2/code/calibrated-backprojection-network$ bash bash/void/run_kbnet_nyu_v2.sh
Input paths:
testing/nyu_v2/nyu_v2_test_image_corner.txt
testing/nyu_v2/nyu_v2_test_sparse_depth_corner.txt
testing/nyu_v2/nyu_v2_test_intrinsics_corner.txt
testing/nyu_v2/nyu_v2_test_ground_truth_corner.txt
Input settings: input_channels_image=3 input_channels_depth=2 normalized_image_range=[0.0, 1.0] outlier_removal_kernel_size=7 outlier_removal_threshold=1.50
Sparse to dense pooling settings: min_pool_sizes_sparse_to_dense_pool=[15, 17] max_pool_sizes_sparse_to_dense_pool=[23, 27, 29] n_convolution_sparse_to_dense_pool=3 n_filter_sparse_to_dense_pool=8
Depth network settings: n_filters_encoder_image=[48, 96, 192, 384, 384] n_filters_encoder_depth=[16, 32, 64, 128, 128] resolutions_backprojection=[0, 1, 2, 3] n_filters_decoder=[256, 128, 128, 64, 12] deconv_type=up min_predict_depth=0.10 max_predict_depth=8.00
Weight settings: n_parameter=6957764 n_parameter_depth=6957764 n_parameter_pose=0 weight_initializer=xavier_normal activation_func=leaky_relu
Evaluation settings: min_evaluate_depth=0.20 max_evaluate_depth=5.00
Checkpoint settings: checkpoint_path=pretrained_models/void/evaluation_results/nyu_v2
depth_model_restore_path=pretrained_models/void/kbnet-void1500.pth
Hardware settings: device=cuda n_thread=1
Evaluation results:
MAE RMSE iMAE iRMSE
122.836 228.426 24.147 50.003
+/- +/- +/- +/-
71.550 130.133 16.920 36.531
Total time: 9457.15 ms Average time per sample: 14.46 ms
Saving outputs to pretrained_models/void/evaluation_results/nyu_v2/outputs
Sorry, I'm in Tel Aviv for ECCV this week, so replies may be slow. Let me set up the repo and try.
Yeah, have a good week!
I'm trying to follow your work and build on it for our own model. Besides VOID and KITTI, I noticed that you also reported KBNet's performance on NYUv2. Could you share the checkpoint trained on NYUv2 when you have time?
KBNet is currently the SOTA method in this field, so it will undoubtedly be our main baseline. I would really appreciate it if you could share the NYUv2 pre-trained model.
This is quite strange. I set up the repo from scratch and ran the void-1500 checkpoint on the VOID1500 and NYUv2 test sets:
For VOID1500
Evaluation results:
MAE RMSE iMAE iRMSE
39.803 95.864 21.161 49.723
+/- +/- +/- +/-
27.521 67.776 24.340 62.204
Total time: 10159.95 ms Average time per sample: 12.70 ms
For NYUv2
Evaluation results:
MAE RMSE iMAE iRMSE
119.942 223.310 23.569 48.905
+/- +/- +/- +/-
69.832 128.695 16.550 35.989
Total time: 7262.26 ms Average time per sample: 11.10 ms
which is different from what you've obtained, and also different from the paper. Since NYUv2 was originally processed in MATLAB, there may be some processing tied to that platform that is causing the difference in the numbers. The differences are small, so this might just be noise introduced during data processing.
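As one illustration of how innocuous preprocessing differences can shift metrics by small amounts: if depth maps are round-tripped through a 16-bit millimeter PNG encoding at some point in the pipeline (an assumption here for illustration, not confirmed for this dataset), the quantization alone adds bounded sub-millimeter noise per pixel:

```python
import numpy as np

rng = np.random.default_rng(0)
depth_m = rng.uniform(0.2, 5.0, size=(480, 640))  # depths in meters

# Round-trip through a uint16 millimeter encoding, a common way
# depth-completion datasets store sparse depth and ground truth.
depth_mm_u16 = np.round(depth_m * 1000.0).astype(np.uint16)
recovered_m = depth_mm_u16.astype(np.float64) / 1000.0

mae_mm = np.mean(np.abs(recovered_m - depth_m)) * 1000.0
print(mae_mm)  # bounded by 0.5 mm per pixel by construction
```

Differences in resizing, cropping, or interpolation between a MATLAB pipeline and a Python one would add noise of a similar (small) order, which is consistent with the sub-millimeter-to-few-millimeter gaps seen between the runs above.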