
About the GFLOPs of the model.

Open · blue-blue272 opened this issue 3 years ago · 1 comment

It is a wonderful work!

But I find that in your implementation, you feed every frame to all of the backbones (resnet50, resnet34, resnet18), which differs from the method proposed in your paper. This implementation brings a large computing overhead (roughly twice that of the baseline), yet your paper states that your method uses about 45% less computation than the baseline. Can you explain how the proposed method (as implemented) reduces the computation overhead?
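For concreteness, a rough back-of-the-envelope comparison of the two execution modes (the per-frame GFLOPs below are approximate ImageNet-scale figures at 224x224 input, and the clip length and the 4/4/4/4 routing split are made-up numbers for illustration, not values from the paper):

```python
# Approximate per-frame cost of each backbone at 224x224 input (rough published figures)
GFLOPS = {"resnet50": 4.1, "resnet34": 3.6, "resnet18": 1.8}

num_frames = 16                                      # hypothetical clip length

# Baseline: every frame through the largest backbone
baseline = num_frames * GFLOPS["resnet50"]           # ~65.6 GFLOPs

# Current implementation: every frame through all three backbones
all_backbones = num_frames * sum(GFLOPS.values())    # ~152 GFLOPs, roughly 2.3x the baseline

# Intended inference: each frame routed to one backbone or skipped
# (4 frames each to resnet50 / resnet34 / resnet18, 4 frames skipped; an illustrative policy)
routed = 4 * GFLOPS["resnet50"] + 4 * GFLOPS["resnet34"] + 4 * GFLOPS["resnet18"]  # ~38 GFLOPs

print(baseline, all_backbones, routed)
```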


blue-blue272 · Oct 09 '20 12:10

Thanks for showing interest in our paper. The code is for training only (though it can also do inference, in a less efficient way, as you mentioned), and the GFLOPs are measured only for the computation that is actually necessary at inference time. It should not be hard to implement efficient inference that achieves those GFLOPs: for each frame (if we decide to compute it rather than skip it), we just need to select the proper backbone first and then run only that backbone. An implementation of fast inference with sparse indexing can be found at https://openaccess.thecvf.com/content_CVPR_2020/papers/Verelst_Dynamic_Convolutions_Exploiting_Spatial_Sparsity_for_Faster_Inference_CVPR_2020_paper.pdf (their paper is about sparse convolution rather than sparse frame selection, but the implementation should be very similar).
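A minimal sketch of that per-frame dispatch in PyTorch (the `policy_net`, the backbone list, and the `SKIP` action below are illustrative placeholders, not the actual AR-Net code):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Candidate backbones, ordered from heaviest to lightest (illustrative choices)
backbones = nn.ModuleList([models.resnet50(), models.resnet34(), models.resnet18()]).eval()
SKIP = len(backbones)  # one extra policy action meaning "skip this frame"

@torch.no_grad()
def efficient_inference(frames, policy_net):
    """frames: iterable of (1, 3, H, W) tensors; policy_net maps a frame to action logits."""
    outputs = []
    for frame in frames:
        action = int(policy_net(frame).argmax(dim=1))
        if action == SKIP:
            continue                              # skipped frame costs (almost) nothing
        outputs.append(backbones[action](frame))  # only the selected backbone runs
    # Average the per-frame predictions that were actually computed
    return torch.stack(outputs).mean(dim=0) if outputs else None
```

The key point is that at most one backbone forward pass runs per frame, which is what the reported GFLOPs count.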

mengyuest · Oct 09 '20 16:10