dynconv

A question about soft-mask calculation

Open peter943 opened this issue 5 years ago • 2 comments

Wonderful job! I have been studying your paper and code these past few days, and both are very enlightening.

I have a question about the code that calculates the soft-mask, soft = self.maskconv(x). I'm not sure why this (conv + fc) network was chosen to calculate the soft-mask. Thank you for your kind help.

peter943, Jun 17 '20 02:06

Hi, thanks for your interest in our work! I suppose you are referring to the classification code. In classification, we use the same mask unit as the work we compare against (SACT). The mask unit is shown in Figure 6 of their paper (https://arxiv.org/abs/1612.02297). It combines a convolution with global average pooling to capture image context. We refer to this as a "squeeze unit" because it resembles the squeeze operation of Squeezenet.
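For readers who want to see the shape of such a unit, here is a minimal PyTorch sketch of a conv + global average pooling + fc mask unit. It is an illustration under assumptions, not the repository's exact code: the class name `SqueezeMaskSketch`, the 3x3 kernel size, and the choice to add the global term to the per-pixel logits are all hypothetical.

```python
import torch
import torch.nn as nn


class SqueezeMaskSketch(nn.Module):
    """Hypothetical squeeze-style mask unit (illustrative, not the repo's code)."""

    def __init__(self, channels: int, stride: int = 1):
        super().__init__()
        # Local branch: a convolution that yields one logit per spatial position.
        # The 3x3 kernel is an assumption made for this sketch.
        self.conv = nn.Conv2d(channels, 1, kernel_size=3, stride=stride, padding=1)
        # Global branch: global average pooling followed by a fully connected layer.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.conv(x)                        # (B, 1, H', W')
        context = self.fc(self.pool(x).flatten(1))  # (B, 1), one value per image
        # Broadcast the image-level context term over all spatial positions.
        return local + context.view(-1, 1, 1, 1)


unit = SqueezeMaskSketch(channels=16)
x = torch.randn(2, 16, 8, 8)
soft = unit(x)     # soft-mask logits, one per spatial position
print(soft.shape)  # torch.Size([2, 1, 8, 8])
```

The global branch is what lets every position's logit depend on the whole image, which is the "image context" mentioned above.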

However, on pose estimation we noticed that the squeeze unit caused a significant hit to inference speed (even though its FLOPs are negligible). Table 2 in our paper (https://arxiv.org/pdf/1912.03203.pdf) compares the accuracy and inference speed of a simple 1x1 convolution and the squeeze unit. The 1x1 convolution results in slightly lower accuracy but faster inference.

Therefore:
Classification experiments -> squeeze unit, for accuracy reasons
Pose estimation -> 1x1 convolution, for inference speed reasons
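For comparison, a 1x1-convolution mask unit can be sketched in a single layer. This is again a hedged illustration: the variable name `mask_1x1` and the channel count of 16 are assumptions chosen for the example, not values from the repository.

```python
import torch
import torch.nn as nn

# Hypothetical 1x1-convolution mask unit: each logit depends only on the
# features at its own spatial position, with no pooling or fc layer.
mask_1x1 = nn.Conv2d(in_channels=16, out_channels=1, kernel_size=1)

x = torch.randn(2, 16, 8, 8)
soft = mask_1x1(x)  # soft-mask logits, shape (2, 1, 8, 8)

# 16 weights plus 1 bias: the unit is a per-pixel linear layer.
n_params = sum(p.numel() for p in mask_1x1.parameters())
print(soft.shape, n_params)  # torch.Size([2, 1, 8, 8]) 17
```

With no global pooling step, this variant sees no image-level context, which fits the trade-off described above: slightly lower accuracy in exchange for faster inference.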

thomasverelst, Jun 17 '20 10:06

@thomasverelst Thank you for your patient reply. It helps me a lot.

peter943, Jun 18 '20 12:06