InterFuser
Questions about details of the network
Hi,
Thank you for your great work! I have some questions about the details of the network.
- For the CNN backbone, input images from different cameras are scaled and cropped to different sizes (e.g., front: 800x600 => scaled to 256x256 => cropped to 224x224; left/right: 800x600 => scaled to 160x160 => cropped to 128x128; focus: 800x600 => cropped to 128x128). Are there any special reasons for the different operations and the choice of sizes?
- For the backbone, the paper says "We set C = 2048 and (H;W) = (H0/32 ; W0/32 ) in experiments." Is there any special reason for choosing 2048 and dividing by 32?
- After the ResNet, a convolution is used to reduce the channel count from 2048 to 256. Is there any special reason for choosing 256?
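For reference, the three resize/crop paths above can be sketched in plain Python; the sizes are taken from the question, and the helper below is ours (the actual repo most likely uses torchvision's `Resize`/`CenterCrop`):

```python
# Minimal sketch of the three per-camera preprocessing paths (hypothetical
# helper; sizes come from the question above).
def center_crop_box(h, w, ch, cw):
    """Top-left corner and size of a centered ch x cw crop of an h x w image."""
    top = (h - ch) // 2
    left = (w - cw) // 2
    return top, left, ch, cw

# front: 800x600 -> scaled to 256x256 -> 224x224 center crop
print(center_crop_box(256, 256, 224, 224))   # (16, 16, 224, 224)
# left/right: 800x600 -> scaled to 160x160 -> 128x128 center crop
print(center_crop_box(160, 160, 128, 128))   # (16, 16, 128, 128)
# focus: 800x600 -> 128x128 center crop straight from the frame, no scaling
print(center_crop_box(600, 800, 128, 128))   # (236, 336, 128, 128)
```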
Thank you! :)
Hi,
- For the front view, we think it's the most important view, so we give it the largest size. For the side views, we use a smaller size to reduce the FLOPs of the network. For the focus view, we don't scale it; we directly center-crop it to capture traffic-light status at a distance.
- C = 2048 and (H;W) = (H0/32 ; W0/32 ) is simply the output of Stage 4 of a standard ResNet. H/32 and W/32 is a suitable resolution for the following transformer encoder: a higher resolution would increase the O(N^2) attention cost of the transformer, while a lower resolution would cause a large performance drop.
- We tried other choices (128, 256, and 384) and found that 256 channels gave the best performance with fewer network parameters.
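As a rough sense of what the channel choice costs, here is the parameter count of a 1x1 convolution mapping the 2048-channel ResNet output down to each width mentioned above (simple arithmetic, not measurements; the helper name is ours):

```python
# Parameter count of a 1x1 conv projecting C_in channels to C_out channels
# (hypothetical helper; 2048 is the ResNet Stage-4 width from the thread).
def conv1x1_params(c_in, c_out, bias=True):
    return c_in * c_out + (c_out if bias else 0)

for c in (128, 256, 384):
    print(c, "->", conv1x1_params(2048, c), "params")
# 128 -> 262272, 256 -> 524544, 384 -> 786816
```

The projection cost grows linearly in the output width, and the downstream transformer's cost grows with it too, which is consistent with 256 being a sweet spot between 128 and 384.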
Hi,
Thank you for your reply! :)
- For the sizes (e.g., 256, 128, or 160), did you try other sizes?
- For resolutions higher or lower than dividing by 32, did you run any experiments (e.g., using H/16 or H/64)?
- I understand now. Thank you.
Thank you very much! :)
Hi, we haven't tried other input sizes or resolutions in our experiments. But H/16 would bring 4X tokens and 16X FLOPs, which may make training or inference difficult.
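The 4X-token / 16X-FLOPs figure above follows directly from the quadratic cost of self-attention; a quick check using the 224x224 front crop as the example (helper name is ours):

```python
# Token counts for an h x w crop at a given backbone stride, and the
# resulting self-attention scaling when moving from stride 32 to stride 16.
def tokens(h, w, stride):
    return (h // stride) * (w // stride)

n32 = tokens(224, 224, 32)   # 7 * 7 = 49 tokens at H/32
n16 = tokens(224, 224, 16)   # 14 * 14 = 196 tokens at H/16
print(n16 // n32)                  # 4x tokens
print((n16 * n16) // (n32 * n32))  # attention is O(N^2): 16x pairwise cost
```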
Thank you very much! :)