bottom-up-attention
Question about image size?
In the file "models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt" there is an input data layer:
name: "ResNet-101"
input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 224
  dim: 224
}
But when I feed my own images to this model, the network does not seem to crop or resize them: the raw images go straight in, so the feature maps of the last convolutional layer are much bigger than 14×14. My questions are: Did you crop and resize the Visual Genome images to 224×224 to train this detection model? And do I need to crop and resize my own images to 224×224 before feeding them to it? Thanks for your attention!
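For what it's worth, here is a small sketch of the arithmetic I think is at play. Assuming the repo follows py-faster-rcnn's test-time behavior (TEST.SCALES=(600,), TEST.MAX_SIZE=1000 by default), images are rescaled so the shorter side is 600 (capped at 1000 on the longer side), not cropped to 224×224; with a ResNet-101 backbone whose shared conv features have a total stride of 16, a 224×224 input would give exactly the 14×14 map from the prototxt, while a typical resized image gives a larger map. The 600/1000 defaults and the stride-16 assumption are mine, taken from py-faster-rcnn, not confirmed from this repo's config:

```python
import math

def prep_scale(h, w, target_size=600, max_size=1000):
    """Mimic py-faster-rcnn's test-time resize (assumed defaults
    TEST.SCALES=(600,), TEST.MAX_SIZE=1000): scale the shorter side
    to target_size, capping the longer side at max_size."""
    scale = target_size / min(h, w)
    if round(scale * max(h, w)) > max_size:
        scale = max_size / max(h, w)
    return round(h * scale), round(w * scale)

def conv_feature_size(h, w, stride=16):
    """Spatial size of the last shared conv feature map, assuming a
    backbone with total stride 16 (ResNet-101 up to res4)."""
    return math.ceil(h / stride), math.ceil(w / stride)

# The 224x224 shape in the prototxt yields the familiar 14x14 map:
print(conv_feature_size(224, 224))        # (14, 14)

# A raw image is resized (not cropped), so the map is larger:
h, w = prep_scale(480, 640)               # -> (600, 800)
print(conv_feature_size(h, w))            # (38, 50)
```

If this is right, the input_shape in the prototxt is just a placeholder that Caffe reshapes at test time, which would explain the larger feature maps I am seeing.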
Also, please forgive me if these questions are already answered in the paper.