
Question about the uniform matcher

Open StephanPan opened this issue 4 years ago • 9 comments

According to your code, the uniform matcher seems to calculate the L1 distance between pred_bbox/anchor and target across the whole batch of images, but I think it should be computed within a single image. Another question: I do not understand the fusion of the anchor indices and the pred_box indices. Why simply add the two indices? https://github.com/megvii-model/YOLOF/blob/61a8accf957dceef11ea8029f121922b5f60901e/playground/detection/coco/yolof/yolof_base/uniform_matcher.py#L77

StephanPan avatar Apr 21 '21 06:04 StephanPan

Hi, for the first question: the indices are actually selected within each image. Using the batch is an implementation choice to avoid loop computation.

For the second question, you can refer to the answer here.
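To illustrate, here is a minimal sketch (not the repo's exact code; shapes and sizes are hypothetical) of how one batched cost matrix still yields per-image matches, and of why the "addition" of the two index sets at the linked line appears to be concatenation of Python tuples rather than numeric addition:

import torch

# Hypothetical sizes; match_times is the k in the top-k selection.
batch_size, num_anchors, match_times = 2, 100, 4
num_gts_per_image = [3, 5]                             # GT counts per image
pred_boxes = torch.rand(batch_size * num_anchors, 4)   # all predictions, flattened
gt_boxes = torch.rand(sum(num_gts_per_image), 4)       # all GTs, concatenated

# One big L1 cost matrix over the whole batch...
cost = torch.cdist(pred_boxes, gt_boxes, p=1)
cost = cost.view(batch_size, num_anchors, -1)

# ...but the readout is per image: split the GT axis by image, then take
# image i's slice on the batch axis, so cross-image costs are never selected.
indices = [
    torch.topk(c[i], k=match_times, dim=0, largest=False).indices
    for i, c in enumerate(cost.split(num_gts_per_image, dim=-1))
]

# As for "adding" anchor indices and prediction indices: if the per-GT
# index collections are plain Python tuples (as they appear to be in the
# repo), `+` concatenates the two candidate sets, it does not sum them.
anchor_idx, pred_idx = (1, 5, 9), (2, 5, 7)
fused = anchor_idx + pred_idx   # (1, 5, 9, 2, 5, 7)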

chensnathan avatar Apr 21 '21 10:04 chensnathan

Thanks for your answer! I wonder whether it is suitable for lightweight models such as YOLOv4-tiny. I am experimenting with this: I simply changed the backbone to YOLOv4-tiny's, and the mAP only reached 13.8 with an input size of 320. Could you give any suggestions?

StephanPan avatar Apr 22 '21 02:04 StephanPan

Hi, we did not train tiny models before, but I am happy to help you get reasonable results.

Could you provide more details about your modifications? The backbone file, pre-trained models, and config file would be helpful.

chensnathan avatar Apr 22 '21 04:04 chensnathan

I simply changed the backbone following YOLOv4-tiny, and the 512-sized anchor was removed because of the limited image size. BTW, the activation is replaced by LeakyReLU. Other settings are the same as CSPDarkNet53-DC5.

# NOTE: this class relies on helpers from the YOLOF codebase's darknet.py
# (Backbone, DarkBlock, make_dark_layer, make_cspdark_layer,
# CrossStagePartialBlock, get_norm, LeakyReLU, ShapeSpec) and torch.nn as nn.
class DarkNet(Backbone):
    """DarkNet backbone. Refer to the paper for more details:
    https://arxiv.org/pdf/1804.02767

    Args:
        depth (int): Depth of Darknet, from {53}.
        num_stages (int): Darknet stages, normally 5.
        with_csp (bool): Use cross stage partial connection or not.
        out_features (List[str]): Output features.
        norm_type (str): type of normalization layer.
        res5_dilation (int): dilation for the last stage.
    """

    # Tiny variant: three stages with one block each
    # (the full CSPDarkNet53 uses (1, 2, 8, 8, 4)).
    arch_settings = {
        53: (DarkBlock, (1, 1, 1))
    }

    def __init__(self,
                 depth,
                 with_csp=True,
                 out_features=["res5"],
                 norm_type="BN",
                 res5_dilation=1):
        super(DarkNet, self).__init__()
        if depth not in self.arch_settings:
            raise KeyError('invalid depth {} for darknet'.format(depth))
        self.with_csp = with_csp
        self._out_features = out_features
        self.norm_type = norm_type
        self.res5_dilation = res5_dilation

        self.block, self.stage_blocks = self.arch_settings[depth]
        self.inplanes = 64

        self._make_stem_layer()

        self.dark_layers = []
        for i, num_blocks in enumerate(self.stage_blocks):
            planes = 128 * 2 ** i
            dilation = 1
            stride = 2
            # NOTE: with only three stages, i never reaches 4, so this
            # dilation branch (inherited from the five-stage model) is dead.
            if i == 4 and self.res5_dilation == 2:
                dilation = self.res5_dilation
                stride = 1
            if not self.with_csp:
                layer = make_dark_layer(
                    block=self.block,
                    inplanes=self.inplanes,
                    planes=planes,
                    num_blocks=num_blocks,
                    dilation=dilation,
                    stride=stride,
                    norm_type=self.norm_type
                )
            else:
                layer = make_cspdark_layer(
                    block=self.block,
                    inplanes=self.inplanes,
                    planes=planes,
                    num_blocks=num_blocks,
                    is_csp_first_stage=(i == 0),
                    dilation=dilation,
                    norm_type=self.norm_type
                )
                layer = CrossStagePartialBlock(
                    self.inplanes,
                    planes,
                    stage_layers=layer,
                    is_csp_first_stage=(i == 0),
                    dilation=dilation,
                    stride=stride,
                    norm_type=self.norm_type
                )
            self.inplanes = planes
            layer_name = 'layer{}'.format(i + 1)
            self.add_module(layer_name, layer)
            self.dark_layers.append(layer_name)

        # freeze stage<=2
        # for p in self.conv1.parameters():
        #     p.requires_grad = False
        # for p in self.bn1.parameters():
        #     p.requires_grad = False
        # for p in self.layer1.parameters():
        #     p.requires_grad = False
        # for p in self.layer2.parameters():
        #     p.requires_grad = False

    def _make_stem_layer(self):
        # Two stride-2 convs: the stem downsamples the input by 4.
        self.conv1 = nn.Conv2d(
            3,
            32,
            kernel_size=3,
            stride=2,
            padding=1,
            bias=False
        )
        self.bn1 = get_norm(
            self.norm_type, 32, eps=1e-4, momentum=0.03
        )
        # self.act1 = Mish()
        self.act1 = LeakyReLU()

        self.conv2 = nn.Conv2d(
            32,
            self.inplanes,
            kernel_size=3,
            stride=2,
            padding=1,
            bias=False
        )
        self.bn2 = get_norm(
            self.norm_type, self.inplanes, eps=1e-4, momentum=0.03
        )
        self.act2 = LeakyReLU()

    def forward(self, x):
        outputs = {}
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.act2(x)

        for layer_name in self.dark_layers:
            layer = getattr(self, layer_name)
            x = layer(x)
        # Only the final stage's feature map is returned, keyed by the
        # last entry of out_features.
        outputs[self._out_features[-1]] = x
        return outputs

    def output_shape(self):
        # NOTE: this reports "res3", so out_features should be ["res3"] for
        # forward() and output_shape() to agree (the default above says
        # "res5", a leftover from the five-stage model).
        return {
            "res3": ShapeSpec(
                channels=512, stride=16 if self.res5_dilation == 2 else 32
            )
        }
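A quick smoke test of the modified backbone might look like this (hypothetical usage; it assumes the helper layers above are importable, and passes out_features=["res3"] so that forward() and output_shape() agree):

import torch

backbone = DarkNet(depth=53, with_csp=True, out_features=["res3"])
feats = backbone(torch.randn(1, 3, 320, 320))
# Stem downsamples by 4 and each of the three stages by 2, so the total
# stride is 32: expect feats["res3"] of shape [1, 512, 10, 10].
print({k: tuple(v.shape) for k, v in feats.items()})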

StephanPan avatar Apr 22 '21 10:04 StephanPan

Ok, I will try it.

chensnathan avatar Apr 23 '21 02:04 chensnathan

Thanks for your reply! Another question: are multi-scale training and SWA included?

StephanPan avatar Apr 25 '21 07:04 StephanPan

Multi-scale training is supported by Detectron2. You can refer to this repo for SWA.

The results for multi-scale training and SWA are not included in this repo. You can try them yourself.
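For concreteness, multi-scale training in Detectron2 is configured roughly like this (illustrative values, not this repo's config; MIN_SIZE_TRAIN lists the short-edge sizes sampled during training):

from detectron2.config import get_cfg

cfg = get_cfg()
# Sample one short-edge size from this list each iteration ("choice" mode).
cfg.INPUT.MIN_SIZE_TRAIN = (512, 544, 576, 608, 640, 672, 704, 736, 768)
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"
cfg.INPUT.MAX_SIZE_TRAIN = 1333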

chensnathan avatar Apr 26 '21 02:04 chensnathan

Thanks a lot! I find that when I change the test image size from 608 to 320, the performance drops a lot: mAP drops from 43.2 to 34.5. The degradation is significant for small and medium objects (small-object AP drops from 22.8 to 11.8, medium-object AP from 47.2 to 36.4). Compared to YOLOv4 at an input size of 320, YOLOF's small-object detection is not satisfying. Are there any suggestions to improve it?

StephanPan avatar Apr 26 '21 08:04 StephanPan

You may need to re-train YOLOF with small image sizes. The provided pre-trained model is trained with relatively large image sizes (from 512 to 768), which is not suitable for testing at image size 320.
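Concretely, re-training for a 320 test size could shift the training scales down, for example (illustrative Detectron2 settings, not a provided config):

from detectron2.config import get_cfg

cfg = get_cfg()
# Train on short-edge sizes around the intended 320 test size...
cfg.INPUT.MIN_SIZE_TRAIN = (256, 288, 320, 352, 384)
# ...and evaluate at 320 so the train and test distributions match.
cfg.INPUT.MIN_SIZE_TEST = 320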

chensnathan avatar Apr 26 '21 09:04 chensnathan