YOLOF
Question about the uniform matcher
According to your code, the uniform matcher seems to calculate the L1 distance between pred_bbox/anchor and targets across the whole batch of images, but I think it should be computed within a single image. Another question: I do not understand how the anchor indices and the pred_box indices are fused. Why simply add the two indices? https://github.com/megvii-model/YOLOF/blob/61a8accf957dceef11ea8029f121922b5f60901e/playground/detection/coco/yolof/yolof_base/uniform_matcher.py#L77
Hi, for the first question: the indexes are actually selected within each image. Computing over the batch is an implementation trick to avoid looping over images.
For the second question, you can refer to the answer here.
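For intuition, here is a minimal sketch (not the repo's exact code; the shapes B, A, G, k are purely illustrative) of how a cost computed over the whole batch still yields per-image matches:

```python
import torch

B, A, G, k = 2, 100, 5, 4         # batch size, boxes per image, GTs per image, top-k

pred_boxes = torch.rand(B, A, 4)  # predicted boxes (or anchors) per image
gt_boxes = torch.rand(B, G, 4)    # ground-truth boxes per image

# One cdist call over the flattened batch avoids a Python loop over images ...
cost = torch.cdist(pred_boxes.view(-1, 4), gt_boxes.view(-1, 4), p=1)
cost = cost.view(B, A, B, G)

# ... and only the diagonal blocks (image i's preds vs image i's GTs) are used,
# so the match is still per-image.
idx = torch.arange(B)
per_image_cost = cost[idx, :, idx, :]                           # (B, A, G)
matches = per_image_cost.topk(k, dim=1, largest=False).indices  # (B, k, G)
```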
Thanks for your answer! I wonder whether YOLOF is suitable for lightweight models such as YOLOv4-tiny. I am experimenting with it, but the results are not good: I simply changed the backbone to YOLOv4-tiny's, and the mAP is only 13.8 at an input size of 320. Could you give any suggestions?
Hi, we have not trained tiny models before, but I am happy to help you get reasonable results.
Could you provide more details about your modification? The backbone file, pre-trained model, and config file would be helpful.
I simply changed the backbone following YOLOv4-tiny, and the 512 anchor is deleted because of the limited image size. By the way, the activation is replaced by LeakyReLU. The other settings are the same as cspdarknet53-dc5. (The helper layers used below, such as `Backbone`, `DarkBlock`, `make_dark_layer`, `make_cspdark_layer`, `CrossStagePartialBlock`, `get_norm`, `LeakyReLU`, and `ShapeSpec`, come from the repo's backbone modules.)

```python
import torch.nn as nn


class DarkNet(Backbone):
    """DarkNet backbone. Refer to the paper for more details:
    https://arxiv.org/pdf/1804.02767

    Args:
        depth (int): Depth of Darknet, from {53}.
        num_stages (int): Darknet stages, normally 5.
        with_csp (bool): Use cross stage partial connection or not.
        out_features (List[str]): Output features.
        norm_type (str): Type of normalization layer.
        res5_dilation (int): Dilation for the last stage.
    """

    # Trimmed to three stages of one block each for the tiny variant.
    arch_settings = {
        53: (DarkBlock, (1, 1, 1))
    }

    def __init__(self,
                 depth,
                 with_csp=True,
                 out_features=["res3"],  # last stage of the trimmed net; matches output_shape()
                 norm_type="BN",
                 res5_dilation=1):
        super(DarkNet, self).__init__()
        if depth not in self.arch_settings:
            raise KeyError('invalid depth {} for darknet'.format(depth))
        self.with_csp = with_csp
        self._out_features = out_features
        self.norm_type = norm_type
        self.res5_dilation = res5_dilation
        self.block, self.stage_blocks = self.arch_settings[depth]
        self.inplanes = 64

        self._make_stem_layer()

        self.dark_layers = []
        for i, num_blocks in enumerate(self.stage_blocks):
            planes = 128 * 2 ** i
            dilation = 1
            stride = 2
            # DC5-style dilation on the last stage. With only three stages the
            # last index is len(self.stage_blocks) - 1; the original check
            # `i == 4` (from the five-stage CSPDarknet53) could never fire here.
            if i == len(self.stage_blocks) - 1 and self.res5_dilation == 2:
                dilation = self.res5_dilation
                stride = 1
            if not self.with_csp:
                layer = make_dark_layer(
                    block=self.block,
                    inplanes=self.inplanes,
                    planes=planes,
                    num_blocks=num_blocks,
                    dilation=dilation,
                    stride=stride,
                    norm_type=self.norm_type
                )
            else:
                layer = make_cspdark_layer(
                    block=self.block,
                    inplanes=self.inplanes,
                    planes=planes,
                    num_blocks=num_blocks,
                    is_csp_first_stage=(i == 0),
                    dilation=dilation,
                    norm_type=self.norm_type
                )
                layer = CrossStagePartialBlock(
                    self.inplanes,
                    planes,
                    stage_layers=layer,
                    is_csp_first_stage=(i == 0),
                    dilation=dilation,
                    stride=stride,
                    norm_type=self.norm_type
                )
            self.inplanes = planes
            layer_name = 'layer{}'.format(i + 1)
            self.add_module(layer_name, layer)
            self.dark_layers.append(layer_name)

        # freeze stage<=2
        # for p in self.conv1.parameters():
        #     p.requires_grad = False
        # for p in self.bn1.parameters():
        #     p.requires_grad = False
        # for p in self.layer1.parameters():
        #     p.requires_grad = False
        # for p in self.layer2.parameters():
        #     p.requires_grad = False

    def _make_stem_layer(self):
        # Two stride-2 convs: the stem downsamples the input by 4x.
        self.conv1 = nn.Conv2d(
            3, 32, kernel_size=3, stride=2, padding=1, bias=False
        )
        self.bn1 = get_norm(self.norm_type, 32, eps=1e-4, momentum=0.03)
        # self.act1 = Mish()
        self.act1 = LeakyReLU()
        self.conv2 = nn.Conv2d(
            32, self.inplanes, kernel_size=3, stride=2, padding=1, bias=False
        )
        self.bn2 = get_norm(self.norm_type, self.inplanes, eps=1e-4, momentum=0.03)
        self.act2 = LeakyReLU()

    def forward(self, x):
        outputs = {}
        x = self.act1(self.bn1(self.conv1(x)))
        x = self.act2(self.bn2(self.conv2(x)))
        for layer_name in self.dark_layers:
            x = getattr(self, layer_name)(x)
        # Only the final stage's feature map is exposed.
        outputs[self._out_features[-1]] = x
        return outputs

    def output_shape(self):
        return {
            "res3": ShapeSpec(
                channels=512, stride=16 if self.res5_dilation == 2 else 32
            )
        }
```
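A quick smoke test for the class above might look like this (assuming the helper layers are importable from the repo):

```python
import torch

# Hypothetical check: push a dummy 320x320 image through the backbone.
net = DarkNet(depth=53)
feats = net(torch.randn(1, 3, 320, 320))
print({name: tuple(f.shape) for name, f in feats.items()})
# Expected: a single stride-32 map with 512 channels, e.g. (1, 512, 10, 10).
```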
Ok, I will try it.
Thanks for your reply! Another question: are multi-scale training and SWA included?
Multi-scale training is supported by Detectron2. You can refer to this repo for SWA.
Results for multi-scale training and SWA are not included in this repo. You can try them yourself.
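For example, multi-scale training can be turned on through Detectron2's standard input config keys (the sizes below are only illustrative, not the settings we used):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# Short-edge sizes to sample from during training.
cfg.INPUT.MIN_SIZE_TRAIN = (512, 544, 576, 608, 640, 672, 704, 736, 768)
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"  # pick one size per image
cfg.INPUT.MAX_SIZE_TRAIN = 1333               # cap on the long edge
```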
Thanks a lot! I find that when I change the test image size from 608 to 320, the performance drops a lot: mAP drops from 43.2 to 34.5. The degradation is especially significant for small and medium objects (small-object mAP drops from 22.8 to 11.8, medium-object mAP from 47.2 to 36.4). Compared to YOLOv4 at an input size of 320, YOLOF's small-object detection is not satisfying. Are there any suggestions to improve it?
You may need to re-train YOLOF with small image sizes. The provided pre-trained model is trained with relatively large image sizes (from 512 to 768), which is not suitable for testing at image size 320.
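For example (illustrative sizes only), you could align the training scales with the intended 320 test resolution:

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# Train with short-edge sizes centered on 320 instead of the 512-768 range
# used by the released model, and test at 320.
cfg.INPUT.MIN_SIZE_TRAIN = (256, 288, 320, 352, 384)
cfg.INPUT.MIN_SIZE_TEST = 320
```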