vovnet-detectron2
Loss NaN about using vovnet as backbone in RetinaNet
Hi! Thank you for your great work. I wanted to improve the RetinaNet project in detectron2/projects by replacing "retinanet_resnet_fpn_backbone" with "retinanet_vovnet_fpn_backbone". However, I always encounter "loss NaN" within the first 1000 iterations of training. Training with "retinanet_resnet_fpn_backbone" works fine.
I want to make sure that I'm not doing something wrong.
My config YAML:
_BASE_: "../Base-RetinaNet.yaml"
MODEL:
  WEIGHTS: "./pre_train/vovnet39_ese_detectron2.pth"
  RETINANET:
    NUM_CLASSES: 2
  BACKBONE:
    NAME: "build_retinanet_vovnet_fpn_backbone"
    FREEZE_AT: 0
  VOVNET:
    CONV_BODY: "V-39-eSE"
    OUT_FEATURES: ["stage3", "stage4", "stage5"]
  FPN:
    IN_FEATURES: ["stage3", "stage4", "stage5"]
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
OUTPUT_DIR: "output/retina/V_39_ms_3x"
build_retinanet_vovnet_fpn_backbone
@BACKBONE_REGISTRY.register()
def build_retinanet_vovnet_fpn_backbone(cfg, input_shape: ShapeSpec):
    """
    Args:
        cfg: a detectron2 CfgNode
    Returns:
        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.
    """
    bottom_up = build_vovnet_backbone(cfg, input_shape)
    in_features = cfg.MODEL.FPN.IN_FEATURES
    out_channels = cfg.MODEL.FPN.OUT_CHANNELS
    in_channels_top = out_channels
    top_block = LastLevelP6P7(in_channels_top, out_channels, "p5")
    # in_channels_p6p7 = bottom_up.output_shape()["res5"].channels
    backbone = FPN(
        bottom_up=bottom_up,
        in_features=in_features,
        out_channels=out_channels,
        norm=cfg.MODEL.FPN.NORM,
        top_block=top_block,
        # top_block=LastLevelP6P7(in_channels_p6p7, out_channels),
        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,
    )
    return backbone
Nice copy LOL. By the way, I think the problem is that your learning rate is too high. Try lowering it by a factor of 10 to 100, and don't forget to train for more iterations.
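To make that suggestion concrete, here is a minimal sketch of how the solver settings might be rescaled. The helper `scale_solver` is hypothetical and not part of detectron2. The starting learning rate of 0.01 is an assumption, since the posted config does not set `SOLVER.BASE_LR` itself, and the iteration multiplier of 2 is purely illustrative because no specific number was suggested. The resulting keys are ordinary detectron2 solver options and could be applied through the YAML file or a config override.

```python
def scale_solver(base_lr, steps, max_iter, lr_divisor=10, iter_multiplier=2):
    """Return solver overrides with a lower LR and a longer schedule.

    Hypothetical helper for illustration only; it just does the arithmetic.
    """
    if lr_divisor <= 0 or iter_multiplier <= 0:
        raise ValueError("lr_divisor and iter_multiplier must be positive")
    return {
        # Lower the learning rate by the chosen factor (10 to 100 was suggested).
        "SOLVER.BASE_LR": base_lr / lr_divisor,
        # Stretch the decay steps and total iterations by the same multiplier
        # so the LR drops happen at the same relative points in training.
        "SOLVER.STEPS": tuple(int(s * iter_multiplier) for s in steps),
        "SOLVER.MAX_ITER": int(max_iter * iter_multiplier),
    }


# STEPS and MAX_ITER come from the config in the question;
# base_lr=0.01 is an assumed starting value.
overrides = scale_solver(base_lr=0.01, steps=(210000, 250000), max_iter=270000)
for key, value in overrides.items():
    print(key, value)
```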
Cut-and-pasted 😂... I tried a lower learning rate; the loss no longer exploded, but it did not decrease either. I read the VoVNet paper, and as far as I can tell the authors didn't use VoVNet as a backbone in any object detection network except RefineDet in their experiments.
Same error here: I can't manage to fit a vovnet-lite-dw or a vovnet-19-dw, I keep getting NaN loss. Vovnet-lite is fine though, so I have the feeling that something is wrong with the depthwise convolution.
When I tested this kind of lightweight backbone for object detection (e.g., MobileNet, ShuffleNet, etc.), I used more warm-up iterations.
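For a sense of what a longer warm-up changes, the sketch below computes a linear warm-up schedule in plain Python. It follows the general form of a linear warm-up (the learning rate ramps from `base_lr * warmup_factor` up to `base_lr` over `warmup_iters` iterations); the function names are hypothetical, and the values 0.01, 0.001, 1000 and 5000 are assumptions chosen for illustration, not settings taken from this thread.

```python
def linear_warmup_factor(iteration, warmup_iters, warmup_factor=0.001):
    """Multiplier applied to the base LR during a linear warm-up."""
    if iteration >= warmup_iters:
        return 1.0  # warm-up finished, use the full learning rate
    alpha = iteration / warmup_iters
    # Interpolate linearly from warmup_factor (at iteration 0) to 1.0.
    return warmup_factor * (1 - alpha) + alpha


def lr_at(iteration, base_lr, warmup_iters, warmup_factor=0.001):
    """Effective learning rate at a given iteration (ignoring step decay)."""
    return base_lr * linear_warmup_factor(iteration, warmup_iters, warmup_factor)


# Compare a short and a long warm-up at a few early iterations.
for warmup_iters in (1000, 5000):
    lrs = [lr_at(i, 0.01, warmup_iters) for i in (0, 500, 1000, 5000)]
    print(warmup_iters, lrs)
```

With the longer warm-up, the learning rate at any early iteration is smaller, which gives the backbone more time at low learning rates before the full rate is reached. In a detectron2 config, the warm-up length is controlled by `SOLVER.WARMUP_ITERS`.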