Mask_RCNN

There are no mrcnn_bbox losses

Armando123x opened this issue on Aug 25, 2022 • 0 comments

Hello! I'm training a Mask R-CNN model on my custom dataset for object detection, but during training I see two problems: the total loss is nan, and the per-head losses (e.g. mrcnn_bbox_loss) are not reported at all:

Starting at epoch 0. LR=0.0001
Checkpoint Path: ./dataset20220824T2348\mask_rcnn_dataset_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
rpn_model              (Functional)
mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)
Epoch 1/100
100/100 [==============================] - 186s 2s/step - batch: 49.5000 - size: 2.0000 - loss: nan - val_loss: nan
Epoch 2/100
100/100 [==============================] - 167s 2s/step - batch: 49.5000 - size: 2.0000 - loss: nan - val_loss: nan
[... epochs 3–56 elided: every epoch reports the same loss: nan - val_loss: nan ...]
Epoch 57/100
100/100 [==============================] - 173s 2s/step - batch: 49.5000 - size: 2.0000 - loss: nan - val_loss: nan

The config and training code for the train/val datasets are below:

from mrcnn.config import Config
from mrcnn.model import MaskRCNN

class MeteorConfig(Config):
    LEARNING_RATE = 1e-4

    IMAGE_RESIZE_MODE = "pad64"
    BACKBONE_STRIDES = [16, 32, 64, 128, 256]
    IMAGE_MIN_DIM = 640
    GPU_COUNT = 1

    def __init__(self, num_classes):
        # Give the configuration a recognizable name
        self.NAME = "dataset"
        self.NUM_CLASSES = num_classes
        self.STEPS_PER_EPOCH = 50 * 2
        self.ETF_C = self.NUM_CLASSES
        super().__init__()


# prepare config (4 classes + background)
config = MeteorConfig(num_classes=4 + 1)

model = MaskRCNN(mode='training', model_dir='./', config=config)

# start from COCO weights, excluding the class-specific head layers
model.load_weights('mask_rcnn_coco.h5', by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_mask"])
model.train(train, test_set, learning_rate=config.LEARNING_RATE,
            epochs=100, layers='heads')
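One common cause of nan losses with custom Mask R-CNN datasets is degenerate ground truth: an all-zero instance mask yields a zero-area bounding box, and the bbox regression targets divide by the box height/width. A quick sanity-check sketch over the arrays returned by a dataset's load_mask() (find_empty_masks is a hypothetical helper name, not part of the library):

```python
import numpy as np

def find_empty_masks(masks):
    """Return the indices of all-zero instance masks in a [H, W, N] array.

    Empty masks produce zero-area ground-truth boxes, and dividing by a
    zero box height/width when computing bbox regression targets yields nan.
    """
    return [i for i in range(masks.shape[-1])
            if not masks[:, :, i].any()]

# Example with one valid instance and one accidentally empty one:
masks = np.zeros((8, 8, 2), dtype=bool)
masks[2:5, 2:5, 0] = True
print(find_empty_masks(masks))  # -> [1]
```

It may also be worth double-checking BACKBONE_STRIDES against the library default of [4, 8, 16, 32, 64]; the strides must match the feature-map scales the ResNet/FPN backbone actually produces.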

Armando123x · Aug 25 '22 21:08