Mask_RCNN
There are no mrcnn_bbox losses (loss is nan)
Hello! I'm training Mask R-CNN on my custom dataset for object detection, but during training I see several problems: the total loss is nan and the per-layer losses (e.g. mrcnn_bbox_loss) are missing:
Starting at epoch 0. LR=0.0001
Checkpoint Path: ./dataset20220824T2348\mask_rcnn_dataset_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5 (Conv2D)
fpn_c4p4 (Conv2D)
fpn_c3p3 (Conv2D)
fpn_c2p2 (Conv2D)
fpn_p5 (Conv2D)
fpn_p2 (Conv2D)
fpn_p3 (Conv2D)
fpn_p4 (Conv2D)
rpn_model (Functional)
mrcnn_mask_conv1 (TimeDistributed)
mrcnn_mask_bn1 (TimeDistributed)
mrcnn_class_conv1 (TimeDistributed)
mrcnn_class_bn1 (TimeDistributed)
mrcnn_mask_conv2 (TimeDistributed)
mrcnn_mask_bn2 (TimeDistributed)
mrcnn_class_conv2 (TimeDistributed)
mrcnn_class_bn2 (TimeDistributed)
mrcnn_mask_conv3 (TimeDistributed)
mrcnn_mask_bn3 (TimeDistributed)
mrcnn_bbox_fc (TimeDistributed)
mrcnn_mask_conv4 (TimeDistributed)
mrcnn_mask_bn4 (TimeDistributed)
mrcnn_mask_deconv (TimeDistributed)
mrcnn_class_logits (TimeDistributed)
mrcnn_mask (TimeDistributed)
Epoch 1/100
100/100 [==============================] - 186s 2s/step - batch: 49.5000 - size: 2.0000 - loss: nan - val_loss: nan
Epoch 2/100
100/100 [==============================] - 167s 2s/step - batch: 49.5000 - size: 2.0000 - loss: nan - val_loss: nan
Epoch 3/100
100/100 [==============================] - 165s 2s/step - batch: 49.5000 - size: 2.0000 - loss: nan - val_loss: nan
...
(epochs 4 through 56 are identical: every epoch reports loss: nan - val_loss: nan)
...
Epoch 57/100
100/100 [==============================] - 173s 2s/step - batch: 49.5000 - size: 2.0000 - loss: nan - val_loss: nan
The config and training code for the train and validation datasets are below:
from mrcnn.config import Config
from mrcnn.model import MaskRCNN

class MeteorConfig(Config):
    LEARNING_RATE = 1e-4
    IMAGE_RESIZE_MODE = "pad64"
    BACKBONE_STRIDES = [16, 32, 64, 128, 256]
    IMAGE_MIN_DIM = 640
    GPU_COUNT = 1

    def __init__(self, num_classes):
        # Give the configuration a recognizable name
        self.NAME = "dataset"
        self.NUM_CLASSES = num_classes
        self.STEPS_PER_EPOCH = 50 * 2
        self.ETF_C = self.NUM_CLASSES
        super().__init__()

# prepare config (4 classes + 1 for background)
config = MeteorConfig(num_classes=4 + 1)
model = MaskRCNN(mode='training', model_dir='./', config=config)
model.load_weights('mask_rcnn_coco.h5', by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_mask"])
model.train(train, test_set, learning_rate=config.LEARNING_RATE,
            epochs=100, layers='heads')
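For context, one frequent cause of nan losses with custom datasets is degenerate annotations: a mask with zero foreground pixels produces a zero-area bounding box, which breaks the box regression targets. Below is a minimal sanity-check sketch, not part of the Mask_RCNN library; `check_masks` is a hypothetical helper, and it only assumes the dataset object exposes the usual `image_ids` and `load_mask` interface:

```python
import numpy as np

def check_masks(dataset):
    """Scan a Mask R-CNN-style dataset for degenerate annotations.

    Returns a list of (image_id, instance_index) pairs whose mask has
    no foreground pixels; such instances yield zero-area boxes and can
    drive the loss to nan during training.
    """
    bad = []
    for image_id in dataset.image_ids:
        # load_mask returns (H, W, num_instances) booleans and class ids
        masks, class_ids = dataset.load_mask(image_id)
        for i in range(masks.shape[-1]):
            if masks[:, :, i].sum() == 0:  # empty mask -> invalid box
                bad.append((image_id, i))
    return bad
```

Running this over both the train and validation datasets before calling model.train, and dropping or re-annotating any flagged instances, is a cheap first step when debugging nan losses.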