Mask_RCNN
step - loss: nan - val_loss: nan in the training
Hello, I'm trying to train the Mask R-CNN network on multiple classes, but I only get step - loss: nan - val_loss: nan in every training epoch. I've already tried different values for several parameters, and I don't know what else to do. Does anyone have any ideas?
`Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 1
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 640
IMAGE_META_SIZE 18
IMAGE_MIN_DIM 640
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [640 640 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 1e-05
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 50
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME weedS_detection
NUM_CLASSES 6
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 500
POST_NMS_ROIS_TRAINING 1000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (8, 16, 32, 64, 128)
RPN_ANCHOR_STRIDE [1]
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 128
STEPS_PER_EPOCH 100
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 32
USE_MINI_MASK False
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001`
I got the same error while training the model on a GPU. What's the solution?
Check your annotations for undefined values and for images that are not annotated at all. A NaN loss like this can happen when annotations are missing.
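To make that check concrete, here is a rough sketch of how you could scan a dataset for suspicious annotations before training. The function name `find_bad_annotations` and the `(image_id, masks, class_ids)` tuple layout are my own assumptions for illustration, not part of Mask_RCNN. It also assumes masks are shaped `[height, width, instance_count]` and that class 0 is reserved for background, so valid IDs run from 1 to `NUM_CLASSES - 1`.

```python
import numpy as np


def find_bad_annotations(samples, num_classes):
    """Return (image_id, reason) pairs for annotations that look suspicious.

    `samples` is a hypothetical iterable of (image_id, masks, class_ids):
    masks has shape [height, width, instance_count], class_ids has shape
    [instance_count].
    """
    problems = []
    for image_id, masks, class_ids in samples:
        masks = np.asarray(masks)
        class_ids = np.atleast_1d(np.asarray(class_ids))

        # Image with no annotated instances at all.
        if masks.ndim != 3 or masks.shape[-1] == 0:
            problems.append((image_id, "no annotated instances"))
            continue

        # Every mask needs exactly one class ID.
        if masks.shape[-1] != class_ids.shape[0]:
            problems.append((image_id, "mask/class_id count mismatch"))
            continue

        # NaN or inf values inside the mask array.
        if not np.all(np.isfinite(masks.astype(np.float64))):
            problems.append((image_id, "non-finite mask values"))
            continue

        # An instance with no foreground pixels gives a degenerate box.
        per_instance_area = masks.reshape(-1, masks.shape[-1]).sum(axis=0)
        empty = np.where(per_instance_area == 0)[0]
        if empty.size:
            problems.append(
                (image_id, f"empty masks at instances {empty.tolist()}")
            )

        # Assumed convention: 0 is background, valid IDs are 1..num_classes-1.
        bad = (class_ids < 1) | (class_ids >= num_classes)
        if bad.any():
            problems.append(
                (image_id, f"class ids out of range: {class_ids[bad].tolist()}")
            )
    return problems
```

If your dataset class follows the usual `load_mask(image_id)` convention, you could build `samples` by looping over the image IDs and passing the returned masks and class IDs, with `num_classes=6` to match the config above. An empty result does not prove the annotations are fine, but any image it flags is worth inspecting by hand.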