Mask_RCNN
Memory usage increasing with every epoch
Issue: Has anyone faced anything similar to this? While training, system memory (i.e. RAM and swap) consumption increases with each epoch. It keeps growing until the system runs out of memory, at which point training exits with an out-of-memory error. For example:
| Epoch | RAM (GB) | Swap (GB) |
|------:|---------:|----------:|
| 30    | 80       | 2         |
| 70    | 110      | 2         |
| 150   | 124      | 30        |
Is this to be expected or do we need to change something here?
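To pin down whether the growth really comes from the training process itself (and not just lazy garbage collection), one option is to log peak resident memory between epochs. This is not from the original post; it is a stdlib-only diagnostic sketch (Unix-only, since it relies on the `resource` module), and `on_epoch_end` is a hypothetical hook name you would call from your own training loop or a Keras callback:

```python
import gc
import resource


def peak_rss_mb() -> float:
    """Peak resident set size of this process in MB.

    Note: ru_maxrss is reported in kilobytes on Linux
    but in bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def on_epoch_end(epoch: int) -> float:
    """Hypothetical per-epoch hook: force a GC cycle, then log memory."""
    gc.collect()  # rule out lazily collected garbage as the cause
    mb = peak_rss_mb()
    print(f"epoch {epoch}: peak RSS ~ {mb:.0f} MB")
    return mb
```

If peak RSS keeps climbing even right after `gc.collect()`, something is genuinely retaining references (for example a growing cache or history object) rather than the collector simply running late.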
GPU: Quadro RTX 8000 (48 GB). System memory: 128 GB. Swap: 70 GB.

Config:

```python
GPU_COUNT = 1
IMAGES_PER_GPU = 18
STEPS_PER_EPOCH = 1000
NUM_CLASSES = 1 + 100 # Override in sub-classes
LEARNING_RATE = 0.005
LEARNING_MOMENTUM = 0.9
VALIDATION_STEPS = 50
IMAGE_MIN_DIM = 512
IMAGE_MAX_DIM = 512
WEIGHT_DECAY = 0.01
GRADIENT_CLIP_NORM = 5.0
BACKBONE = "resnet101"
COMPUTE_BACKBONE_SHAPE = None
BACKBONE_STRIDES = [4, 8, 16, 32, 64]
FPN_CLASSIF_FC_LAYERS_SIZE = 1024
TOP_DOWN_PYRAMID_SIZE = 256
RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
RPN_ANCHOR_RATIOS = [0.5, 1, 2]
RPN_ANCHOR_STRIDE = 1
RPN_NMS_THRESHOLD = 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE = 256
PRE_NMS_LIMIT = 6000
POST_NMS_ROIS_TRAINING = 2000
POST_NMS_ROIS_INFERENCE = 1000
USE_MINI_MASK = True
MINI_MASK_SHAPE = (56, 56) # (height, width) of the mini-mask
IMAGE_MIN_SCALE = 0
IMAGE_CHANNEL_COUNT = 3
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])
TRAIN_ROIS_PER_IMAGE = 200
ROI_POSITIVE_RATIO = 0.33
POOL_SIZE = 7
MASK_POOL_SIZE = 14
MASK_SHAPE = [28, 28]
MAX_GT_INSTANCES = 40
DETECTION_MAX_INSTANCES = 100
RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
USE_RPN_ROIS = True
TRAIN_BN = False
MULTI_PROCESSING = "True"
WEIGHT = "coco"
LAYERS = "3+"
```
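For what it's worth, one common cause of this growth pattern is a dataset or generator that caches every decoded image for the lifetime of training. The sketch below is a hypothetical illustration (not taken from the post or the Mask_RCNN code) of bounding such a cache with LRU eviction so memory stays flat:

```python
from collections import OrderedDict


class BoundedImageCache:
    """Keep at most `max_items` decoded images in memory (LRU eviction)."""

    def __init__(self, max_items: int = 256):
        self.max_items = max_items
        self._cache = OrderedDict()

    def get(self, image_id, loader):
        """Return the cached image, loading (and caching) it on a miss."""
        if image_id in self._cache:
            self._cache.move_to_end(image_id)  # mark as recently used
            return self._cache[image_id]
        image = loader(image_id)
        self._cache[image_id] = image
        if len(self._cache) > self.max_items:
            self._cache.popitem(last=False)  # evict least recently used
        return image
```

An unbounded dict in the same place (append-only, never evicted) would reproduce exactly the symptom above: RAM rising every epoch until swap is exhausted.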