Mask_RCNN Error in PredictCost for the op: op "CropAndResize"

Error in PredictCost for the op: op "CropAndResize"

Open marul12q opened this issue 3 years ago • 0 comments

I wanted to train Mask RCNN on google Colab on GPU. And this Error/Warning comes in. The same code runs on CPU without any issues. Tensorflow 2.2.0. Keras 2.8.0. Does it have influence on training performance? Any suggestions?

Configurations: BACKBONE resnet50 BACKBONE_STRIDES [4, 8, 16, 32, 64] BATCH_SIZE 1 BBOX_STD_DEV [0.1 0.1 0.2 0.2] COMPUTE_BACKBONE_SHAPE None DETECTION_MAX_INSTANCES 100 DETECTION_MIN_CONFIDENCE 0.9 DETECTION_NMS_THRESHOLD 0.3 FPN_CLASSIF_FC_LAYERS_SIZE 1024 GPU_COUNT 1 GRADIENT_CLIP_NORM 5.0 IMAGES_PER_GPU 1 IMAGE_CHANNEL_COUNT 3 IMAGE_MAX_DIM 1024 IMAGE_META_SIZE 14 IMAGE_MIN_DIM 800 IMAGE_MIN_SCALE 0 IMAGE_RESIZE_MODE square IMAGE_SHAPE [1024 1024 3] LEARNING_MOMENTUM 0.9 LEARNING_RATE 0.001 LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0} MASK_POOL_SIZE 14 MASK_SHAPE [28, 28] MAX_GT_INSTANCES 100 MEAN_PIXEL [123.7 116.8 103.9] MINI_MASK_SHAPE (84, 84) NAME gland NUM_CLASSES 2 POOL_SIZE 7 POST_NMS_ROIS_INFERENCE 1000 POST_NMS_ROIS_TRAINING 2000 PRE_NMS_LIMIT 6000 ROI_POSITIVE_RATIO 0.33 RPN_ANCHOR_RATIOS [0.5, 1, 2] RPN_ANCHOR_SCALES (32, 64, 128, 256, 512) RPN_ANCHOR_STRIDE 1 RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2] RPN_NMS_THRESHOLD 0.7 RPN_TRAIN_ANCHORS_PER_IMAGE 256 STEPS_PER_EPOCH 100 TOP_DOWN_PYRAMID_SIZE 256 TRAIN_BN False TRAIN_ROIS_PER_IMAGE 200 USE_MINI_MASK True USE_RPN_ROIS True VALIDATION_STEPS 50 WEIGHT_DECAY 0.0001

Downloading pretrained model to /content/projektBadawczy/mask_rcnn_coco.h5 ... ... done downloading pretrained model! Loading weights /content/projektBadawczy/mask_rcnn_coco.h5 2022-05-29 06:40:47.457746: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0. Training network heads

Starting at epoch 0. LR=0.001

Checkpoint Path: /content/projektBadawczy/logs/gland20220529T0640/mask_rcnn_gland_{epoch:04d}.h5 Selecting layers to train fpn_c5p5 (Conv2D) fpn_c4p4 (Conv2D) fpn_c3p3 (Conv2D) fpn_c2p2 (Conv2D) fpn_p5 (Conv2D) fpn_p2 (Conv2D) fpn_p3 (Conv2D) fpn_p4 (Conv2D) rpn_model (Functional) mrcnn_mask_conv1 (TimeDistributed) mrcnn_mask_bn1 (TimeDistributed) mrcnn_mask_conv2 (TimeDistributed) mrcnn_mask_bn2 (TimeDistributed) mrcnn_class_conv1 (TimeDistributed) mrcnn_class_bn1 (TimeDistributed) mrcnn_mask_conv3 (TimeDistributed) mrcnn_mask_bn3 (TimeDistributed) mrcnn_class_conv2 (TimeDistributed) mrcnn_class_bn2 (TimeDistributed) mrcnn_mask_conv4 (TimeDistributed) mrcnn_mask_bn4 (TimeDistributed) mrcnn_bbox_fc (TimeDistributed) mrcnn_mask_deconv (TimeDistributed) mrcnn_class_logits (TimeDistributed) mrcnn_mask (TimeDistributed) /usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/gradient_descent.py:102: UserWarning: The lr argument is deprecated, use learning_rate instead. super(SGD, self).init(name, **kwargs) Epoch 1/30 /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:446: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("training/SGD/gradients/gradients/roi_align_classifier/concat_grad/sub:0", shape=(None,), dtype=int32), values=Tensor("training/SGD/gradients/gradients/roi_align_classifier/concat_grad/GatherV2_2:0", shape=(None, 7, 7, 256), dtype=float32), dense_shape=Tensor("training/SGD/gradients/gradients/roi_align_classifier/concat_grad/Shape:0", shape=(4,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory. "shape. This may consume a large amount of memory." % value 2022-05-29 06:44:01.503850: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -438 } dim { size: 84 } dim { size: 84 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -25 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -25 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } value { dtype: DT_INT32 tensor_shape { dim { size: 2 } } int_val: 28 } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla T4" frequency: 1590 num_cores: 40 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11010" } environment { key: "cudnn" value: "8005" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14465892352 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { dim { size: -25 } dim { size: 28 } dim { size: 28 } dim { size: 1 } } } 100/100 [==============================] - 168s 1s/step - batch: 49.5000 - size: 1.0000 - loss: 5.7086 - rpn_class_loss: 0.4854 - rpn_bbox_loss: 2.6237 - mrcnn_class_loss: 0.4647 - mrcnn_bbox_loss: 1.4705 - mrcnn_mask_loss: 0.6643 - val_loss: 5.5708 - val_rpn_class_loss: 0.3285 - val_rpn_bbox_loss: 2.2625 - val_mrcnn_class_loss: 0.8341 - val_mrcnn_bbox_loss: 1.4832 - val_mrcnn_mask_loss: 0.6625

May 29 '22 07:05 marul12q

Mask_RCNN Mask_RCNN copied to clipboard

Error in PredictCost for the op: op "CropAndResize"

Mask_RCNN
Mask_RCNN copied to clipboard