Mask_RCNN
loss: nan
Hi, I have modified the backbone and its layers in the Mask R-CNN algorithm.
I am getting the loss as loss: nan.
The description of the error is below:
Epoch 1/100
2020-02-13 03:32:27.707341: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-13 03:32:29.183377: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-13 03:32:35.797619: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-02-13 03:32:35.854928: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
 99/100 [============================>.] - ETA: 1s - loss: nan - rpn_class_loss: 2701735.7973 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6861 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py:2197: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
  UserWarning('Using a generator with `use_multiprocessing=True`'
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:791: The name tf.Summary is deprecated. Please use tf.compat.v1.Summary instead.
100/100 [==============================] - 337s 3s/step - loss: nan - rpn_class_loss: 2674718.4462 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6862 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 2/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 3/100
100/100 [==============================] - 127s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 4/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 5/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 6/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 7/100
100/100 [==============================] - 127s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 8/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 9/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 10/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 11/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 12/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 13/100
100/100 [==============================] - 127s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 14/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 15/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 16/100
100/100 [==============================] - 126s 1s/step - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: nan - val_mrcnn_class_loss: 0.6931 - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
Epoch 17/100
  6/100 [>.............................] - ETA: 1:36 - loss: nan - rpn_class_loss: 0.6931 - rpn_bbox_loss: nan - mrcnn_class_loss: 0.6931 - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00
ERROR:root:Error processing image {'id': '20181221154818131_0003.jpg.jpg', 'source': 'NamePlate', 'path': 'ROI4/train/20181221154818131_0003.jpg.jpg', 'width': 1024, 'height': 1024, 'polygons': [{'name': 'polygon', 'all_points_x': [615, 615, 1019, 1025, 621, 615], 'all_points_y': [902, 902, 898, 1009, 1015, 902]}]}
Traceback (most recent call last):
Can anyone give me some suggestions on how to solve this? Many thanks!
It's very interesting, because I have the same issue with the same 0.6931 value in the output (0.6931 is ln 2, the cross-entropy of a constant 50/50 prediction, so the class heads seem to have collapsed to chance). Have you solved this problem?
I'm having the same issue here. Does anyone have an idea about it?
My solution: my (grayscale) images had 16-bit depth. I converted them to 8-bit and that helped.
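For anyone hitting the same thing, here is a minimal conversion sketch using OpenCV and NumPy (this code is mine, not from the poster above; the folder names are placeholders for your own layout):

```python
import os

import cv2
import numpy as np

SRC_DIR = "images_16bit"  # hypothetical input folder
DST_DIR = "images_8bit"   # hypothetical output folder
os.makedirs(DST_DIR, exist_ok=True)

for name in os.listdir(SRC_DIR):
    # IMREAD_UNCHANGED preserves the original 16-bit depth on load
    # instead of silently converting.
    img = cv2.imread(os.path.join(SRC_DIR, name), cv2.IMREAD_UNCHANGED)
    if img is None or img.dtype != np.uint16:
        continue  # skip non-images and files that are already 8-bit
    # Rescale the 16-bit range [0, 65535] down to the 8-bit range [0, 255].
    img8 = (img / 256).astype(np.uint8)
    cv2.imwrite(os.path.join(DST_DIR, name), img8)
```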
In my case, the learning rate was too high; it's working better now with lr ≈ 0.0001.
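In case it helps others, a minimal sketch of where to lower it, assuming the standard matterport/Mask_RCNN `Config` class (the subclass name and the `NAME`/`NUM_CLASSES` values are illustrative, not from this thread):

```python
from mrcnn.config import Config

class LowLRConfig(Config):
    NAME = "nameplate"       # hypothetical dataset name
    NUM_CLASSES = 1 + 1      # background + one foreground class
    LEARNING_RATE = 0.0001   # the repo default is 0.001

config = LowLRConfig()
config.display()  # print the effective settings to verify LEARNING_RATE
```

Training then picks it up via `model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE, epochs=100, layers='heads')`.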
In my case, decreasing the lr didn't help. The final result was the same (0.62), it was just reached later. Do you have a multi-class network?
nope, just one class (plus BG)
Please, I'm looking for the line that displays:
Epoch 1/100
821/1000 [=======================>......] - ETA: 3:35 - loss: 1.0181 ..
@amuthalingeswaranbose Hey, were you able to resolve it? Can you please let us know what the issue was? I am facing the same problem.
I have the same problem. My training runs for 400 epochs; after the 60th epoch it shows 'loss=nan'.
Maybe it is overfitting?
Have you solved this problem? Please give me some advice, thank you very much!