Training stuck at epoch 1, TensorFlow and Keras versions both 2.12.0

Open AMELIZAHRA opened this issue 2 years ago • 12 comments

Please help!

AMELIZAHRA avatar Apr 05 '23 13:04 AMELIZAHRA

Me too. Can anyone provide hints?

jw-redpanda avatar Apr 25 '23 07:04 jw-redpanda

Same here. The training reaches the first epoch and then nothing happens, not even an error.

Can anyone help?

avinash-218 avatar May 23 '23 11:05 avinash-218

The same is happening here. I get stuck on the first epoch.

Checkpoint Path: /content/Mask_RCNN_master_Clean_Colab/logs/object20230526T2334/mask_rcnn_object_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5 (Conv2D)
fpn_c4p4 (Conv2D)
fpn_c3p3 (Conv2D)
fpn_c2p2 (Conv2D)
fpn_p5 (Conv2D)
fpn_p2 (Conv2D)
fpn_p3 (Conv2D)
fpn_p4 (Conv2D)
rpn_model (Functional)
mrcnn_mask_conv1 (TimeDistributed)
mrcnn_mask_bn1 (TimeDistributed)
mrcnn_mask_conv2 (TimeDistributed)
mrcnn_mask_bn2 (TimeDistributed)
mrcnn_class_conv1 (TimeDistributed)
mrcnn_class_bn1 (TimeDistributed)
mrcnn_mask_conv3 (TimeDistributed)
mrcnn_mask_bn3 (TimeDistributed)
mrcnn_class_conv2 (TimeDistributed)
mrcnn_class_bn2 (TimeDistributed)
mrcnn_mask_conv4 (TimeDistributed)
mrcnn_mask_bn4 (TimeDistributed)
mrcnn_bbox_fc (TimeDistributed)
mrcnn_mask_deconv (TimeDistributed)
mrcnn_class_logits (TimeDistributed)
mrcnn_mask (TimeDistributed)
Epoch 1/5

sam125 avatar May 26 '23 23:05 sam125

I think TensorFlow is only using the CPU instead of the GPU.

sam125 avatar May 26 '23 23:05 sam125
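
One quick way to test the CPU-versus-GPU suspicion above is to ask TensorFlow which GPUs it can see. The sketch below relies on `tf.config.list_physical_devices`, which exists in TensorFlow 2.x. The helper names `describe_devices` and `visible_gpus` are made up for illustration and are not part of Mask_RCNN.

```python
def describe_devices(gpu_names):
    """Return a one-line summary for a list of GPU device names."""
    if gpu_names:
        return f"TensorFlow sees {len(gpu_names)} GPU(s): {', '.join(gpu_names)}"
    return "TensorFlow sees no GPU; training will run on the CPU."


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see (hypothetical helper)."""
    # Imported here so describe_devices() stays usable without TensorFlow.
    import tensorflow as tf

    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    try:
        print(describe_devices(visible_gpus()))
    except ImportError:
        print("TensorFlow is not installed in this environment.")
```

If the summary reports no GPU in Colab, the runtime type is probably set to CPU, or the installed TensorFlow build does not match the CUDA libraries on the machine. A Mask R-CNN epoch on CPU can take long enough to look like a hang.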

Hello, I gave up using this code. It doesn't work with tensorflow 2.

AMELIZAHRA avatar May 27 '23 03:05 AMELIZAHRA

yeah it only goes well with tensorflow 1.x

Yanglc0123 avatar Jul 24 '23 08:07 Yanglc0123

Hi guys, do you mind sharing your Colab code? I run into a lot of compatibility issues (related to the TensorFlow and Python versions) when using Colab, and was wondering if anyone could provide the code they run before training the model.

Oh, btw, I found this fork that supports TensorFlow 2.x in another post: https://github.com/leekunhee/Mask_RCNN In case someone has tried it, please elaborate 😆

MuhAndar avatar Aug 10 '23 09:08 MuhAndar

Here is my repo, https://github.com/avinash-218/Mask-RCNN-TF2.7.0-keras2.7.0

avinash-218 avatar Aug 10 '23 10:08 avinash-218

There may be an error somewhere, so try to debug it. If you are running on a small GPU or on the CPU, training simply takes a long time, so there is no need to worry.

dayana123456789 avatar Sep 15 '23 15:09 dayana123456789

Was anyone able to resolve this issue? Is there a way to see the current iteration number within an epoch, to check whether the training is actually progressing?

SambitPrabhu avatar Nov 05 '23 22:11 SambitPrabhu

Mask RCNN shows the training epochs and the steps per epoch during training. Are you running on a CPU or a small GPU? How long it takes depends on the size of your dataset and annotations. Also check whether there is an error in your code or something wrong in your annotations, and recheck the regions initialised in your code (this may also be the reason).

dayana123456789 avatar Nov 06 '23 05:11 dayana123456789
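
If the progress bar never moves past "Epoch 1/5", one way to tell a slow first step from a real hang is to have Python periodically dump the stack of every thread. The sketch below uses only the standard-library `faulthandler` module. The function names and the 300-second default are arbitrary choices for illustration, not anything Mask_RCNN provides.

```python
import faulthandler
import sys


def start_hang_watchdog(interval_seconds=300, file=sys.stderr):
    """Dump the stack of every thread each `interval_seconds` until cancelled."""
    faulthandler.dump_traceback_later(interval_seconds, repeat=True, file=file)


def stop_hang_watchdog():
    """Cancel the periodic dump, for example once training has finished."""
    faulthandler.cancel_dump_traceback_later()


# Usage sketch (the training call itself is whatever your script already does):
#   start_hang_watchdog(300)
#   ... start training here ...
#   stop_hang_watchdog()
```

If successive dumps show the main thread waiting in the same place (for example on a queue fed by data-loading workers), training is stuck rather than slow. `faulthandler` needs a real file descriptor, so if `sys.stderr` does not have one in your environment (this can happen in notebooks), pass an opened file instead.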

Try removing the multiprocessing. That would help.

avinash-218 avatar Nov 06 '23 06:11 avinash-218
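
The comment above does not say which multiprocessing is meant. Assuming it refers to the data-loading workers that Keras spawns, the change amounts to passing `use_multiprocessing=False` and a small `workers` count to the Keras fit call. Both are real arguments of `Model.fit` in TensorFlow/Keras 2.x. Where Mask_RCNN sets them (probably the `train` method in `mrcnn/model.py`) should be checked in your own copy. `single_process_fit_kwargs` is a hypothetical helper, shown only to make the settings concrete.

```python
def single_process_fit_kwargs(existing=None):
    """Return fit() keyword arguments with multiprocessing switched off.

    `existing` is an optional dict of keyword arguments to start from;
    it is copied, not modified.
    """
    kwargs = dict(existing or {})
    kwargs["use_multiprocessing"] = False  # no child processes for data loading
    kwargs["workers"] = 1                  # a single loader thread
    return kwargs


# Usage sketch, assuming a Keras 2.x model and a data generator you already have:
#   keras_model.fit(train_generator, epochs=5, **single_process_fit_kwargs())
```

Loading data in a single process is slower per step, but it removes one common source of silent deadlocks between the generator and worker processes.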