
Very bad predictions

Open OguntolaIbrahim opened this issue 2 years ago • 7 comments

I am getting very poor training and validation results. I am trying to train on a custom dataset using coco_weights as the starting weights. There is only one instance of the object in every image. Could my config parameter settings be the problem?

IMAGES_PER_GPU = 1
TRAIN_ROIS_PER_IMAGE = 300
NUM_CLASSES = 1 + 1
STEPS_PER_EPOCH = 100
DETECTION_MIN_CONFIDENCE = 0.95
VALIDATION_STEPS = 50
MAX_GT_INSTANCES = 1
DETECTION_MAX_INSTANCES = 1
USE_MINI_MASK = True

I set DETECTION_MAX_INSTANCES to 1 because there is only one object in every image, though I have tried other values with no improvement. I also set MAX_GT_INSTANCES to 1 because every training image contains only one object, and larger values made no difference either. I have tried varying values for TRAIN_ROIS_PER_IMAGE, and I have set USE_MINI_MASK to both True and False. Also, I would like to know what values would work best for IMAGE_MIN_DIM and RPN_ANCHOR_SCALES.
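For reference, in the matterport Mask_RCNN codebase settings like these normally live in a subclass of its `Config` class, overriding only what differs. A minimal, self-contained sketch of that pattern follows; the base class below is a simplified stand-in with made-up defaults, not the real `mrcnn.config.Config`:

```python
# Sketch of the config-subclass pattern used by matterport Mask_RCNN.
# The base class here is a simplified stand-in, not mrcnn.config.Config.

class Config:
    """Simplified stand-in for the library's base Config."""
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1  # background only, by default
    STEPS_PER_EPOCH = 1000
    DETECTION_MIN_CONFIDENCE = 0.7
    USE_MINI_MASK = True

class SingleObjectConfig(Config):
    """One custom class, at most one instance per image."""
    IMAGES_PER_GPU = 1
    TRAIN_ROIS_PER_IMAGE = 300
    NUM_CLASSES = 1 + 1  # background + 1 custom class
    STEPS_PER_EPOCH = 100
    DETECTION_MIN_CONFIDENCE = 0.95
    VALIDATION_STEPS = 50
    MAX_GT_INSTANCES = 1
    DETECTION_MAX_INSTANCES = 1

cfg = SingleObjectConfig()
print(cfg.NUM_CLASSES)  # → 2
```

Expressing the overrides in one subclass like this makes it easy to diff a custom setup against the library's defaults.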

I also suspected the TensorFlow version. I was getting poor results with TensorFlow 2.6.0, so I wrote a basic program predicting on a simple COCO image using COCO weights, and it predicted 4+ bounding boxes where there was only one object. After downgrading to 2.5.0, I got an accurate prediction on that basic image. However, with TensorFlow 2.5.0 I am still getting poor training and validation results on my custom dataset. Any advice would be appreciated.

OguntolaIbrahim avatar May 20 '22 09:05 OguntolaIbrahim

I got the same problem with TensorFlow 2.6, 2.7, 2.8, and 2.9; it works with TensorFlow 2.5.3 and below. But I solved it by changing the load_weights() line (recommended for both training and inference):

Replace

model.load_weights(checkpoint_path, by_name=True)

with

tf.keras.Model.load_weights(model.keras_model, checkpoint_path, by_name=True)

where checkpoint_path is the path to the .h5 checkpoint file.
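For context, this workaround routes the call through the Keras base class explicitly, handing it the wrapped keras_model instance instead of going through the wrapper's own method. The underlying Python mechanism, calling a method through a class rather than an instance (`Base.method(instance, ...)`) so a particular implementation runs, can be sketched without TensorFlow; the classes below are illustrative stand-ins, not mrcnn's:

```python
# Sketch of the mechanism behind the fix: calling a method through a class
# explicitly, Base.method(instance, ...), bypasses any override defined on the
# instance's own class. Class names here are illustrative stand-ins.

class KerasModelStub:
    """Stands in for tf.keras.Model."""
    def load_weights(self, path, by_name=False):
        return f"base loaded {path} (by_name={by_name})"

class PatchedModelStub(KerasModelStub):
    """Stands in for a subclass whose override misbehaves on newer TF."""
    def load_weights(self, path, by_name=False):
        return "override ran instead"

inner = PatchedModelStub()
# inner.load_weights(...) would hit the override; calling through the base
# class explicitly is the shape of the suggested workaround:
result = KerasModelStub.load_weights(inner, "mask_rcnn_coco.h5", by_name=True)
print(result)  # → base loaded mask_rcnn_coco.h5 (by_name=True)
```

Whether this helps in practice depends on which load_weights implementation ends up being resolved in a given TensorFlow/Keras version, so treat it as a workaround rather than a guaranteed fix.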

natha1008 avatar May 24 '22 06:05 natha1008

I believe the issue with load_weights is due to incompatibilities with h5py on people's systems. Try forcefully reinstalling h5py (the latest version works for me; if not, try h5py==2.9.0).

That solved it on my end

tqamarVT avatar Jul 07 '22 17:07 tqamarVT

Thank you very much. I figured out I wasn't calling load_weights at all. It was a silly mistake: every time I re-ran the code, it started learning from scratch rather than using coco_weights or my previously saved .h5 weights file. It's working fine now and I am even getting an IoU of about 0.95. I am very grateful for the support.

OguntolaIbrahim avatar Jul 08 '22 08:07 OguntolaIbrahim


Glad it worked! Could you please share the TensorFlow and CUDA versions you used? I am still getting poor results, even after downgrading to older TensorFlow versions such as 1.15.

parthkvv avatar Oct 08 '22 06:10 parthkvv

TensorFlow 2.5.0, CUDA 11.6

OguntolaIbrahim avatar Oct 13 '22 10:10 OguntolaIbrahim

(quoting natha1008's load_weights fix above)

Thank you! This problem had confused me for a long time...

caichangjia avatar Dec 15 '22 20:12 caichangjia

Due to version-compatibility issues, model.load_weights seems to load the weights incorrectly, which results in training from scratch. During training and evaluation, neither the COCO weights nor the previously trained weights are loaded properly: training restarts from scratch, and at evaluation time the loaded model predicts poorly on the sample data.

Because of this, the losses in the early steps of the early epochs were far too high, the visual results looked random (not even close to the ground truth), and evaluation metrics such as mAP, mAR, and F1 were 0. This can be solved in two ways:

1. Use tf.keras.Model.load_weights instead of model.load_weights. However, this cannot always be used, since it does not support the exclude argument.

2. Downgrade TensorFlow from 2.7 to 2.5. Downgrading worked in both training (from COCO weights using the exclude argument, and from previously trained weights) and evaluation.

This worked for me. Correct me if my understanding is wrong.
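The exclude argument mentioned above is how the matterport code drops the class-count-dependent head layers when starting from COCO weights (whose heads are shaped for COCO's class count, not a custom NUM_CLASSES). Conceptually it is name-matched copying that skips the listed layers. A plain-Python sketch with dicts standing in for real layers (the head layer names match those used in the library's samples; everything else is illustrative):

```python
# Conceptual sketch of by_name=True plus exclude=[...]: copy weights
# layer-by-layer by matching names, skipping the heads whose shapes depend
# on NUM_CLASSES. Plain dicts stand in for real Keras layers.

COCO_WEIGHTS = {
    "conv1": [0.1, 0.2],
    "mrcnn_class_logits": [0.3],  # shaped for COCO's class count
    "mrcnn_mask": [0.4],
}

def load_by_name(target, source, exclude=()):
    """Copy matching entries from source into target, skipping excluded names."""
    for name, weights in source.items():
        if name in target and name not in exclude:
            target[name] = weights
    return target

model_weights = {"conv1": [0.0, 0.0], "mrcnn_class_logits": [0.0], "mrcnn_mask": [0.0]}
loaded = load_by_name(model_weights, COCO_WEIGHTS,
                      exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                               "mrcnn_bbox", "mrcnn_mask"])
print(loaded["conv1"])               # → [0.1, 0.2]
print(loaded["mrcnn_class_logits"])  # → [0.0]  (left at its initial values)
```

This is why losing the exclude argument matters: without it, the mismatched head shapes prevent loading COCO weights into a model with a different NUM_CLASSES.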

avinash-218 avatar Jul 05 '23 12:07 avinash-218