Retinanet-Tutorial

Unable to test the model

Open VidhiVijayvergiya opened this issue 4 years ago • 8 comments

Hello Jasper,

I followed all the steps, but I am still getting an error when running test.py. Even after loading the inference.h5 file, no predictions are made, and this is the message I get:

Error: No training configuration found in the save file, so the model was not compiled. Compile it manually. Model is likely already an inference model

VidhiVijayvergiya avatar Oct 02 '20 14:10 VidhiVijayvergiya

Did the keras_retinanet/bin/convert_model.py script complete properly?

jaspereb avatar Oct 08 '20 05:10 jaspereb

Yes, it did. Also, when I was training my model I specified 1000 steps, but it ran only up to 300 (I assumed that is because I have 300 images in my dataset).

VidhiVijayvergiya avatar Oct 10 '20 04:10 VidhiVijayvergiya

The error about training configuration means that the model can't be used for further training (fine-tuning from that checkpoint), but that should be fine for inference. So it's likely that your model just isn't detecting any objects in the images you tried. Have you tried passing it one of the training set images? Are you using your own dataset?

jaspereb avatar Oct 13 '20 02:10 jaspereb

Yes, I am training on my own dataset and I am trying to do multilabel classification. I did try passing it one of the images, but since the model is not trained properly, it just outputs the same image without any predictions.

When I used the Pascal format, training showed no error, but I later got the error mentioned in my first post.

Later I tried to use the CSV format for training and the error is:

Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 3110 batches). You may need to use the repeat() function when building your dataset.

I don't understand this, because I believe I followed all the earlier steps correctly. I would be glad if you could help me.

VidhiVijayvergiya avatar Oct 14 '20 16:10 VidhiVijayvergiya

I was getting the same error. In train.py (path: keras-retinanet/keras_retinanet/bin/train.py), on line 434, I changed the value of default to None, and that resolved the error.
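
For readers making the same change, here is a minimal sketch of what it might look like. It assumes the line in question defines the --steps argument; the exact line number depends on the version of train.py checked out, and the parser below is a hypothetical stand-in, not the real script. With no default, Keras is left to work out the epoch length from the generator where it can.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the argument parser in train.py.
    parser = argparse.ArgumentParser(description="Sketch of the --steps option")
    # Assumption: this is the option whose default was changed to None.
    parser.add_argument("--steps", type=int, default=None,
                        help="Number of steps per epoch (None: not fixed here).")
    return parser


# No --steps given: the value stays None.
args_default = build_parser().parse_args([])
# An explicit value on the command line still overrides the default.
args_explicit = build_parser().parse_args(["--steps", "300"])
print(args_default.steps, args_explicit.steps)
```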

Vinaya19 avatar Dec 20 '20 04:12 Vinaya19

In addition to @Vinaya19's solution, add --steps xyz to the training command, where xyz equals the number of training images divided by the batch size. If your batch size is 1, then xyz is simply the number of training images.
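
As a rough sketch of that calculation (the helper name is made up for illustration, and rounding up for a final partial batch is an assumption, since the suggestion above only gives the division):

```python
import math


def steps_per_epoch(num_training_images, batch_size=1):
    """Number of batches needed to see every training image once."""
    if num_training_images <= 0 or batch_size <= 0:
        raise ValueError("both arguments must be positive")
    # Assumption: a final partial batch still counts as one step.
    return math.ceil(num_training_images / batch_size)


print(steps_per_epoch(300))     # 300 images, batch size 1
print(steps_per_epoch(11))      # 11 images, batch size 1
print(steps_per_epoch(300, 4))  # 300 images, batch size 4
```

With 300 training images and a batch size of 1, this gives 300, so the training command would carry --steps 300.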

hlmhlr avatar Jan 13 '21 17:01 hlmhlr

Hi @hlmhlr, in this video https://youtu.be/mr8Y_Nuxciw by @jaspereb, he used 15 images: 11 for training and 4 for validation. But the --steps parameter is still 100. So should it be 11, since he has 11 training images?

I ask because I'm getting the same error after following the video step by step:

WARNING:tensorflow:`batch_size` is no longer needed in the `TensorBoard` Callback and will be ignored in TensorFlow 2.0.
keras_retinanet/bin/train.py:538: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
  return training_model.fit_generator(
Epoch 1/50
2022-06-23 20:28:29.325827: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8200
2022-06-23 20:28:29.504615: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
 11/100 [==>...........................] - ETA: 22s - loss: 3.9890 - regression_loss: 2.8602 - classification_loss: 1.1288WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset.
Running network: 100% (4 of 4) |########################################################################################################################| Elapsed Time: 0:00:01 Time:  0:00:01
Parsing annotations: 100% (4 of 4) |####################################################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
20 instances of class redPlum with average precision: 0.0000
12 instances of class greenPlum with average precision: 0.0000
mAP: 0.0000

Epoch 1: saving model to /home/mfahmirukman/RetinanetTutorial/TrainingOutput/snapshots/resnet50_pascal_01.h5
100/100 [==============================] - 12s 56ms/step - loss: 3.9890 - regression_loss: 2.8602 - classification_loss: 1.1288 - mAP: 0.0000e+00 - lr: 1.0000e-05

mfahmirukman avatar Jun 23 '22 13:06 mfahmirukman

Hi @mfahmirukman, yes, it should be 11 if your batch size is 1. I hope that works for you.
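
The figures in the log posted above fit that answer. Here is a rough sketch of the arithmetic, assuming the run used 100 steps and 50 epochs (read from the "11/100" and "Epoch 1/50" lines) and a generator that yields 11 batches per pass. The function names are illustrative only:

```python
def batches_requested(steps, epochs):
    # Total quoted in the warning text: steps_per_epoch * epochs.
    return steps * epochs


def runs_out_in_first_epoch(steps, batches_per_pass):
    # True when one pass over the data is shorter than one epoch.
    return steps > batches_per_pass


print(batches_requested(100, 50))        # 5000, as in the warning
print(runs_out_in_first_epoch(100, 11))  # True: stops at 11/100
print(runs_out_in_first_epoch(11, 11))   # False once --steps is 11
```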

hlmhlr avatar Jul 04 '22 11:07 hlmhlr