Retinanet-Tutorial
Unable to test the model
Hello Jasper,
I followed everything along, however I am still facing an error in the test.py file. Even after loading the inference.h5 file, no predictions are made, and the following is what I get:
Error: No training configuration found in the save file, so the model was not compiled. Compile it manually. Model is likely already an inference model
Did the keras_retinanet/bin/convert_model.py script complete properly?
Yes, it did. Also, when I was training my model I specified 1000 steps, but it only ran until step 300 (I assume this is because I have 300 images in my dataset).
The error about the training configuration means the model can't be used for further training (fine-tuning from that checkpoint), but it should be fine for inference. So it is likely that your model just isn't detecting any objects in the images you tried. Have you tried passing it one of the training set images? Are you using your own dataset?
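One common reason an image comes back with no boxes drawn is that the test script only keeps detections above a confidence threshold (commonly 0.5 in the keras-retinanet example scripts), and an under-trained model produces only low scores. A minimal sketch of that filtering step, with hypothetical score values:

```python
# Hypothetical detection scores, as returned alongside boxes and labels by
# a detector; keras-retinanet pads empty detection slots with -1.
scores = [0.31, 0.22, 0.05, -1.0, -1.0]

THRESHOLD = 0.5  # typical drawing cutoff in the example scripts

# Keep only detections above the confidence threshold.
kept = [s for s in scores if s >= THRESHOLD]

print(len(kept))  # 0 -> nothing gets drawn on the output image
```

Lowering the threshold can reveal whether the model is producing any (weak) detections at all.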
Yes, I am training on my own dataset, and I am trying to do multi-label classification. I did try passing it one of the training images, but since the model is not trained properly, it returns the same image without any predictions.
When I used the Pascal VOC format, training showed no error, but I later got the error mentioned in the first post.
I then tried the CSV format for training, and the error is:
Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 3110 batches). You may need to use the repeat() function when building your dataset.
I am unable to understand this, because I believe I followed all the prior steps correctly. I would be glad if you could help me.
I was getting the same error. In the file train.py (path: keras-retinanet/keras_retinanet/bin/train.py), at line 434, I changed the value of `default` to None, and that resolved the error.
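The change described above amounts to making the `--steps` argument default to None, so Keras infers the epoch length from the generator instead of using a fixed step count larger than the dataset can supply. A hedged sketch of what that argparse line might look like (the exact surrounding code in train.py may differ):

```python
import argparse

parser = argparse.ArgumentParser(description='Training script sketch.')
# The original default was a fixed step count; setting it to None lets
# fit() derive steps_per_epoch from the generator's own length.
parser.add_argument('--steps', type=int, default=None,
                    help='Number of steps per epoch.')

args = parser.parse_args([])  # no --steps given on the command line
print(args.steps)             # None -> epoch length inferred from the data
```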
Apart from the solution from @Vinaya19, you can insert `--steps xyz` in the training command, where xyz equals the number of training images divided by the batch size. If your batch size is 1, then xyz is simply the number of training images.
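In other words, steps_per_epoch should not exceed the number of batches the generator can actually yield in one pass over the data. A small sketch of that arithmetic (the image and batch counts here are made up):

```python
import math

num_train_images = 300   # hypothetical dataset size
batch_size = 1           # keras-retinanet's default batch size

# One step consumes one batch, so an epoch can supply at most this many steps.
steps = math.ceil(num_train_images / batch_size)
print(steps)  # 300
```

With a batch size of 1 the step count equals the image count, which is why training in the first post stopped at step 300 of the requested 1000.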
Hi @hlmhlr, according to this video https://youtu.be/mr8Y_Nuxciw by @jaspereb, he used 15 images: 11 for training and 4 for validation. But the `--steps` parameter is still 100. So should it be 11, since he has 11 training images?
I ask because I'm also getting the same error after following the video step by step:
WARNING:tensorflow:`batch_size` is no longer needed in the `TensorBoard` Callback and will be ignored in TensorFlow 2.0.
keras_retinanet/bin/train.py:538: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
return training_model.fit_generator(
Epoch 1/50
2022-06-23 20:28:29.325827: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8200
2022-06-23 20:28:29.504615: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
11/100 [==>...........................] - ETA: 22s - loss: 3.9890 - regression_loss: 2.8602 - classification_loss: 1.1288WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset.
Running network: 100% (4 of 4) |########################################################################################################################| Elapsed Time: 0:00:01 Time: 0:00:01
Parsing annotations: 100% (4 of 4) |####################################################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
20 instances of class redPlum with average precision: 0.0000
12 instances of class greenPlum with average precision: 0.0000
mAP: 0.0000
Epoch 1: saving model to /home/mfahmirukman/RetinanetTutorial/TrainingOutput/snapshots/resnet50_pascal_01.h5
100/100 [==============================] - 12s 56ms/step - loss: 3.9890 - regression_loss: 2.8602 - classification_loss: 1.1288 - mAP: 0.0000e+00 - lr: 1.0000e-05
Hi @mfahmirukman, yes, it should be 11 if your batch size is 1. I hope that works for you.