
Evaluating models outputs `mAP : 0.0000`

welydev opened this issue 4 years ago • 7 comments

Hello,

I'm trying to evaluate models I trained with transfer learning, running inside a Docker container. I actually launched the training with 200 experiments by mistake, but I still want to evaluate the first models it produced.

cloud@serveur-cic-tempo:~/imageaitest2$ ls -la dataset/models/
total 2412288
drwxr-xr-x 2 cloud cloud      4096 Feb  4 09:14 .
drwxr-xr-x 8 cloud cloud      4096 Feb  4 09:14 ..
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-001--loss-0012.120.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-001--loss-0019.993.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-002--loss-0010.218.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-002--loss-0015.954.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-003--loss-0009.916.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-003--loss-0015.790.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-004--loss-0009.739.h5

Could my issue be coming from these model files?

In any case, I believe the training didn't work, since the mAP is 0.0000. I don't really understand what my issue could be. Any clue, please?

Below, you can find my config, the steps I followed and the outputs.

Configuration:

cloud@serveur-cic-tempo:~/imageaitest2$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       42 bits physical, 48 bits virtual
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
Stepping:            7
CPU MHz:             3000.000
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            30976K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear flush_l1d arch_capabilities

Docker Container from official Tensorflow image : https://hub.docker.com/r/tensorflow/tensorflow/

root@45ab5106701d:/# pip list
Package                Version
---------------------- ---------
absl-py                0.11.0
asn1crypto             0.24.0
astunparse             1.6.3
cachetools             4.2.0
certifi                2020.12.5
chardet                3.0.4
cryptography           2.1.4
cycler                 0.10.0
decorator              4.4.2
flatbuffers            1.12
gast                   0.3.3
google-auth            1.24.0
google-auth-oauthlib   0.4.2
google-pasta           0.2.0
grpcio                 1.32.0
h5py                   2.10.0
idna                   2.6
imageai                2.1.6
imageio                2.9.0
imgaug                 0.4.0
importlib-metadata     3.3.0
Keras                  2.4.3
Keras-Preprocessing    1.1.2
keras-resnet           0.2.0
keyring                10.6.0
keyrings.alt           3.0
kiwisolver             1.3.1
Markdown               3.3.3
matplotlib             3.3.2
networkx               2.5
numpy                  1.19.3
oauthlib               3.1.0
opencv-python          4.5.1.48
opt-einsum             3.3.0
pandas                 1.1.5
Pillow                 7.0.0
pip                    20.2.4
protobuf               3.14.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pycrypto               2.6.1
pygobject              3.26.1
pyparsing              2.4.7
python-dateutil        2.8.1
pytz                   2021.1
PyWavelets             1.1.1
pyxdg                  0.25
PyYAML                 5.4.1
requests               2.25.0
requests-oauthlib      1.3.0
rsa                    4.6
scikit-image           0.17.2
scipy                  1.4.1
SecretStorage          2.3.1
setuptools             51.0.0
Shapely                1.7.1
six                    1.15.0
tensorboard            2.4.0
tensorboard-plugin-wit 1.7.0
tensorflow             2.4.0
tensorflow-estimator   2.4.0rc0
termcolor              1.1.0
tifffile               2020.9.3
typing-extensions      3.7.4.3
urllib3                1.26.2
Werkzeug               1.0.1
wheel                  0.36.2
wrapt                  1.12.1
zipp                   3.4.0

training.py file and its output below:

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="/data/dataset/")
trainer.setTrainConfig(object_names_array=["truck","van"], batch_size=4, num_experiments=200, train_from_pretrained_model="/data/pretrained-yolov3.h5")
trainer.trainModel()
cloud@serveur-cic-tempo:~/imageaitest2$ docker logs -f 44e073940c3f
2021-02-03 12:54:27.034598: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-02-03 12:54:27.034629: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-03 12:54:30.259959: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-03 12:54:30.260180: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-02-03 12:54:30.260198: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-02-03 12:54:30.260224: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (44e073940c3f): /proc/driver/nvidia/version does not exist
2021-02-03 12:54:30.260431: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-03 12:54:30.262570: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
WARNING:tensorflow:`epsilon` argument is deprecated and will be removed, use `min_delta` instead.
2021-02-03 12:54:33.302708: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-02-03 12:54:33.302747: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-02-03 12:54:33.302799: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1844: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
  warnings.warn('`Model.fit_generator` is deprecated and '
/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:3504: UserWarning: Even though the tf.config.experimental_run_functions_eagerly option is set, this option does not apply to tf.data functions. tf.data functions are still traced and executed as graphs.
  "Even though the tf.config.experimental_run_functions_eagerly "
WARNING:tensorflow:Model failed to serialize as JSON. Ignoring... Layer YoloLayer has arguments in `__init__` and therefore must override `get_config`.
2021-02-03 12:54:33.434020: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-02-03 12:54:33.434470: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3000000000 Hz
Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.83
Anchor Boxes generated.
Detection configuration saved in  /data/dataset/json/detection_config.json
Evaluating over 479 samples taken from /data/dataset/validation
Training over 2211 samples  given at /data/dataset/train
Training on: 	['truck', 'van']
Training with Batch Size:  4
Number of Training Samples:  2211
Number of Validation Samples:  479
Number of Experiments:  200
Training with transfer learning from pretrained Model
Epoch 1/200
2021-02-03 12:54:36.834218: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-02-03 12:54:36.834259: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-02-03 12:54:39.872902: I tensorflow/core/profiler/lib/profiler_session.cc:71] Profiler session collecting data.
2021-02-03 12:54:39.924676: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
4424/4424 [==============================] - 11204s 3s/step - loss: 27.8904 - yolo_layer_loss: 3.7100 - yolo_layer_1_loss: 6.9337 - yolo_layer_2_loss: 11.2279 - val_loss: 15.9281 - val_yolo_layer_loss: 3.5156 - val_yolo_layer_1_loss: 5.6680 - val_yolo_layer_2_loss: 6.3642
Epoch 2/200
4424/4424 [==============================] - 11003s 2s/step - loss: 16.0907 - yolo_layer_loss: 2.9831 - yolo_layer_1_loss: 5.5410 - yolo_layer_2_loss: 7.3031 - val_loss: 15.3965 - val_yolo_layer_loss: 3.6108 - val_yolo_layer_1_loss: 5.7432 - val_yolo_layer_2_loss: 5.9575
Epoch 3/200
4424/4424 [==============================] - 10891s 2s/step - loss: 15.9106 - yolo_layer_loss: 2.9518 - yolo_layer_1_loss: 5.6270 - yolo_layer_2_loss: 7.2520 - val_loss: 15.5433 - val_yolo_layer_loss: 3.7602 - val_yolo_layer_1_loss: 5.6196 - val_yolo_layer_2_loss: 6.1130
Epoch 4/200
4424/4424 [==============================] - 10932s 2s/step - loss: 15.5688 - yolo_layer_loss: 2.5437 - yolo_layer_1_loss: 5.4113 - yolo_layer_2_loss: 7.5648 - val_loss: 17.7629 - val_yolo_layer_loss: 4.7392 - val_yolo_layer_1_loss: 6.3278 - val_yolo_layer_2_loss: 6.6648
Epoch 5/200
4424/4424 [==============================] - 10872s 2s/step - loss: 15.4870 - yolo_layer_loss: 2.4382 - yolo_layer_1_loss: 5.4916 - yolo_layer_2_loss: 7.5235 - val_loss: 63.7160 - val_yolo_layer_loss: 5.7094 - val_yolo_layer_1_loss: 38.9786 - val_yolo_layer_2_loss: 18.9935
Epoch 6/200
4424/4424 [==============================] - 11067s 3s/step - loss: 15.5494 - yolo_layer_loss: 2.5183 - yolo_layer_1_loss: 5.5150 - yolo_layer_2_loss: 7.4831 - val_loss: 178.7223 - val_yolo_layer_loss: 3.8561 - val_yolo_layer_1_loss: 35.8665 - val_yolo_layer_2_loss: 138.9692
Epoch 7/200
4424/4424 [==============================] - 10608s 2s/step - loss: 15.5048 - yolo_layer_loss: 2.5163 - yolo_layer_1_loss: 5.5533 - yolo_layer_2_loss: 7.4078 - val_loss: 15.5903 - val_yolo_layer_loss: 3.8752 - val_yolo_layer_1_loss: 5.5330 - val_yolo_layer_2_loss: 6.1565

evaluation.py file and its output below:

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="/data/dataset")
metrics = trainer.evaluateModel(model_path="/data/dataset/models", json_path="/data/dataset/json/detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)
cloud@serveur-cic-tempo:~/imageaitest2$ docker logs -f f00b40aa6e65
2021-02-04 09:58:00.255022: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-02-04 09:58:00.255073: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-04 09:58:02.125390: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-04 09:58:02.125583: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-02-04 09:58:02.125600: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-02-04 09:58:02.125625: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (f00b40aa6e65): /proc/driver/nvidia/version does not exist
2021-02-04 09:58:02.125826: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-04 09:58:02.129119: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
Starting Model evaluation....
Evaluating over 479 samples taken from /data/dataset/validation
Training over 2211 samples  given at /data/dataset/train
Model File:  /data/dataset/models/detection_model-ex-001--loss-0012.120.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-001--loss-0019.993.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-002--loss-0010.218.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-002--loss-0015.954.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-003--loss-0009.916.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-003--loss-0015.790.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-004--loss-0009.739.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================


welydev avatar Feb 04 '21 10:02 welydev

The loss of your models is clearly too high. Your value is at loss-0012.120; the goal should be to get it below 1.0.

Overdoze47 avatar Feb 08 '21 10:02 Overdoze47

The loss of your models is clearly too high. Your value is at loss-0012.120; the goal should be to get it below 1.0.

I really don't think it's a loss problem. Even if the loss isn't below 1, the model should still recognize some elements. I'm getting the same issue here...
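
For what it's worth, a quick way to sanity-check whether a checkpoint detects anything at all is to run it directly on a validation image with ImageAI's CustomObjectDetection. A minimal sketch, assuming the standard 2.1.x custom-detection API and the paths from the original post ("sample.jpg" is a placeholder for any validation image):

from imageai.Detection.Custom import CustomObjectDetection

detector = CustomObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath("/data/dataset/models/detection_model-ex-004--loss-0009.739.h5")
detector.setJsonPath("/data/dataset/json/detection_config.json")
detector.loadModel()

# "sample.jpg" is a placeholder; use any image from /data/dataset/validation/images
detections = detector.detectObjectsFromImage(
    input_image="/data/dataset/validation/images/sample.jpg",
    output_image_path="/data/detected_sample.jpg",
    minimum_percentage_probability=30,
)
for detection in detections:
    print(detection["name"], ":", detection["percentage_probability"], ":", detection["box_points"])

If this prints reasonable truck/van detections but evaluateModel still reports 0.0000, the problem is more likely in the evaluation step than in the weights themselves.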

Duboislo avatar Apr 16 '21 14:04 Duboislo

My mAP scores are all 0 also. I'm trying to figure out why.

SB2020-eye avatar Jun 14 '21 12:06 SB2020-eye

Even at a loss of 12, that should still be a decent model. Here is a potential workaround if you are getting an mAP of 0.0 at the evaluation step (and have verified that the .h5 models and the training/validation data are not corrupted):

  • Flush the workspace cache with: `!rm -r .../<name_of_workspace>/cache/`. During the evaluation step, ImageAI will recreate this folder along with the .pkl files. Between model training and evaluation, the cache sometimes becomes "corrupted", which gives a default mAP of 0.0. Flushing the cache after training resolved this issue every time it occurred for me (see the sketch after this list).
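
Scripted in Python, the same workaround would look roughly like this; it assumes the cache folder ImageAI writes sits directly under the data directory passed to setDataDirectory() (adjust the path to your workspace layout):

import shutil
from pathlib import Path

from imageai.Detection.Custom import DetectionModelTrainer

data_directory = "/data/dataset"               # same directory used for training
cache_dir = Path(data_directory) / "cache"     # assumed location of the ImageAI cache

# Delete the (possibly stale) cache; evaluation regenerates it and its .pkl files.
if cache_dir.exists():
    shutil.rmtree(cache_dir)

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory=data_directory)
metrics = trainer.evaluateModel(
    model_path="/data/dataset/models",
    json_path="/data/dataset/json/detection_config.json",
    iou_threshold=0.5,
    object_threshold=0.3,
    nms_threshold=0.5,
)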

TheBeastCoding avatar Jan 20 '22 19:01 TheBeastCoding

I've also trained a YOLOv3 architecture for an object detection problem and get an mAP of 0.0. Can you be more specific and tell me what you mean by "flushing the cache" after training? Do you mean deleting the cache folder? @TheBeastCoding

Edwin-Aguirre92 avatar Mar 01 '22 20:03 Edwin-Aguirre92

When you run the training step, a folder named cache is created in your model workspace. Before testing, delete that entire cache folder; testing will automatically recreate it along with its contents.

TheBeastCoding avatar Mar 02 '22 18:03 TheBeastCoding

I see, thanks for the answer @TheBeastCoding. However, when I tried that, it did not work. What worked for me was to not run anything else that uses imageai in the conda environment while the architecture was training. I guess doing so corrupts the file somehow?

Edwin-Aguirre92 avatar Mar 02 '22 20:03 Edwin-Aguirre92