
Evaluating models outputs `mAP : 0.0000`

welydev opened this issue 4 years ago • 7 comments

Hello,

I'm trying to evaluate models I trained with transfer learning, running inside a Docker container. I actually launched the training with 200 experiments by mistake, but I still want to evaluate the first models it produced.

cloud@serveur-cic-tempo:~/imageaitest2$ ls -la dataset/models/
total 2412288
drwxr-xr-x 2 cloud cloud      4096 Feb  4 09:14 .
drwxr-xr-x 8 cloud cloud      4096 Feb  4 09:14 ..
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-001--loss-0012.120.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-001--loss-0019.993.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-002--loss-0010.218.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-002--loss-0015.954.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-003--loss-0009.916.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-003--loss-0015.790.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb  4 09:14 detection_model-ex-004--loss-0009.739.h5

Could my issue be coming from these model files?

In any case, I believe the training didn't work, since the mAP is 0.0000. I don't really understand what my issue could be. Any clue, please?

Below, you can find my config, the steps I followed and the outputs.

Configuration:

cloud@serveur-cic-tempo:~/imageaitest2$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       42 bits physical, 48 bits virtual
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
Stepping:            7
CPU MHz:             3000.000
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            30976K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear flush_l1d arch_capabilities

Docker Container from official Tensorflow image : https://hub.docker.com/r/tensorflow/tensorflow/

root@45ab5106701d:/# pip list
Package                Version
---------------------- ---------
absl-py                0.11.0
asn1crypto             0.24.0
astunparse             1.6.3
cachetools             4.2.0
certifi                2020.12.5
chardet                3.0.4
cryptography           2.1.4
cycler                 0.10.0
decorator              4.4.2
flatbuffers            1.12
gast                   0.3.3
google-auth            1.24.0
google-auth-oauthlib   0.4.2
google-pasta           0.2.0
grpcio                 1.32.0
h5py                   2.10.0
idna                   2.6
imageai                2.1.6
imageio                2.9.0
imgaug                 0.4.0
importlib-metadata     3.3.0
Keras                  2.4.3
Keras-Preprocessing    1.1.2
keras-resnet           0.2.0
keyring                10.6.0
keyrings.alt           3.0
kiwisolver             1.3.1
Markdown               3.3.3
matplotlib             3.3.2
networkx               2.5
numpy                  1.19.3
oauthlib               3.1.0
opencv-python          4.5.1.48
opt-einsum             3.3.0
pandas                 1.1.5
Pillow                 7.0.0
pip                    20.2.4
protobuf               3.14.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pycrypto               2.6.1
pygobject              3.26.1
pyparsing              2.4.7
python-dateutil        2.8.1
pytz                   2021.1
PyWavelets             1.1.1
pyxdg                  0.25
PyYAML                 5.4.1
requests               2.25.0
requests-oauthlib      1.3.0
rsa                    4.6
scikit-image           0.17.2
scipy                  1.4.1
SecretStorage          2.3.1
setuptools             51.0.0
Shapely                1.7.1
six                    1.15.0
tensorboard            2.4.0
tensorboard-plugin-wit 1.7.0
tensorflow             2.4.0
tensorflow-estimator   2.4.0rc0
termcolor              1.1.0
tifffile               2020.9.3
typing-extensions      3.7.4.3
urllib3                1.26.2
Werkzeug               1.0.1
wheel                  0.36.2
wrapt                  1.12.1
zipp                   3.4.0

training.py file and its output below:

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="/data/dataset/")
trainer.setTrainConfig(object_names_array=["truck","van"], batch_size=4, num_experiments=200, train_from_pretrained_model="/data/pretrained-yolov3.h5")
trainer.trainModel()
cloud@serveur-cic-tempo:~/imageaitest2$ docker logs -f 44e073940c3f
2021-02-03 12:54:27.034598: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-02-03 12:54:27.034629: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-03 12:54:30.259959: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-03 12:54:30.260180: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-02-03 12:54:30.260198: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-02-03 12:54:30.260224: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (44e073940c3f): /proc/driver/nvidia/version does not exist
2021-02-03 12:54:30.260431: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-03 12:54:30.262570: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
WARNING:tensorflow:`epsilon` argument is deprecated and will be removed, use `min_delta` instead.
2021-02-03 12:54:33.302708: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-02-03 12:54:33.302747: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-02-03 12:54:33.302799: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1844: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
  warnings.warn('`Model.fit_generator` is deprecated and '
/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:3504: UserWarning: Even though the tf.config.experimental_run_functions_eagerly option is set, this option does not apply to tf.data functions. tf.data functions are still traced and executed as graphs.
  "Even though the tf.config.experimental_run_functions_eagerly "
WARNING:tensorflow:Model failed to serialize as JSON. Ignoring... Layer YoloLayer has arguments in `__init__` and therefore must override `get_config`.
2021-02-03 12:54:33.434020: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-02-03 12:54:33.434470: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3000000000 Hz
Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.83
Anchor Boxes generated.
Detection configuration saved in  /data/dataset/json/detection_config.json
Evaluating over 479 samples taken from /data/dataset/validation
Training over 2211 samples  given at /data/dataset/train
Training on: 	['truck', 'van']
Training with Batch Size:  4
Number of Training Samples:  2211
Number of Validation Samples:  479
Number of Experiments:  200
Training with transfer learning from pretrained Model
Epoch 1/200
2021-02-03 12:54:36.834218: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-02-03 12:54:36.834259: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-02-03 12:54:39.872902: I tensorflow/core/profiler/lib/profiler_session.cc:71] Profiler session collecting data.
2021-02-03 12:54:39.924676: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
4424/4424 [==============================] - 11204s 3s/step - loss: 27.8904 - yolo_layer_loss: 3.7100 - yolo_layer_1_loss: 6.9337 - yolo_layer_2_loss: 11.2279 - val_loss: 15.9281 - val_yolo_layer_loss: 3.5156 - val_yolo_layer_1_loss: 5.6680 - val_yolo_layer_2_loss: 6.3642
Epoch 2/200
4424/4424 [==============================] - 11003s 2s/step - loss: 16.0907 - yolo_layer_loss: 2.9831 - yolo_layer_1_loss: 5.5410 - yolo_layer_2_loss: 7.3031 - val_loss: 15.3965 - val_yolo_layer_loss: 3.6108 - val_yolo_layer_1_loss: 5.7432 - val_yolo_layer_2_loss: 5.9575
Epoch 3/200
4424/4424 [==============================] - 10891s 2s/step - loss: 15.9106 - yolo_layer_loss: 2.9518 - yolo_layer_1_loss: 5.6270 - yolo_layer_2_loss: 7.2520 - val_loss: 15.5433 - val_yolo_layer_loss: 3.7602 - val_yolo_layer_1_loss: 5.6196 - val_yolo_layer_2_loss: 6.1130
Epoch 4/200
4424/4424 [==============================] - 10932s 2s/step - loss: 15.5688 - yolo_layer_loss: 2.5437 - yolo_layer_1_loss: 5.4113 - yolo_layer_2_loss: 7.5648 - val_loss: 17.7629 - val_yolo_layer_loss: 4.7392 - val_yolo_layer_1_loss: 6.3278 - val_yolo_layer_2_loss: 6.6648
Epoch 5/200
4424/4424 [==============================] - 10872s 2s/step - loss: 15.4870 - yolo_layer_loss: 2.4382 - yolo_layer_1_loss: 5.4916 - yolo_layer_2_loss: 7.5235 - val_loss: 63.7160 - val_yolo_layer_loss: 5.7094 - val_yolo_layer_1_loss: 38.9786 - val_yolo_layer_2_loss: 18.9935
Epoch 6/200
4424/4424 [==============================] - 11067s 3s/step - loss: 15.5494 - yolo_layer_loss: 2.5183 - yolo_layer_1_loss: 5.5150 - yolo_layer_2_loss: 7.4831 - val_loss: 178.7223 - val_yolo_layer_loss: 3.8561 - val_yolo_layer_1_loss: 35.8665 - val_yolo_layer_2_loss: 138.9692
Epoch 7/200
4424/4424 [==============================] - 10608s 2s/step - loss: 15.5048 - yolo_layer_loss: 2.5163 - yolo_layer_1_loss: 5.5533 - yolo_layer_2_loss: 7.4078 - val_loss: 15.5903 - val_yolo_layer_loss: 3.8752 - val_yolo_layer_1_loss: 5.5330 - val_yolo_layer_2_loss: 6.1565

evaluation.py file and its output below:

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="/data/dataset")
metrics = trainer.evaluateModel(model_path="/data/dataset/models", json_path="/data/dataset/json/detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)
cloud@serveur-cic-tempo:~/imageaitest2$ docker logs -f f00b40aa6e65
2021-02-04 09:58:00.255022: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-02-04 09:58:00.255073: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-04 09:58:02.125390: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-04 09:58:02.125583: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-02-04 09:58:02.125600: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-02-04 09:58:02.125625: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (f00b40aa6e65): /proc/driver/nvidia/version does not exist
2021-02-04 09:58:02.125826: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-04 09:58:02.129119: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
Starting Model evaluation....
Evaluating over 479 samples taken from /data/dataset/validation
Training over 2211 samples  given at /data/dataset/train
Model File:  /data/dataset/models/detection_model-ex-001--loss-0012.120.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-001--loss-0019.993.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-002--loss-0010.218.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-002--loss-0015.954.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-003--loss-0009.916.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-003--loss-0015.790.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File:  /data/dataset/models/detection_model-ex-004--loss-0009.739.h5 

Evaluation samples:  479
Using IoU:  0.5
Using Object Threshold:  0.3
Using Non-Maximum Suppression:  0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================


welydev avatar Feb 04 '21 10:02 welydev

The loss of your models is clearly too high. Your value is at loss-0012.120; the goal should be to get it below 1.0.

Overdoze47 avatar Feb 08 '21 10:02 Overdoze47

The loss of your models is clearly too high. Your value is at loss-0012.120; the goal should be to get it below 1.0.

I really don't think it's a loss problem. Even if the loss isn't below 1, the model should still recognize some elements. I'm getting the same issue here...
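
For what it's worth, a quick way to sanity-check whether a checkpoint detects anything at all is to run it directly on a validation image with ImageAI's CustomObjectDetection. A minimal sketch, assuming the standard 2.1.x custom-detection API and the paths from the original post ("sample.jpg" is a placeholder for any validation image):

from imageai.Detection.Custom import CustomObjectDetection

detector = CustomObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath("/data/dataset/models/detection_model-ex-004--loss-0009.739.h5")
detector.setJsonPath("/data/dataset/json/detection_config.json")
detector.loadModel()

# "sample.jpg" is a placeholder; use any image from /data/dataset/validation/images
detections = detector.detectObjectsFromImage(
    input_image="/data/dataset/validation/images/sample.jpg",
    output_image_path="/data/detected_sample.jpg",
    minimum_percentage_probability=30,
)
for detection in detections:
    print(detection["name"], ":", detection["percentage_probability"], ":", detection["box_points"])

If this prints reasonable truck/van detections but evaluateModel still reports 0.0000, the problem is more likely in the evaluation step than in the weights themselves.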

Duboislo avatar Apr 16 '21 14:04 Duboislo

My mAP scores are all 0 also. I'm trying to figure out why.

SB2020-eye avatar Jun 14 '21 12:06 SB2020-eye

Even at a loss of 12, that should still be a decent model. Here is a potential workaround if you are getting an mAP of 0.0 at the evaluation step (and have verified that the .h5 models and the training/validation data are not corrupted):

  • Flush the workspace cache with: `!rm -r .../<name_of_workspace>/cache/`. During the evaluation step, ImageAI will recreate this folder along with the .pkl files. Between model training and evaluation, the cache sometimes becomes "corrupted", which gives a default mAP of 0.0. Flushing the cache after training resolved this issue every time it occurred for me (see the sketch after this list).
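
Scripted in Python, the same workaround would look roughly like this; it assumes the cache folder ImageAI writes sits directly under the data directory passed to setDataDirectory() (adjust the path to your workspace layout):

import shutil
from pathlib import Path

from imageai.Detection.Custom import DetectionModelTrainer

data_directory = "/data/dataset"               # same directory used for training
cache_dir = Path(data_directory) / "cache"     # assumed location of the ImageAI cache

# Delete the (possibly stale) cache; evaluation regenerates it and its .pkl files.
if cache_dir.exists():
    shutil.rmtree(cache_dir)

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory=data_directory)
metrics = trainer.evaluateModel(
    model_path="/data/dataset/models",
    json_path="/data/dataset/json/detection_config.json",
    iou_threshold=0.5,
    object_threshold=0.3,
    nms_threshold=0.5,
)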

TheBeastCoding avatar Jan 20 '22 19:01 TheBeastCoding

I've also trained a YOLOv3 architecture for an object detection problem and get an mAP of 0.0. Can you be more specific and tell me what you mean by "flushing the cache" after training? Do you mean deleting the cache folder? @TheBeastCoding

Edwin-Aguirre92 avatar Mar 01 '22 20:03 Edwin-Aguirre92

When you run the training step, a folder named cache is created in your model workspace. Before testing, delete that entire cache folder; testing will automatically recreate it along with its contents.

TheBeastCoding avatar Mar 02 '22 18:03 TheBeastCoding

I see, thanks for the answer @TheBeastCoding. However, when I tried that, it did not work. What worked for me was to not run anything else that uses imageai in the conda environment while the architecture was training. I guess doing so corrupts the file somehow?

Edwin-Aguirre92 avatar Mar 02 '22 20:03 Edwin-Aguirre92