ImageAI
Evaluating models outputs `mAP : 0.0000`
Hello,
I'm trying to evaluate models I've trained using transfer learning inside a Docker container. I actually launched the training with 200 experiments by mistake, but I'd still like to evaluate the first models it produced.
cloud@serveur-cic-tempo:~/imageaitest2$ ls -la dataset/models/
total 2412288
drwxr-xr-x 2 cloud cloud 4096 Feb 4 09:14 .
drwxr-xr-x 8 cloud cloud 4096 Feb 4 09:14 ..
-rw-r--r-- 1 cloud cloud 247010224 Feb 4 09:14 detection_model-ex-001--loss-0012.120.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb 4 09:14 detection_model-ex-001--loss-0019.993.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb 4 09:14 detection_model-ex-002--loss-0010.218.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb 4 09:14 detection_model-ex-002--loss-0015.954.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb 4 09:14 detection_model-ex-003--loss-0009.916.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb 4 09:14 detection_model-ex-003--loss-0015.790.h5
-rw-r--r-- 1 cloud cloud 247010224 Feb 4 09:14 detection_model-ex-004--loss-0009.739.h5
Could the issue come from these model files? I believe the training didn't work, since the mAP is 0.0000, but I don't really understand what the problem could be. Any clue, please?
Below, you can find my config, the steps I followed and the outputs.
Configuration:
cloud@serveur-cic-tempo:~/imageaitest2$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 42 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
Stepping: 7
CPU MHz: 3000.000
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 30976K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear flush_l1d arch_capabilities
Docker container from the official TensorFlow image: https://hub.docker.com/r/tensorflow/tensorflow/
root@45ab5106701d:/# pip list
Package Version
---------------------- ---------
absl-py 0.11.0
asn1crypto 0.24.0
astunparse 1.6.3
cachetools 4.2.0
certifi 2020.12.5
chardet 3.0.4
cryptography 2.1.4
cycler 0.10.0
decorator 4.4.2
flatbuffers 1.12
gast 0.3.3
google-auth 1.24.0
google-auth-oauthlib 0.4.2
google-pasta 0.2.0
grpcio 1.32.0
h5py 2.10.0
idna 2.6
imageai 2.1.6
imageio 2.9.0
imgaug 0.4.0
importlib-metadata 3.3.0
Keras 2.4.3
Keras-Preprocessing 1.1.2
keras-resnet 0.2.0
keyring 10.6.0
keyrings.alt 3.0
kiwisolver 1.3.1
Markdown 3.3.3
matplotlib 3.3.2
networkx 2.5
numpy 1.19.3
oauthlib 3.1.0
opencv-python 4.5.1.48
opt-einsum 3.3.0
pandas 1.1.5
Pillow 7.0.0
pip 20.2.4
protobuf 3.14.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycrypto 2.6.1
pygobject 3.26.1
pyparsing 2.4.7
python-dateutil 2.8.1
pytz 2021.1
PyWavelets 1.1.1
pyxdg 0.25
PyYAML 5.4.1
requests 2.25.0
requests-oauthlib 1.3.0
rsa 4.6
scikit-image 0.17.2
scipy 1.4.1
SecretStorage 2.3.1
setuptools 51.0.0
Shapely 1.7.1
six 1.15.0
tensorboard 2.4.0
tensorboard-plugin-wit 1.7.0
tensorflow 2.4.0
tensorflow-estimator 2.4.0rc0
termcolor 1.1.0
tifffile 2020.9.3
typing-extensions 3.7.4.3
urllib3 1.26.2
Werkzeug 1.0.1
wheel 0.36.2
wrapt 1.12.1
zipp 3.4.0
training.py file and output below
from imageai.Detection.Custom import DetectionModelTrainer
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="/data/dataset/")
trainer.setTrainConfig(object_names_array=["truck","van"], batch_size=4, num_experiments=200, train_from_pretrained_model="/data/pretrained-yolov3.h5")
trainer.trainModel()
cloud@serveur-cic-tempo:~/imageaitest2$ docker logs -f 44e073940c3f
2021-02-03 12:54:27.034598: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-02-03 12:54:27.034629: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-03 12:54:30.259959: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-03 12:54:30.260180: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-02-03 12:54:30.260198: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-02-03 12:54:30.260224: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (44e073940c3f): /proc/driver/nvidia/version does not exist
2021-02-03 12:54:30.260431: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-03 12:54:30.262570: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
WARNING:tensorflow:`epsilon` argument is deprecated and will be removed, use `min_delta` instead.
2021-02-03 12:54:33.302708: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-02-03 12:54:33.302747: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-02-03 12:54:33.302799: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1844: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
warnings.warn('`Model.fit_generator` is deprecated and '
/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:3504: UserWarning: Even though the tf.config.experimental_run_functions_eagerly option is set, this option does not apply to tf.data functions. tf.data functions are still traced and executed as graphs.
"Even though the tf.config.experimental_run_functions_eagerly "
WARNING:tensorflow:Model failed to serialize as JSON. Ignoring... Layer YoloLayer has arguments in `__init__` and therefore must override `get_config`.
2021-02-03 12:54:33.434020: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-02-03 12:54:33.434470: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3000000000 Hz
Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.83
Anchor Boxes generated.
Detection configuration saved in /data/dataset/json/detection_config.json
Evaluating over 479 samples taken from /data/dataset/validation
Training over 2211 samples given at /data/dataset/train
Training on: ['truck', 'van']
Training with Batch Size: 4
Number of Training Samples: 2211
Number of Validation Samples: 479
Number of Experiments: 200
Training with transfer learning from pretrained Model
Epoch 1/200
2021-02-03 12:54:36.834218: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-02-03 12:54:36.834259: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-02-03 12:54:39.872902: I tensorflow/core/profiler/lib/profiler_session.cc:71] Profiler session collecting data.
2021-02-03 12:54:39.924676: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
4424/4424 [==============================] - 11204s 3s/step - loss: 27.8904 - yolo_layer_loss: 3.7100 - yolo_layer_1_loss: 6.9337 - yolo_layer_2_loss: 11.2279 - val_loss: 15.9281 - val_yolo_layer_loss: 3.5156 - val_yolo_layer_1_loss: 5.6680 - val_yolo_layer_2_loss: 6.3642
Epoch 2/200
4424/4424 [==============================] - 11003s 2s/step - loss: 16.0907 - yolo_layer_loss: 2.9831 - yolo_layer_1_loss: 5.5410 - yolo_layer_2_loss: 7.3031 - val_loss: 15.3965 - val_yolo_layer_loss: 3.6108 - val_yolo_layer_1_loss: 5.7432 - val_yolo_layer_2_loss: 5.9575
Epoch 3/200
4424/4424 [==============================] - 10891s 2s/step - loss: 15.9106 - yolo_layer_loss: 2.9518 - yolo_layer_1_loss: 5.6270 - yolo_layer_2_loss: 7.2520 - val_loss: 15.5433 - val_yolo_layer_loss: 3.7602 - val_yolo_layer_1_loss: 5.6196 - val_yolo_layer_2_loss: 6.1130
Epoch 4/200
4424/4424 [==============================] - 10932s 2s/step - loss: 15.5688 - yolo_layer_loss: 2.5437 - yolo_layer_1_loss: 5.4113 - yolo_layer_2_loss: 7.5648 - val_loss: 17.7629 - val_yolo_layer_loss: 4.7392 - val_yolo_layer_1_loss: 6.3278 - val_yolo_layer_2_loss: 6.6648
Epoch 5/200
4424/4424 [==============================] - 10872s 2s/step - loss: 15.4870 - yolo_layer_loss: 2.4382 - yolo_layer_1_loss: 5.4916 - yolo_layer_2_loss: 7.5235 - val_loss: 63.7160 - val_yolo_layer_loss: 5.7094 - val_yolo_layer_1_loss: 38.9786 - val_yolo_layer_2_loss: 18.9935
Epoch 6/200
4424/4424 [==============================] - 11067s 3s/step - loss: 15.5494 - yolo_layer_loss: 2.5183 - yolo_layer_1_loss: 5.5150 - yolo_layer_2_loss: 7.4831 - val_loss: 178.7223 - val_yolo_layer_loss: 3.8561 - val_yolo_layer_1_loss: 35.8665 - val_yolo_layer_2_loss: 138.9692
Epoch 7/200
4424/4424 [==============================] - 10608s 2s/step - loss: 15.5048 - yolo_layer_loss: 2.5163 - yolo_layer_1_loss: 5.5533 - yolo_layer_2_loss: 7.4078 - val_loss: 15.5903 - val_yolo_layer_loss: 3.8752 - val_yolo_layer_1_loss: 5.5330 - val_yolo_layer_2_loss: 6.1565
evaluation.py file and output below
from imageai.Detection.Custom import DetectionModelTrainer
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="/data/dataset")
metrics = trainer.evaluateModel(model_path="/data/dataset/models", json_path="/data/dataset/json/detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)
cloud@serveur-cic-tempo:~/imageaitest2$ docker logs -f f00b40aa6e65
2021-02-04 09:58:00.255022: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-02-04 09:58:00.255073: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-04 09:58:02.125390: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-04 09:58:02.125583: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-02-04 09:58:02.125600: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-02-04 09:58:02.125625: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (f00b40aa6e65): /proc/driver/nvidia/version does not exist
2021-02-04 09:58:02.125826: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-04 09:58:02.129119: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
Starting Model evaluation....
Evaluating over 479 samples taken from /data/dataset/validation
Training over 2211 samples given at /data/dataset/train
Model File: /data/dataset/models/detection_model-ex-001--loss-0012.120.h5
Evaluation samples: 479
Using IoU: 0.5
Using Object Threshold: 0.3
Using Non-Maximum Suppression: 0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File: /data/dataset/models/detection_model-ex-001--loss-0019.993.h5
Evaluation samples: 479
Using IoU: 0.5
Using Object Threshold: 0.3
Using Non-Maximum Suppression: 0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File: /data/dataset/models/detection_model-ex-002--loss-0010.218.h5
Evaluation samples: 479
Using IoU: 0.5
Using Object Threshold: 0.3
Using Non-Maximum Suppression: 0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File: /data/dataset/models/detection_model-ex-002--loss-0015.954.h5
Evaluation samples: 479
Using IoU: 0.5
Using Object Threshold: 0.3
Using Non-Maximum Suppression: 0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File: /data/dataset/models/detection_model-ex-003--loss-0009.916.h5
Evaluation samples: 479
Using IoU: 0.5
Using Object Threshold: 0.3
Using Non-Maximum Suppression: 0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File: /data/dataset/models/detection_model-ex-003--loss-0015.790.h5
Evaluation samples: 479
Using IoU: 0.5
Using Object Threshold: 0.3
Using Non-Maximum Suppression: 0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
Model File: /data/dataset/models/detection_model-ex-004--loss-0009.739.h5
Evaluation samples: 479
Using IoU: 0.5
Using Object Threshold: 0.3
Using Non-Maximum Suppression: 0.5
truck: 0.0000
van: 0.0000
mAP: 0.0000
===============================
The loss of your models is clearly too high; your best value so far is loss-0012.120. The goal should be to get it under 1.0.
I really don't think it's a loss problem. Even if the loss isn't below 1, the model should still recognize some elements. I'm getting the same issue here...
My mAP scores are all 0 as well; I'm trying to figure out why.
Even at a loss of 12, that should still be a decent model. Here is a potential workaround if you are getting an mAP of 0.0 at the evaluation step (and have verified that the .h5 models and the training/evaluation data are not corrupted):
- Flush the workspace cache with: !rm -r .../<name_of_workspace>/cache/ (see the sketch below). During the evaluation step, ImageAI will recreate this folder along with the .pkl files. Between model training and evaluation, the cache sometimes becomes "corrupted", which gives a default mAP of 0.0. Flushing the cache after training resolved this issue every time it occurred for me.
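A minimal sketch (not from the original thread) of what that flush could look like in Python, assuming the cache lives at <data_directory>/cache inside the workspace; adjust the path if your layout differs. The evaluateModel arguments mirror evaluation.py above.
import shutil
from pathlib import Path
from imageai.Detection.Custom import DetectionModelTrainer
data_directory = "/data/dataset"             # same workspace as in evaluation.py
cache_dir = Path(data_directory) / "cache"   # assumed cache location inside the workspace
# Delete the possibly stale cache; the evaluation step recreates it along with the .pkl files
if cache_dir.exists():
    shutil.rmtree(cache_dir)
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory=data_directory)
metrics = trainer.evaluateModel(model_path=data_directory + "/models",
                                json_path=data_directory + "/json/detection_config.json",
                                iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)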
I've also trained a YOLOv3 architecture for an object detection problem and I'm getting an mAP of 0.0. Can you be more specific about what you mean by "flushing the cache" after training? Do you mean deleting the cache folder? @TheBeastCoding
When you run the training step, a folder named cache is created in your model workspace. Before testing, delete that entire cache folder; testing will automatically recreate it along with its contents.
I see, thanks for the answer @TheBeastCoding. However, when I tried that, it did not work. What worked for me was not using anything from imageai in the conda environment while the architecture was training. I guess doing so corrupts the cache file somehow?