Inconsistent results between COCO metrics and eval side-by-side images.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- [x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
- [x] I checked to make sure that this issue has not already been filed.
1. The entire URL of the file you are using
https://www.kaggle.com/datasets/marcosgabriel/photovoltaic-system-thermography
2. Describe the bug
I trained a Mask R-CNN model (fine-tuned from mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8) on the dataset above. The training completed successfully, but the evaluation results are confusing: AP@IoU=0.50 and AP@IoU=0.75 are very high (~1), even though, judging from the side-by-side image results, they should not be that high.
```
Evaluate annotation type bbox
DONE (t=0.14s).
Accumulating evaluation results...
DONE (t=0.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.815
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] =  0.999
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] =  0.999
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.815
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] =  0.040
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] =  0.399
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.854
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.854
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type segm
DONE (t=0.14s).
Accumulating evaluation results...
DONE (t=0.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.836
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] =  0.999
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] =  0.999
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.836
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] =  0.042
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] =  0.409
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.874
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.874
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
```
3. Steps to reproduce
Eval config:

```
eval_config: {
  metrics_set: "coco_detection_metrics"
  metrics_set: "coco_mask_metrics"
  eval_instance_masks: true
  use_moving_averages: false
  batch_size: 2
  include_metrics_per_category: false
}

eval_input_reader: {
  label_map_path: "/content/labelmap.txt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/tf_record/sample_val.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
}
```
4. Expected behavior
By a rough manual calculation, the precision is 0.99382716049 and the recall is 0.92528735632. I know these are not AP, but they clearly show that AP should not be ~1. A small sketch of this calculation is given below.
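A minimal sketch of that manual calculation, assuming hypothetical counts of 161 true positives, 1 false positive, and 13 false negatives (any counts with the same ratios reproduce the values quoted above):

```python
# Sketch only: the counts below are hypothetical and merely reproduce
# the precision/recall ratios quoted above.
tp, fp, fn = 161, 1, 13

precision = tp / (tp + fp)  # fraction of predicted boxes matching a ground-truth box
recall = tp / (tp + fn)     # fraction of ground-truth boxes that were detected

print(f"precision = {precision:.11f}")  # 0.99382716049
print(f"recall    = {recall:.11f}")     # 0.92528735632
```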
5. Additional context
6. System information
Evaluation was done on Google Colab.
You need to add max_number_of_boxes to the eval_input_reader section and max_num_boxes_to_visualize to the eval_config section of your pipeline.config. TensorFlow has a default of 20 bounding boxes per image. A sketch of the change is shown below.
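A sketch of how the eval sections from the issue might look with those fields added; the value 100 is an assumption and should be set to at least the maximum number of instances per image in your dataset:

```
eval_config: {
  metrics_set: "coco_detection_metrics"
  metrics_set: "coco_mask_metrics"
  eval_instance_masks: true
  use_moving_averages: false
  batch_size: 2
  include_metrics_per_category: false
  max_num_boxes_to_visualize: 100   # assumed value; overrides the low default
}

eval_input_reader: {
  label_map_path: "/content/labelmap.txt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/tf_record/sample_val.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
  max_number_of_boxes: 100          # assumed value; keep >= max ground-truth boxes per image
}
```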
Also see: https://github.com/tensorflow/models/issues/11076