Inconsistent results between COCO metrics and eval side-by-side images.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- [x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
- [x] I checked to make sure that this issue has not already been filed.
1. The entire URL of the file you are using
https://www.kaggle.com/datasets/marcosgabriel/photovoltaic-system-thermography
2. Describe the bug
I trained a Mask R-CNN model (fine-tuned from mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8) on the dataset above. The training completed successfully, but the evaluation results are confusing: AP@IoU=0.50 and AP@IoU=0.75 are very high (~1), even though, judging from the side-by-side image results, they should not be that high.
```
Evaluate annotation type bbox
DONE (t=0.14s).
Accumulating evaluation results...
DONE (t=0.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.815
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] =  0.999
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] =  0.999
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.815
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] =  0.040
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] =  0.399
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.854
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.854
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type segm
DONE (t=0.14s).
Accumulating evaluation results...
DONE (t=0.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.836
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] =  0.999
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] =  0.999
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.836
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] =  0.042
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] =  0.409
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.874
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.874
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
```
3. Steps to reproduce
Eval config:

```
eval_config: {
  metrics_set: "coco_detection_metrics"
  metrics_set: "coco_mask_metrics"
  eval_instance_masks: true
  use_moving_averages: false
  batch_size: 2
  include_metrics_per_category: false
}

eval_input_reader: {
  label_map_path: "/content/labelmap.txt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/tf_record/sample_val.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
}
```
4. Expected behavior
By a rough manual calculation, the precision is 0.99382716049 and the recall is 0.92528735632. I know these are not AP, but they clearly show that AP should not be ~1. A small sketch of this calculation is given below.
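A minimal sketch of that manual calculation, assuming hypothetical counts of 161 true positives, 1 false positive, and 13 false negatives (any counts with the same ratios reproduce the values quoted above):

```python
# Sketch only: the counts below are hypothetical and merely reproduce
# the precision/recall ratios quoted above.
tp, fp, fn = 161, 1, 13

precision = tp / (tp + fp)  # fraction of predicted boxes matching a ground-truth box
recall = tp / (tp + fn)     # fraction of ground-truth boxes that were detected

print(f"precision = {precision:.11f}")  # 0.99382716049
print(f"recall    = {recall:.11f}")     # 0.92528735632
```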
5. Additional context
6. System information
Evaluation was done on Google Colab.
You need to add max_number_of_boxes to the eval_input_reader section and max_num_boxes_to_visualize to the eval_config section of your pipeline.config. TensorFlow has a default of 20 bounding boxes per image. A sketch of the change is shown below.
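A sketch of how the eval sections from the issue might look with those fields added; the value 100 is an assumption and should be set to at least the maximum number of instances per image in your dataset:

```
eval_config: {
  metrics_set: "coco_detection_metrics"
  metrics_set: "coco_mask_metrics"
  eval_instance_masks: true
  use_moving_averages: false
  batch_size: 2
  include_metrics_per_category: false
  max_num_boxes_to_visualize: 100   # assumed value; overrides the low default
}

eval_input_reader: {
  label_map_path: "/content/labelmap.txt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/tf_record/sample_val.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
  max_number_of_boxes: 100          # assumed value; keep >= max ground-truth boxes per image
}
```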
Also see: https://github.com/tensorflow/models/issues/11076