
Unclear TextDetDataset Annotation Format

Open humza909 opened this issue 2 years ago • 1 comments

Sample Annotation:

{"file_name": "test/img_10.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 1, "category_id": 1, "bbox": [260.0, 138.0, 24.0, 20.0], "segmentation": [[261, 138, 284, 140, 279, 158, 260, 158]]}, {"iscrowd": 0, "category_id": 1, "bbox": [288.0, 138.0, 129.0, 23.0], "segmentation": [[288, 138, 417, 140, 416, 161, 290, 157]]}, {"iscrowd": 0, "category_id": 1, "bbox": [743.0, 145.0, 37.0, 18.0], "segmentation": [[743, 145, 779, 146, 780, 163, 746, 163]]}, {"iscrowd": 0, "category_id": 1, "bbox": [783.0, 129.0, 50.0, 26.0], "segmentation": [[783, 129, 831, 132, 833, 155, 785, 153]]}, {"iscrowd": 1, "category_id": 1, "bbox": [831.0, 133.0, 43.0, 23.0], "segmentation": [[831, 133, 870, 135, 874, 156, 835, 155]]}, {"iscrowd": 1, "category_id": 1, "bbox": [159.0, 204.0, 72.0, 15.0], "segmentation": [[159, 205, 230, 204, 231, 218, 159, 219]]}, {"iscrowd": 1, "category_id": 1, "bbox": [785.0, 158.0, 75.0, 21.0], "segmentation": [[785, 158, 856, 158, 860, 178, 787, 179]]}, {"iscrowd": 1, "category_id": 1, "bbox": [1011.0, 157.0, 68.0, 16.0], "segmentation": [[1011, 157, 1079, 160, 1076, 173, 1011, 170]]}]}

I am not sure whether bbox and segmentation are defined as follows: bbox = [x1, y1, w, h] and segmentation = [x1, y1, x1+w, y1, x1+w, y1+h, x1, y1+h]. Please let me know if my understanding is correct. If it is, the values in the sample annotation above, taken from the MMOCR dataset types documentation, do not match it.

I created a custom dataset annotation file, building the bbox and segmentation with the method above, and the output coordinates came out wrong when I ran it through the toy data pipeline. The toy data ground truths come out correct through the same pipeline.

humza909, Sep 07 '22 20:09

Hi, if you ONLY have rectangular annotations, you can build a pseudo segmentation label as [x, y, x+w, y, x+w, y+h, x, y+h]. However, if you already have quadrilateral or polygonal annotations, you can derive the bbox label as [min_x, min_y, max_x-min_x, max_y-min_y]. The sample annotations use the latter: each bbox is the minimum axis-aligned bounding rectangle of its segmentation. For example, the first segmentation has min_x = 260, min_y = 138, max_x = 284, max_y = 158, which gives bbox [260, 138, 24, 20].
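The two conversions described above can be sketched in plain Python. This is only an illustration: the function names `bbox_to_segmentation` and `segmentation_to_bbox` are hypothetical helpers, not part of MMOCR, and the polygon is assumed to be a flat list [x1, y1, x2, y2, ...].

```python
from typing import List, Sequence


def bbox_to_segmentation(bbox: Sequence[float]) -> List[float]:
    """Build a pseudo polygon from a rectangular [x, y, w, h] box.

    Points go clockwise from the top-left corner.
    """
    x, y, w, h = bbox
    return [x, y, x + w, y, x + w, y + h, x, y + h]


def segmentation_to_bbox(polygon: Sequence[float]) -> List[float]:
    """Return the axis-aligned bounding rectangle [x, y, w, h] of a flat polygon."""
    xs = polygon[0::2]  # even positions hold x coordinates
    ys = polygon[1::2]  # odd positions hold y coordinates
    min_x, min_y = min(xs), min(ys)
    return [float(min_x), float(min_y),
            float(max(xs) - min_x), float(max(ys) - min_y)]


# First two polygons from the sample annotation in this thread.
first_bbox = segmentation_to_bbox([261, 138, 284, 140, 279, 158, 260, 158])
second_bbox = segmentation_to_bbox([288, 138, 417, 140, 416, 161, 290, 157])
print(first_bbox)   # [260.0, 138.0, 24.0, 20.0]
print(second_bbox)  # [288.0, 138.0, 129.0, 23.0]

# Going the other way yields a rectangle, not the original quadrilateral.
print(bbox_to_segmentation([260, 138, 24, 20]))
# [260, 138, 284, 138, 284, 158, 260, 158]
```

Both printed boxes match the bbox values in the sample annotation, which is consistent with the bbox being derived from the segmentation rather than the reverse.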

For a custom dataset, please double-check that your data format is correct. If you still have problems, please provide more details.
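One way to double-check a custom annotation file is a small script like the sketch below. It is not MMOCR's own validation: `check_annotation_line` is a hypothetical helper, the required keys are taken only from the sample annotation in this thread, and the 1-pixel tolerance and the image-bounds check are assumptions.

```python
import json
from typing import List


def check_annotation_line(line: str, tol: float = 1.0) -> List[str]:
    """Return a list of problems found in one JSON annotation line."""
    record = json.loads(line)
    problems = [f"missing key: {k}"
                for k in ("file_name", "height", "width", "annotations")
                if k not in record]
    if problems:
        return problems

    for i, ann in enumerate(record["annotations"]):
        missing = [k for k in ("iscrowd", "category_id", "bbox", "segmentation")
                   if k not in ann]
        if missing:
            problems.append(f"annotation {i}: missing {missing}")
            continue

        polygon = ann["segmentation"][0]
        if len(polygon) < 6 or len(polygon) % 2 != 0:
            problems.append(f"annotation {i}: polygon needs >= 3 (x, y) pairs")
            continue

        xs, ys = polygon[0::2], polygon[1::2]
        # bbox is expected to be [min_x, min_y, width, height] of the polygon.
        expected = [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
        if any(abs(a - b) > tol for a, b in zip(ann["bbox"], expected)):
            problems.append(
                f"annotation {i}: bbox {ann['bbox']} != expected {expected}")
        if (min(xs) < 0 or min(ys) < 0
                or max(xs) > record["width"] or max(ys) > record["height"]):
            problems.append(f"annotation {i}: polygon outside image bounds")
    return problems


good = ('{"file_name": "test/img_10.jpg", "height": 720, "width": 1280, '
        '"annotations": [{"iscrowd": 0, "category_id": 1, '
        '"bbox": [288.0, 138.0, 129.0, 23.0], '
        '"segmentation": [[288, 138, 417, 140, 416, 161, 290, 157]]}]}')
# Same record, but with bbox mistakenly stored as [x1, y1, x2, y2].
bad = good.replace("[288.0, 138.0, 129.0, 23.0]", "[288.0, 138.0, 417.0, 161.0]")

good_problems = check_annotation_line(good)
bad_problems = check_annotation_line(bad)
print(good_problems)
print(bad_problems)
```

Running it over every line of an annotation file should surface common mistakes such as storing corner coordinates in place of width and height.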

xinke-wang, Sep 08 '22 03:09