mmocr
Unclear TextDetDataset Annotation Format
Sample Annotation:
{"file_name": "test/img_10.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 1, "category_id": 1, "bbox": [260.0, 138.0, 24.0, 20.0], "segmentation": [[261, 138, 284, 140, 279, 158, 260, 158]]}, {"iscrowd": 0, "category_id": 1, "bbox": [288.0, 138.0, 129.0, 23.0], "segmentation": [[288, 138, 417, 140, 416, 161, 290, 157]]}, {"iscrowd": 0, "category_id": 1, "bbox": [743.0, 145.0, 37.0, 18.0], "segmentation": [[743, 145, 779, 146, 780, 163, 746, 163]]}, {"iscrowd": 0, "category_id": 1, "bbox": [783.0, 129.0, 50.0, 26.0], "segmentation": [[783, 129, 831, 132, 833, 155, 785, 153]]}, {"iscrowd": 1, "category_id": 1, "bbox": [831.0, 133.0, 43.0, 23.0], "segmentation": [[831, 133, 870, 135, 874, 156, 835, 155]]}, {"iscrowd": 1, "category_id": 1, "bbox": [159.0, 204.0, 72.0, 15.0], "segmentation": [[159, 205, 230, 204, 231, 218, 159, 219]]}, {"iscrowd": 1, "category_id": 1, "bbox": [785.0, 158.0, 75.0, 21.0], "segmentation": [[785, 158, 856, 158, 860, 178, 787, 179]]}, {"iscrowd": 1, "category_id": 1, "bbox": [1011.0, 157.0, 68.0, 16.0], "segmentation": [[1011, 157, 1079, 160, 1076, 173, 1011, 170]]}]}
I am not sure whether the bbox and segmentation are defined as follows:
bbox = [x1, y1, w, h]
segmentation = [x1, y1, x1+w, y1, x1+w, y1+h, x1, y1+h]
Please let me know whether my understanding is correct. If it is, then the values in the sample annotation above, which is taken from the mmocr dataset types documentation, are not consistent with it.
I created a custom dataset annotation file, building the bbox and segmentation with the method above, and ended up with wrong coordinates at the output of the toy data pipeline. The toy data ground truths come out correct when passed through the same pipeline.
Hi, if you ONLY have rectangular annotations, you can get the pseudo segmentation label by [x, y, x+w, y, x+w, y+h, x, y+h]. However, if you already have quadrilateral or polygonal bounding boxes, you can get the bbox label by [min_x, min_y, max_x-min_x, max_y-min_y]. In the sample annotations, the bbox was generated in the latter way, so it is the minimum bounding rectangle of the segmentation.
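The two conversions described above can be sketched in Python as follows. This is only an illustration: the helper names rect_to_segmentation and polygon_to_bbox are hypothetical and are not part of mmocr. It assumes a flat [x1, y1, x2, y2, ...] polygon and an [x, y, w, h] box, as in the sample annotation.

```python
from typing import List, Sequence


def rect_to_segmentation(bbox: Sequence[float]) -> List[float]:
    """Build a pseudo segmentation (4 corners, clockwise) from [x, y, w, h]."""
    x, y, w, h = bbox
    return [x, y, x + w, y, x + w, y + h, x, y + h]


def polygon_to_bbox(segmentation: Sequence[float]) -> List[float]:
    """Return the minimum bounding rectangle [x, y, w, h] of a flat polygon."""
    xs = segmentation[0::2]  # even positions hold x coordinates
    ys = segmentation[1::2]  # odd positions hold y coordinates
    min_x, min_y = min(xs), min(ys)
    return [
        float(min_x),
        float(min_y),
        float(max(xs) - min_x),
        float(max(ys) - min_y),
    ]


# First polygon of the sample annotation:
print(polygon_to_bbox([261, 138, 284, 140, 279, 158, 260, 158]))
# -> [260.0, 138.0, 24.0, 20.0], matching the "bbox" stored in the sample
```

Since the sample polygons are slightly skewed quadrilaterals, applying rect_to_segmentation to the stored bbox does not reproduce the stored segmentation, which is why the sample looks inconsistent under the rectangle-only reading.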
For a custom dataset, please double-check that your data format is correct. If you still have problems, please provide more details.
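One way to double-check a custom annotation file is a small consistency check like the sketch below. It assumes each line of the file is one JSON object shaped like the sample annotation. The function check_annotation_line is hypothetical (not an mmocr API), and treating points outside the image as a problem is an assumption of this sketch, not a documented mmocr rule.

```python
import json
from typing import List


def check_annotation_line(line: str, tol: float = 1e-6) -> List[str]:
    """Return human-readable problems found in one annotation line."""
    record = json.loads(line)
    problems = []
    for i, ann in enumerate(record["annotations"]):
        for poly in ann["segmentation"]:
            # A polygon needs an even number of values and at least 3 points.
            if len(poly) % 2 != 0 or len(poly) < 6:
                problems.append(f"annotation {i}: malformed polygon {poly}")
                continue
            xs, ys = poly[0::2], poly[1::2]
            expected = [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
            # bbox should be [x, y, w, h], not [x1, y1, x2, y2].
            if any(abs(a - b) > tol for a, b in zip(ann["bbox"], expected)):
                problems.append(
                    f"annotation {i}: bbox {ann['bbox']} != expected {expected}"
                )
            # Assumption: coordinates should lie inside the image.
            if (min(xs) < 0 or min(ys) < 0
                    or max(xs) > record["width"] or max(ys) > record["height"]):
                problems.append(f"annotation {i}: polygon outside image bounds")
    return problems
```

Running it over every line of the annotation file and printing any non-empty result should quickly reveal, for example, a bbox stored as [x1, y1, x2, y2] instead of [x, y, w, h].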