voc2coco

Generated image IDs are non-unique

Open msnidal opened this issue 4 years ago • 8 comments

Firstly, thanks for creating this script, it was a great help to me.

When I first ran it, it worked almost perfectly, but with one problem: the COCO image IDs were all over the place, and many were non-unique (lots of 0s, for example), which breaks the COCO format. I saw that you're generating them as a function of the filename. Given that image IDs have no VOC equivalent, I think it would make more sense to assign them in strict order, one per image.

I did a hacky solution for now; I'm leaving this issue open so I can come back to it later and open a PR with a fix. If anybody else is having this problem, look for img_id and try incrementing it manually for the moment.
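
For anyone who wants to confirm they are hitting this, here is a minimal sketch of a duplicate check on the converted output. It assumes only the standard COCO layout (an "images" list whose entries carry an "id"); the function name and the sample data are made up for illustration.

```python
from collections import Counter


def find_duplicate_image_ids(coco):
    """Return {image_id: count} for every image ID that occurs more than once."""
    counts = Counter(img["id"] for img in coco["images"])
    return {img_id: n for img_id, n in counts.items() if n > 1}


# Hypothetical converter output in which two images collapsed onto ID 0
coco = {
    "images": [
        {"id": 0, "file_name": "a.jpg"},
        {"id": 0, "file_name": "b.jpg"},
        {"id": 7, "file_name": "c.jpg"},
    ]
}

print(find_duplicate_image_ids(coco))  # {0: 2}
```

An empty result means every image ID is unique. To check a real file, load it with json.load first and pass the resulting dict.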

msnidal, Mar 04 '20 19:03

@msnidal Thanks for sharing!

I will also look into this problem. I would be grateful if you could share steps or data to reproduce it.

yukkyo, Mar 07 '20 07:03

@msnidal can you please show us your hacky solution?

davidhuangal, Apr 15 '20 07:04

@davidhuangal, this code works on the assumption that your file names are numbered with serial integers, like image1, image2, image3 or any_name1, any_name2, ... So if you have files like a_1.jpg and b_1.jpg, the regex used in the code assigns them the same ID. If you want to solve it, you can use this method:

img_id_dict = {}
for filename in filename_list:
    img_id_dict[filename.split(".")[0]] = len(img_id_dict) + 1

and replace the regex extraction with a lookup:

    if extract_num_from_imgid and isinstance(img_id, str):
        img_id = img_id_dict[img_id]
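
A self-contained sketch of the same idea, in case it helps. The helper name build_img_id_dict and the sample file names are hypothetical. It differs from the snippet above in two ways: it uses os.path.splitext so that names containing extra dots keep their full stem, and it skips stems it has already seen, since reassigning an existing key with len(...) + 1 could hand the same number to a later file.

```python
import os


def build_img_id_dict(filename_list):
    """Map each distinct file stem to a unique integer ID, starting at 1."""
    img_id_dict = {}
    for filename in filename_list:
        stem = os.path.splitext(os.path.basename(filename))[0]
        # Only assign an ID the first time a stem is seen, so IDs never collide
        if stem not in img_id_dict:
            img_id_dict[stem] = len(img_id_dict) + 1
    return img_id_dict


img_id_dict = build_img_id_dict(["a_1.jpg", "b_1.jpg", "a_2.jpg", "a_1.jpg"])
print(img_id_dict)  # {'a_1': 1, 'b_1': 2, 'a_2': 3}
```

The lookup then works as long as the key is built the same way as the img_id string it is compared against.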

amitkumar-delhivery, Jun 05 '20 10:06

Yeah having the same issue. My images are named like (example):

480_0_36.png
480_0_37.png
...
499_0_5.png
499_0_6.png

And for each filename ("X_Y_Z.png") it assumes the id is always X.
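
A small sketch that reproduces the collision with these names. It assumes the script takes the first run of digits in the file stem, which is what the re.findall(r'\d+', img_id)[0] line quoted elsewhere in this thread does; the helper name first_number_id is made up.

```python
import os
import re


def first_number_id(filename):
    """Mimic the assumed behaviour: use the first run of digits in the stem as the ID."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return int(re.findall(r"\d+", stem)[0])


names = ["480_0_36.png", "480_0_37.png", "499_0_5.png", "499_0_6.png"]
ids = [first_number_id(name) for name in names]
print(ids)  # [480, 480, 499, 499]
```

Four files end up with only two distinct IDs, so any images sharing the same X prefix are merged in the COCO output.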

dinis-rodrigues, Jun 30 '20 00:06

Is there any solution for this? Does it also affect runs that use 'annotation paths list'?

AntonioNuAc, Jul 03 '20 11:07

Yeah. I'm seeing the same here. My test image IDs are J073-xxxxxxxxxx. This fix works:

95: for img_id, a_path in enumerate(tqdm(annotation_paths)):
102: img_info['id'] = img_id
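
Here is a rough standalone sketch of that enumerate approach. It parses VOC-style XML from inline strings instead of files and leaves out tqdm; the function name image_infos and the two sample annotations are invented for illustration and are not taken from the script.

```python
import xml.etree.ElementTree as ET

# Two minimal, made-up VOC annotations whose names share the same leading number
ANNOTATIONS = [
    "<annotation><filename>J073-a.jpg</filename>"
    "<size><width>640</width><height>480</height></size></annotation>",
    "<annotation><filename>J073-b.jpg</filename>"
    "<size><width>800</width><height>600</height></size></annotation>",
]


def image_infos(annotation_xmls):
    """Build COCO-style image entries, using the loop index as the image ID."""
    infos = []
    for img_id, xml_text in enumerate(annotation_xmls):
        root = ET.fromstring(xml_text)
        size = root.find("size")
        infos.append({
            "id": img_id,  # unique by construction, independent of the file name
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
    return infos


infos = image_infos(ANNOTATIONS)
print([info["id"] for info in infos])  # [0, 1]
```

Any object annotations created for a file need to carry the same value in their image_id field, otherwise they will point at the wrong image.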

SubramanianKrish, Aug 07 '20 23:08

I see that the issue is still open; I ran into it as well. Here is a quite simple solution that seems to do the job: adding a simple counter generates unique IDs.

count = 0

def get_image_info(annotation_root, extract_num_from_imgid=True):
    global count
    path = annotation_root.findtext('path')
    if path is None:
        filename = annotation_root.findtext('filename')
    else:
        filename = os.path.basename(path)
    img_name = os.path.basename(filename)
    img_id = count
    count += 1

    # if extract_num_from_imgid and isinstance(img_id, str):
    #     img_id = int(re.findall(r'\d+', img_id)[0])

If you guys encounter another issue, let me know so we can take a look.

karen-gishyan, Aug 13 '20 19:08

I have the same issue here. I fixed this issue in my forked repository.

XudongWang97, Sep 03 '20 08:09