voc2coco
Generated image IDs are non-unique
Firstly, thanks for creating this script; it was a great help to me.
When I first ran it, it worked almost perfectly, with one problem: the generated COCO image IDs were all over the place, and many were non-unique (lots of 0s, for example), which breaks the COCO format. I saw that you derive them from the filename, and since VOC has no equivalent of an image ID, I think it would make more sense to assign them in strict order, one per image.
I put together a hacky fix for now, and I'm leaving this issue open so I can come back to it later and open a PR with a proper fix. If anybody else runs into this, look for `img_id` in the script; as a stopgap, you can increment it manually.
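A quick way to check whether a converted file is affected is to count repeated IDs in its `"images"` list. The helper below is a hypothetical sketch, not part of voc2coco; it assumes the standard COCO layout, where `"images"` is a list of objects that each carry an `"id"`.

```python
import json
from collections import Counter
from typing import Any, Dict


def find_duplicate_image_ids(coco: Dict[str, Any]) -> Dict[Any, int]:
    """Return {image_id: count} for every ID shared by more than one image entry.

    Hypothetical helper, not part of voc2coco. Assumes the standard COCO
    layout: coco["images"] is a list of dicts that each have an "id" key.
    """
    counts = Counter(img["id"] for img in coco.get("images", []))
    return {img_id: n for img_id, n in counts.items() if n > 1}


def check_file(json_path: str) -> Dict[Any, int]:
    """Load a converted COCO JSON file and report its duplicate image IDs."""
    with open(json_path, encoding="utf-8") as f:
        return find_duplicate_image_ids(json.load(f))


if __name__ == "__main__":
    sample = {"images": [{"id": 0}, {"id": 0}, {"id": 0}, {"id": 5}]}
    print(find_duplicate_image_ids(sample))  # {0: 3}
```

An empty result means every image ID in the file is unique.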
@msnidal Thanks for sharing!
I will also look into this problem. I would be grateful if you could share steps or data to reproduce it.
@msnidal can you please show us your hacky solution?
@davidhuangal, this code assumes that the first number in each file name is a unique serial integer, like image1, image2, image3 or any_name1, any_name2, and so on. If your files are named like a_1.jpg, b_1.jpg, the regex used in the code assigns them the same ID. To solve this, you can use this method:
img_id_dict = {}
for filename in filename_list:  # filename_list: your image file names (not defined in this snippet)
    img_id_dict[filename.split(".")[0]] = len(img_id_dict) + 1
and then, in get_image_info, replace the regex extraction under the condition below with a dictionary lookup:
if extract_num_from_imgid and isinstance(img_id, str):
    img_id = img_id_dict[img_id]
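One caveat with this approach: the dictionary keys must match the `img_id` string that `get_image_info` derives, and that comes from the `<path>` or `<filename>` stored inside each XML file, not necessarily from the XML file's own name. The sketch below builds the mapping directly from parsed annotations. It assumes `img_id` is the image's base name without its extension (which is what the `split(".")[0]` keys imply); `image_stem` and `build_img_id_dict` are hypothetical names. It uses `os.path.splitext`, so a name with extra dots such as `img.v2.jpg` keeps its full stem `img.v2`, whereas `split(".")[0]` would cut it to `img`.

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def image_stem(annotation_root: ET.Element) -> str:
    """Mirror get_image_info's filename logic: prefer <path>, else <filename>."""
    path = annotation_root.findtext("path")
    if path is None:
        filename = annotation_root.findtext("filename")
    else:
        filename = os.path.basename(path)
    # Assumption: img_id is the base name without its extension.
    return os.path.splitext(os.path.basename(filename))[0]


def build_img_id_dict(annotation_roots: Iterable[ET.Element]) -> Dict[str, int]:
    """Map each image stem to a unique, 1-based integer ID in first-seen order."""
    img_id_dict: Dict[str, int] = {}
    for root in annotation_roots:
        stem = image_stem(root)
        if stem not in img_id_dict:  # a repeated stem keeps its first ID
            img_id_dict[stem] = len(img_id_dict) + 1
    return img_id_dict


if __name__ == "__main__":
    xml_docs = [
        "<annotation><filename>a_1.jpg</filename></annotation>",
        "<annotation><filename>b_1.jpg</filename></annotation>",
        "<annotation><path>/data/img.v2.jpg</path></annotation>",
    ]
    roots = [ET.fromstring(d) for d in xml_docs]
    print(build_img_id_dict(roots))  # {'a_1': 1, 'b_1': 2, 'img.v2': 3}
```

In the script itself, each root would come from `ET.parse(path).getroot()` for every annotation path. This still assumes image base names are unique across the dataset: two different images with the same stem (for example in different folders) would share one ID.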
Yeah, I'm having the same issue. My images are named like this (example):
480_0_36.png
480_0_37.png
...
499_0_5.png
499_0_6.png
For each file name of the form X_Y_Z.png, it takes X as the ID, so every image that shares the same X ends up with the same ID.
Is there any solution for this? Does this also happen when using the annotation paths list?
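The collision is easy to reproduce in isolation. The sketch below applies the same extraction the script uses when `extract_num_from_imgid` is enabled, `int(re.findall(r'\d+', img_id)[0])`, to the file names listed above with the extension stripped; `first_number_id` is a hypothetical wrapper name.

```python
import re


def first_number_id(stem: str) -> int:
    """Take the first run of digits as the ID, mirroring the regex extraction in get_image_info."""
    return int(re.findall(r"\d+", stem)[0])


# File names from the report above, with the .png extension stripped.
stems = ["480_0_36", "480_0_37", "499_0_5", "499_0_6"]
ids = {stem: first_number_id(stem) for stem in stems}
print(ids)  # {'480_0_36': 480, '480_0_37': 480, '499_0_5': 499, '499_0_6': 499}
print(len(stems), "images,", len(set(ids.values())), "distinct IDs")  # 4 images, 2 distinct IDs
```

Any naming scheme whose first number is shared between images collapses the same way, and a name with no digits at all raises an `IndexError` instead.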
Yeah, I'm seeing the same here. My test image IDs are J073-xxxxxxxxxx. This fix works:
Line 95: `for img_id, a_path in enumerate(tqdm(annotation_paths)):`
Line 102: `img_info['id'] = img_id`
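This works because the `enumerate` index is unique by construction, whatever the file names look like. The important detail is that each image entry and all of its boxes use that same index. The sketch below is a trimmed-down, hypothetical version of the conversion loop (`convert_with_enumerated_ids` is not a voc2coco function, and it keeps only a few fields) that shows the pattern. The fix above counts from 0; `start` is exposed here in case 1-based IDs are preferred.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def convert_with_enumerated_ids(annotation_roots: List[ET.Element], start: int = 1) -> Dict[str, list]:
    """Trimmed-down, hypothetical sketch of the conversion loop.

    Image IDs come from the loop index, so they are unique regardless of the
    file names. Only a few fields are kept; real COCO output also needs
    width/height, bbox, area, category_id, and so on.
    """
    output: Dict[str, list] = {"images": [], "annotations": []}
    bnd_id = 1  # box IDs, also sequential
    for img_id, root in enumerate(annotation_roots, start=start):
        output["images"].append({"file_name": root.findtext("filename"), "id": img_id})
        for obj in root.findall("object"):
            # Each box points at the same ID as its image entry.
            output["annotations"].append(
                {"id": bnd_id, "image_id": img_id, "name": obj.findtext("name")}
            )
            bnd_id += 1
    return output


if __name__ == "__main__":
    xml_docs = [
        "<annotation><filename>a.jpg</filename>"
        "<object><name>cat</name></object><object><name>dog</name></object></annotation>",
        "<annotation><filename>b.jpg</filename><object><name>cat</name></object></annotation>",
    ]
    result = convert_with_enumerated_ids([ET.fromstring(d) for d in xml_docs])
    print([img["id"] for img in result["images"]])             # [1, 2]
    print([ann["image_id"] for ann in result["annotations"]])  # [1, 1, 2]
```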
I see that this issue is still open; I ran into it as well. Here is a quite simple solution that seems to do the job: a global counter generates unique IDs.
```python
count = 0


def get_image_info(annotation_root, extract_num_from_imgid=True):
    global count
    path = annotation_root.findtext('path')
    if path is None:
        filename = annotation_root.findtext('filename')
    else:
        filename = os.path.basename(path)
    img_name = os.path.basename(filename)
    img_id = count  # unique per call, regardless of the file name
    count += 1
    # if extract_num_from_imgid and isinstance(img_id, str):
    #     img_id = int(re.findall(r'\d+', img_id)[0])
    # ... rest of get_image_info unchanged
```
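The global `count` works, but it is hidden module-level state: it restarts at 0 every time the script is launched, so splits converted in separate runs (say train and val) each start at 0. That is fine as long as each JSON file is used on its own, since image IDs only need to be unique within one file, but the IDs would collide if the files were later merged. A hypothetical alternative, sketched below, is to pass an `itertools.count` iterator into the function so the counter's scope and starting value are explicit. `get_image_info_with_counter` is an illustrative name, not part of voc2coco.

```python
import itertools
import os
import xml.etree.ElementTree as ET
from typing import Dict, Iterator


def get_image_info_with_counter(annotation_root: ET.Element, id_counter: Iterator[int]) -> Dict:
    """Hypothetical variant of get_image_info that draws its ID from a counter.

    Only file_name and id are filled in here (trimmed for brevity).
    """
    path = annotation_root.findtext("path")
    if path is None:
        filename = annotation_root.findtext("filename")
    else:
        filename = os.path.basename(path)
    return {"file_name": filename, "id": next(id_counter)}


if __name__ == "__main__":
    # One counter per output file; its start value is explicit instead of a hidden global.
    counter = itertools.count(start=1)
    docs = [
        "<annotation><filename>x.jpg</filename></annotation>",
        "<annotation><path>/data/y.jpg</path></annotation>",
    ]
    print([get_image_info_with_counter(ET.fromstring(d), counter)["id"] for d in docs])  # [1, 2]
```

Because the caller creates the counter, converting several splits in one process can either share one counter (IDs never repeat across splits) or use a fresh one per split, and that choice is visible in the code.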
If you guys encounter another issue, let me know so we can take a look.
I have the same issue here. I fixed this issue in my forked repository.