pylabel

Fix importing VOC dataset with incorrect filename properties. Fix copying images after export.

Open · YaserAlOsh opened this issue 1 year ago · 2 comments

Greetings, while working with this package I ran into two issues when importing a VOC dataset and exporting it to YOLOv5 with images.

When importing a VOC dataset in pylabel, the 'filename' property of each .xml annotation file is used to determine the image name. That name is then used when exporting the dataset to other formats.
If the name is incorrect or empty, this process fails and the export does not produce the correct number of annotations. For example, if all .xml files had the same 'filename' property, we get only one file after export.
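To illustrate why identical 'filename' values shrink the export, here is a minimal sketch. It assumes (hypothetically) that the exporter names each output label file after the image stem read from the XML `<filename>` element; `label_names_from_voc` is an illustrative helper, not part of pylabel.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath


def label_names_from_voc(xml_strings):
    """Return the set of label-file names an exporter would write if it
    named each output after the VOC <filename> field (hypothetical scheme)."""
    names = set()
    for xml_text in xml_strings:
        root = ET.fromstring(xml_text)
        # findtext returns "" for an empty <filename>, None if it is missing
        image_name = root.findtext("filename") or ""
        names.add(PurePath(image_name).stem + ".txt")
    return names


# Three annotation files that all (incorrectly) carry the same <filename>.
annotations = [
    "<annotation><filename>image.jpg</filename></annotation>",
    "<annotation><filename>image.jpg</filename></annotation>",
    "<annotation><filename>image.jpg</filename></annotation>",
]
print(label_names_from_voc(annotations))  # {'image.txt'}: three inputs, one output
```

Because every annotation maps to the same output name, later files overwrite earlier ones under this naming scheme, which matches the "only one file after export" symptom.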

I fixed this by searching the images directory for an image with the same name as the annotation file.

The second fix, in exporter.py, concerns copying images when exporting to YOLOv5. The code joins the annotation path with the images path, which produces an incorrect path for the images. I fixed it by commenting out the annotation path in the Path concatenation code.
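The duplicated path follows from how pathlib joins paths: a relative right-hand operand is nested under the left one, while an absolute one replaces it. The sketch below uses `PurePosixPath` so the output is identical on every OS; the folder names are borrowed from the error message later in this thread and are only illustrative.

```python
from pathlib import PurePosixPath

annotations_dir = PurePosixPath("datasets/dataset-main/annotations")
images_dir = PurePosixPath("datasets/dataset-main/images")
image_file = "1232.jpg"

# Buggy join: the relative images_dir is appended to annotations_dir.
buggy = annotations_dir / images_dir / image_file
print(buggy)  # datasets/dataset-main/annotations/datasets/dataset-main/images/1232.jpg

# Fixed join: resolve the image against the images folder only.
fixed = images_dir / image_file
print(fixed)  # datasets/dataset-main/images/1232.jpg

# An absolute right-hand operand discards everything to its left,
# so the buggy join can look correct when absolute paths are used.
print(annotations_dir / PurePosixPath("/data/images") / image_file)  # /data/images/1232.jpg
```

The last case suggests why the annotation path may be harmless in some setups (absolute image folders) and break in others (relative ones).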

I hope my changes will not break any functionality. Please let me know if there is a better way to solve my issue.

Thank you for making this package public. Kind regards, Yaser.

YaserAlOsh · Jul 15 '23 13:07

Thank you @YaserAlOsh! There are a few issues with your pull request.

  1. It failed the validation tests for some reason. You can learn about the tests here https://github.com/pylabel-project/pylabel/blob/dev/tests/README.md
  2. Your fix for importing is very similar to another pull request, https://github.com/pylabel-project/pylabel/pull/118. Since that one passed the validation tests, I have incorporated it into the latest version, v52. Can you give it a try?
  3. I didn't understand this commit: https://github.com/pylabel-project/pylabel/pull/122/commits/03b090336fafbad982ca8f9c2a8b7535a50c0e59. If the annotations are in a different folder than the images, then the pathtoannotations is needed.

alexheat · Jul 16 '23 10:07

Hello @alexheat, I apologize for my late reply.

I have just tried the latest version from GitHub (by reinstalling). It does seem to import everything correctly, but the values in the 'img_filename' column do not contain the image extension.
When I export with this code: dataset.export.ExportToYoloV5(f"{export_name}/labels/", copy_images=True, use_splits=True)

I get this error: FileNotFoundError: [Errno 2] No such file or directory: 'datasets\\dataset-main\\annotations\\datasets\\dataset-main\\images\\1232'

The path seems to be duplicated, and the file name is also wrong (it has no extension).
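Until the importer keeps the extension, one possible workaround is to restore it before exporting by matching each stem against the files in the images folder. This is a minimal sketch under stated assumptions: `restore_extension` is a hypothetical helper, the images folder is assumed to contain only images, and it could be applied to the annotation table with something like `dataset.df["img_filename"].apply(...)`, assuming that table is the pandas DataFrame `dataset.df`.

```python
from pathlib import PurePath


def restore_extension(img_filename, available_files):
    """Return img_filename with its extension restored by matching its stem
    against available_files (names in the images folder).
    Names that already have a suffix are returned unchanged.
    Hypothetical helper, not part of pylabel."""
    if PurePath(img_filename).suffix:
        return img_filename
    matches = sorted(f for f in available_files if PurePath(f).stem == img_filename)
    if len(matches) != 1:
        # Zero or several candidates: refuse to guess
        raise FileNotFoundError(
            f"expected exactly one image for {img_filename!r}, found {matches}"
        )
    return matches[0]
```

Raising on ambiguity (for example both `1232.jpg` and `1232.png` present) is deliberate: silently picking one would reintroduce the mismatched-image problem described at the top of this thread.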

As for your third point, about commit https://github.com/pylabel-project/pylabel/commit/03b090336fafbad982ca8f9c2a8b7535a50c0e59: I think I changed the code to account for this issue (the duplicated path), but I am not sure whether something I did wrong caused it in the first place.

YaserAlOsh · Jul 31 '23 11:07