HOI-Matting

Notes about the dataset

Open 99991 opened this issue 2 years ago • 0 comments

As the dataset is available upon request now, it might be a good idea to document any surprising things about it to ensure that results will be reproducible.

Training dataset (train_list.txt)

  • Image 165 is in CMYK color space (4 color channels). OpenCV will load it correctly, but when using Pillow, you have to call .convert("RGB") on the image first (the mode name is case-sensitive).
  • Image 66 is a duplicate of image 236.
  • Image 233 is a duplicate of image 291.
  • The alpha for image 235 has a different size than the image downloaded from the internet.
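A minimal sketch of how the CMYK and size issues above could be handled with Pillow. The helper names load_rgb and size_mismatch are hypothetical and not part of the dataset or the repository; they accept either a file path or an open binary file object.

```python
from PIL import Image


def load_rgb(source):
    """Open an image with Pillow and return it as a 3-channel RGB image."""
    with Image.open(source) as image:
        # Mode names are case-sensitive: "RGB" is valid, lowercase "rgb" is not.
        # For images that are already RGB this just returns a copy.
        return image.convert("RGB")


def size_mismatch(image_source, alpha_source):
    """Return (image_size, alpha_size) if the two sizes differ, else None."""
    with Image.open(image_source) as image, Image.open(alpha_source) as alpha:
        # Image.size is (width, height) and is known without decoding pixels.
        if image.size != alpha.size:
            return image.size, alpha.size
    return None
```

Running size_mismatch over every image/alpha pair would also give the numbers needed to decide between resizing and cropping.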

Test dataset (test_list.txt)

  • Images 8 and 33 have a different size than the images downloaded from the internet.
  • When using Pillow to load test image 33, you get the warning "DecompressionBombWarning: Image size (100920000 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack." To get rid of it, you can set Image.MAX_IMAGE_PIXELS = None, which disables the size check entirely.
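Since setting Image.MAX_IMAGE_PIXELS = None turns the check off for everything loaded afterwards, it may be preferable to lift the limit only while loading the dataset. A sketch of that idea follows; pixel_limit and load_large_image are hypothetical helper names, not Pillow functions.

```python
from contextlib import contextmanager

from PIL import Image


@contextmanager
def pixel_limit(limit):
    """Temporarily set Image.MAX_IMAGE_PIXELS and restore it afterwards."""
    previous = Image.MAX_IMAGE_PIXELS
    Image.MAX_IMAGE_PIXELS = limit  # None disables the check
    try:
        yield
    finally:
        Image.MAX_IMAGE_PIXELS = previous


def load_large_image(source):
    """Load one trusted image without the decompression bomb check."""
    with pixel_limit(None):
        with Image.open(source) as image:
            # copy() forces decoding and returns an independent image.
            return image.copy()
```

This should only be used for files from a trusted source, such as the dataset itself.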

I see three options for the image size issues:

  1. Resize the images (might be inaccurate)
  2. Crop the images (would have to find out numbers first)
  3. Skip the affected images during training/testing. The trained model might be slightly weaker, but the difference is probably not noticeable.
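The third option could look roughly like the sketch below. It assumes that the image numbers used in this issue are positions in train_list.txt and test_list.txt; whether counting starts at 0 or 1 is not stated here, so first_index is left as a parameter. SKIP and filter_entries are hypothetical names.

```python
# Images with size mismatches, as listed in this issue.
SKIP = {
    "train": {235},
    "test": {8, 33},
}


def filter_entries(entries, split, first_index=0):
    """Return (index, entry) pairs for all entries that are not skipped.

    entries: lines of train_list.txt or test_list.txt, in file order.
    split: "train" or "test".
    first_index: number assigned to the first entry (assumption: 0).
    """
    skipped = SKIP[split]
    return [
        (index, entry)
        for index, entry in enumerate(entries, start=first_index)
        if index not in skipped
    ]
```

Keeping the original index next to each entry makes it easy to report exactly which images were used.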

I think that the third option is the easiest and therefore the best choice for reproducibility.

99991, Jul 06 '22 11:07