beginners-pytorch-deep-learning

Need an explanation of how failed image downloads are to be managed

Open pjgoodall opened this issue 5 years ago • 4 comments

I'm downloading images for Chapter 2, and there are a number of explicit errors like:

Error downloading http://farm1.static.flickr.com/27/57455726_8ccf14753f.jpg
Error downloading http://farm1.static.flickr.com/185/426029368_3a9612f006.jpg
Error downloading http://farm2.static.flickr.com/1133/1141687717_890fe14d8e.jpg
Error downloading http://farm1.static.flickr.com/62/204035942_de0d323af5.jpg
Error downloading http://farm2.static.flickr.com/1172/1323667952_a7a74975c4.jpg
Error downloading http://farm1.static.flickr.com/42/84450056_bb5974a64f.jpg
Error downloading http://farm4.static.flickr.com/3001/2776498771_20527f258b.jpg
Error downloading http://farm4.static.flickr.com/3159/2824589595_07ee2443a3.jpg
Error downloading http://farm4.static.flickr.com/3206/2970698030_1021311f52.jpg
Error downloading http://farm2.static.flickr.com/1125/1348034256_cc50f5b446.jpg
Error downloading http://farm3.static.flickr.com/2416/1833444691_dc3d1017db.jpg
Error downloading http://farm3.static.flickr.com/2077/1834276056_4e86bacfe3.jpg
Error downloading http://farm2.static.flickr.com/1231/1455910975_07315bf34e.jpg
Error downloading http://farm2.static.flickr.com/1424/747119762_615603a9a0.jpg
Error downloading http://farm1.static.flickr.com/79/233477721_0126cc0331.jpg
Error downloading http://farm3.static.flickr.com/2056/2021607548_e0835d3552.jpg
Error downloading http://farm2.static.flickr.com/1312/1287462891_f1a3e27b50.jpg
Error downloading http://farm2.static.flickr.com/1322/1348044264_3e0ed6611e.jpg
Error downloading http://farm4.static.flickr.com/3183/2791133165_5df1d47be5.jpg
Error downloading http://farm4.static.flickr.com/3088/2776496087_1973f8dced.jpg
Error downloading http://farm4.static.flickr.com/3224/2777354896_176a518b8c.jpg
Error downloading http://farm2.static.flickr.com/1103/840200833_be72b99848.jpg

There are also invalid 0-byte JPEG files created in the target directories.

It would help students if there were a quick explanation in the README on GitHub, helping us with:

  • should we worry or not?
  • actions to take to recover - if necessary

Excellent tutorial!

Cheers

-- Peter G

pjgoodall, Jan 20 '20 22:01

Will try to get that sorted this weekend. In general, for the book's project you don't need to worry about it too much. Invalid files should get skipped by the check added in the dataloader, but I'll point it out in the README!

falloutdurham, Jan 29 '20 01:01

Invalid files do not get skipped. Also, some of the images contain logos from the host website, so they are technically valid images but aren't cats or fish.

kwehmeyer, Jun 30 '20 15:06

I had the same problems: 0-byte images and logos (in addition to images with broken URLs). I handled it by deleting the affected ones manually, just as a quick and dirty solution. @falloutdurham Do you know whether the images in the zip file are clean? Maybe you could share the link in the README? Or we could think about extending the check_file() function for the ImageFolder objects (or adjusting download.py), so that the URLs don't need to be rechecked continuously.
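For anyone who would rather not delete the 0-byte files by hand, here is a minimal sketch of the manual cleanup described above. It is not part of the repository; the function names, the suffix list, and the `"train"` directory in the usage note are all assumptions for illustration. It only catches empty files, so logo placeholders, which are valid images, would still need to be removed by eye.

```python
from pathlib import Path

# Assumption: the downloaded files use one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_empty_images(root):
    """Return image files under `root` whose size is 0 bytes, sorted by path."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and p.stat().st_size == 0
    )


def remove_empty_images(root, dry_run=True):
    """List 0-byte images under `root`; delete them only when dry_run is False."""
    empty = find_empty_images(root)
    if not dry_run:
        for path in empty:
            path.unlink()
    return empty
```

Calling `remove_empty_images("train")` would only list the candidates; passing `dry_run=False` actually deletes them. Defaulting to a dry run is a deliberate choice so that nothing is removed before you have looked at the list.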

MarcusFra, Jul 27 '20 06:07

@falloutdurham Thinking about this issue with the zero-byte images again: this shouldn't be a problem as long as we use the is_valid_file=check_image argument in the ImageFolder class, should it? What do you think about sharing the link to the zip file in the README?
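To make the `is_valid_file` idea concrete, here is a hedged sketch of a validity check that could be passed to `ImageFolder`. It is a hypothetical stand-in, not the book's `check_image`: the name `looks_like_jpeg` and the `"./train"` path are assumptions, and it only inspects the first three bytes of the file, so it rejects 0-byte files and non-JPEG content but cannot detect a valid JPEG that shows a logo instead of a cat or fish.

```python
# First three bytes of a JPEG file: start-of-image marker plus the next marker prefix.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(path):
    """Return True if the file can be read and starts with the JPEG signature."""
    try:
        with open(path, "rb") as f:
            return f.read(3) == JPEG_MAGIC
    except OSError:
        # Missing or unreadable files are treated as invalid.
        return False


# Usage sketch (requires torchvision; "./train" is a placeholder path):
# from torchvision.datasets import ImageFolder
# train_data = ImageFolder(root="./train", is_valid_file=looks_like_jpeg)
```

`ImageFolder` calls the function once per file path while building its index, so files that fail the check never reach the data loader. A stricter variant could try to decode each file with an image library instead, at the cost of a slower start-up.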

MarcusFra, Oct 05 '20 16:10