tencent-ml-images

Download shell script reports more invalid URLs than expected

Open AmberCheng opened this issue 6 years ago • 4 comments

Hi,

I have been downloading the training dataset these days. Since it is a large dataset, I divided all URLs into 34 parts, so each part contains about 200,000 images. Then I used your shell script to download each part. But something strange happened: the number of invalid URLs plus the number of downloaded images is more than 200,000. I checked one part, and the invalid URL list contains some images that were downloaded successfully. Have you encountered this situation?

AmberCheng · Nov 30 '18 06:11
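A minimal sketch of the split-and-check workflow described above, assuming the URL list and the script's results can each be loaded as plain Python lists of URL strings. How you recover "URLs that produced a saved file" from the download script's output is not specified in this thread, so the `downloaded` and `invalid` inputs here are hypothetical. The audit counts unique URLs, so duplicate lines in the list do not inflate the totals, and it reports the URLs that appear in both the downloaded and invalid sets.

```python
def split_urls(urls, n_parts):
    """Split a list of URLs into n_parts contiguous chunks of near-equal size."""
    size, rem = divmod(len(urls), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `rem` chunks get one extra URL so every URL is assigned.
        end = start + size + (1 if i < rem else 0)
        parts.append(urls[start:end])
        start = end
    return parts


def audit_part(part_urls, downloaded, invalid):
    """Compare one part's URLs with the (hypothetical) downloaded/invalid lists.

    Counts are over unique URLs. A non-empty "both" list means some URLs were
    recorded as invalid even though a file was saved for them.
    """
    part = set(part_urls)
    got = set(downloaded) & part
    bad = set(invalid) & part
    return {
        "total": len(part),
        "downloaded": len(got),
        "invalid": len(bad),
        "both": sorted(got & bad),
        "missing": sorted(part - got - bad),
    }
```

With this breakdown, `downloaded + invalid - len(both) + len(missing)` always equals `total`, so any overshoot past the part size is fully accounted for by the overlap.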

@AmberCheng I guess that in these cases the URL still responds, but what gets saved is an image showing "not available".

wubaoyuan · Nov 30 '18 08:11
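If a host does serve the same "not available" image for many dead URLs, those saved files would be byte-for-byte identical. A minimal sketch for spotting such candidates, assuming all downloaded images sit in one directory: it groups files by SHA-256 and returns groups with at least `min_count` identical members. The `min_count` threshold is a hypothetical parameter to tune; a large group of identical files is only a hint, not proof, that the content is a placeholder.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def suspected_placeholders(image_dir, min_count=5):
    """Return {sha256: [file names]} for groups of identical files of size >= min_count."""
    groups = defaultdict(list)
    for p in sorted(Path(image_dir).iterdir()):
        if p.is_file():
            # Identical bytes -> identical digest, so placeholders collapse into one group.
            groups[hashlib.sha256(p.read_bytes()).hexdigest()].append(p.name)
    return {h: names for h, names in groups.items() if len(names) >= min_count}
```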

@wubaoyuan I have just checked it. The file is an actual image, not a "not available" one, but it is broken. I wonder why you don't package the images, since downloading them is much more trouble...

AmberCheng · Nov 30 '18 08:11
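One common kind of "broken" image is a truncated download. A minimal stdlib-only heuristic, assuming the files are JPEG or PNG: a complete JPEG ends with the EOI marker `FF D9`, and a complete PNG ends with the `IEND` chunk and its fixed CRC. This is only a sketch; it can flag valid files that carry trailing data, and fully decoding each file with an image library such as Pillow is a more thorough check.

```python
from pathlib import Path

JPEG_SOI = b"\xff\xd8"               # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"               # JPEG end-of-image marker
PNG_SIG = b"\x89PNG\r\n\x1a\n"       # PNG file signature
PNG_IEND = b"IEND\xaeB`\x82"         # IEND chunk type followed by its CRC (0xAE426082)


def looks_truncated(data):
    """Heuristic: True if JPEG/PNG bytes lack an end marker or are not JPEG/PNG at all."""
    if data.startswith(JPEG_SOI):
        # Tolerate zero-byte padding after the EOI marker.
        return not data.rstrip(b"\x00").endswith(JPEG_EOI)
    if data.startswith(PNG_SIG):
        return not data.endswith(PNG_IEND)
    # Empty files, HTML error pages, or other formats are treated as broken here.
    return True


def find_broken(image_dir):
    """List files in image_dir whose contents fail the truncation heuristic."""
    return sorted(str(p) for p in Path(image_dir).iterdir()
                  if p.is_file() and looks_truncated(p.read_bytes()))
```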

@AmberCheng Personally, I would like to share the images, but there is a copyright risk for our company.

wubaoyuan · Nov 30 '18 09:11

@AmberCheng Please follow our suggestion: download all ImageNet images, then use the list we provide to extract the images used in ML-Images. The URLs from Open Images are valid.

wubaoyuan · Nov 30 '18 10:11
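A minimal sketch of the "download ImageNet, then extract with the provided list" step. The exact format of the provided list is not described in this thread, so the sketch assumes (hypothetically) that each non-empty line starts with an image path relative to the local ImageNet root, optionally followed by labels. It copies each listed image into an output directory, preserving the relative layout, and returns the paths that were not found locally.

```python
import shutil
from pathlib import Path


def extract_subset(list_file, imagenet_root, out_root):
    """Copy images named in list_file from imagenet_root to out_root.

    Hypothetical list format: "<relative/path> [labels...]" per line.
    Returns the relative paths that are missing from imagenet_root.
    """
    src_root, dst_root = Path(imagenet_root), Path(out_root)
    missing = []
    for line in Path(list_file).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rel = fields[0]
        src = src_root / rel
        if not src.is_file():
            missing.append(rel)
            continue
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents and metadata
    return missing
```

Returning the missing paths, rather than failing on the first one, makes it easy to see how much of the subset a given ImageNet copy actually covers.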