google-images-download

Way to just get image URLs without downloading?

Open ZaxR opened this issue 4 years ago • 7 comments

First off, thank you for this library - it's hugely helpful for creating toy ML datasets.

Is there a way to get just a list of the image URLs without downloading the files right away? If not, exposing that ability would be really helpful for deferring downloads, as well as for farming the download tasks out to workers in a different process or on a different machine.

Thanks!

ZaxR avatar Oct 28 '19 22:10 ZaxR

Hello, @ZaxR.

You can add the parameter "no_download": True to the arguments. For example:

from google_images_download import google_images_download

response = google_images_download.googleimagesdownload()
arguments = {
    "keywords": "Polar bears,baloons,Beaches",
    "limit": 20,
    "no_download": True
}
paths = response.download(arguments)
print(paths)

eshikvtumane avatar Nov 04 '19 11:11 eshikvtumane

What is the output type of this function? How do I parse it?

screamolic avatar Nov 21 '19 03:11 screamolic

@screamolic This function returns a nested structure:

tuple(
    dict(
        'name_photos': list('url', 'url', ...),
        ...
    ),
    int
)
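A minimal sketch of unpacking this structure. The sample value is hypothetical and only mimics the shape shown above; treating the integer as an error count is an assumption, not something confirmed here.

```python
# Hypothetical stand-in for the value returned by response.download(arguments);
# it only mimics the (dict, int) shape described above.
result = (
    {
        "Polar bears": ["https://example.com/bear1.jpg", "https://example.com/bear2.jpg"],
        "Beaches": ["https://example.com/beach1.jpg"],
    },
    0,
)

# Unpack the tuple: a dict of URL lists and an integer
# (assumed here to be an error count).
urls_by_name, error_count = result

for name, urls in urls_by_name.items():
    print(f"{name}: {len(urls)} URL(s)")
print(f"errors: {error_count}")
```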

To collect all the URLs in one list, use the following code:

from google_images_download import google_images_download   #importing the library

response = google_images_download.googleimagesdownload()   #class instantiation

arguments = {"keywords":"Polar bears,baloons,Beaches","limit":20,"no_download": True}   #creating dict of arguments
paths = response.download(arguments)   #passing the arguments to the function

images_paths = []
for k, v in paths[0].items():
    images_paths += v

print(images_paths)
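If the goal is to defer the downloads, as in the original question, the flattened list can be split into batches and handed to workers. This is a rough sketch and not part of the library: `chunked` and `fetch` are hypothetical helpers, `fetch` is a stub that does no network I/O, and the URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch(url):
    """Hypothetical stub: replace with a real download (e.g. via urllib)."""
    return f"fetched {url}"


# Placeholder for the flattened URL list built above.
images_paths = [f"https://example.com/img{i}.jpg" for i in range(5)]

# One batch per worker or machine, for example.
batches = chunked(images_paths, 2)

# Process one batch locally with a small thread pool; map() keeps input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fetch, batches[0]))

print(batches)
print(results)
```

Each batch could just as well be written to a file or a queue and picked up by a separate process or machine; the thread pool only illustrates the local case.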

eshikvtumane avatar Nov 23 '19 06:11 eshikvtumane

Hello,

I've tried this, but for some reason I'm not getting any URLs. Here's what shows up in my console:

Item no.: 1 --> Item name = Star Wars V: Empire Strikes Back 1980
Evaluating...
Getting URLs without downloading images...

Errors: 0

[]

What could I be doing wrong? Cheers

pedroboga avatar Nov 03 '20 00:11 pedroboga

Hello, @pedroboga

This repository has not been updated for a long time. You can try using a fork of the project: https://github.com/maxpanakov/google-images-download

Instructions for installing a fork via pip: https://stackoverflow.com/a/24811490

Hope this helps.

eshikvtumane avatar Nov 04 '20 01:11 eshikvtumane

I also have a patch that fixes the issue of no URLs being found. You can install it by running:

git clone https://github.com/Joeclinton1/google-images-download.git
cd google-images-download && python setup.py install

Joeclinton1 avatar Nov 04 '20 01:11 Joeclinton1

@eshikvtumane I'll be sure to take a look, thank you!

@Joeclinton1 Yeah, that worked pretty well. Thanks for pointing out your patch.

pedroboga avatar Nov 04 '20 16:11 pedroboga