google-images-download
Way to just get image URLs without downloading?
First off, thank you for this library - it's hugely helpful for creating toy ML datasets.
Is there a way to download just a list of the image URLs without downloading the files right away? If not, exposing that ability would be really helpful for deferring download as well as farming out the downloading tasks to workers using a different process/on a different machine.
Thanks!
Hello, @ZaxR.
You can add the parameter "no_download": True to the arguments. For example:
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
arguments = {
"keywords": "Polar bears,baloons,Beaches",
"limit": 20,
"no_download": True
}
paths = response.download(arguments)
print(paths)
What is the output type of this function? How do I parse it?
@screamolic This function returns a nested structure:
tuple(
dict(
'name_photos': list('url', 'url', ...),
...
),
int
)
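To make that shape concrete, here is a minimal sketch that unpacks a hand-written sample instead of calling the library. The sample URLs and the names `urls_by_keyword` and `error_count` are made up for illustration, and reading the trailing int as an error count is an assumption.

```python
# Hand-written sample mimicking the structure described above.
# The URLs are hypothetical; nothing here calls the library.
sample_result = (
    {
        "Polar bears": [
            "https://example.com/bear1.jpg",
            "https://example.com/bear2.jpg",
        ],
        "Beaches": ["https://example.com/beach1.jpg"],
    },
    0,  # assumed to be an error count
)

# The tuple unpacks into the per-keyword dict and the int.
urls_by_keyword, error_count = sample_result

for keyword, urls in urls_by_keyword.items():
    print(f"{keyword}: {len(urls)} URL(s)")
print("errors:", error_count)
```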
To collect all the URLs in a single list, use the following code:
from google_images_download import google_images_download #importing the library
response = google_images_download.googleimagesdownload() #class instantiation
arguments = {"keywords":"Polar bears,baloons,Beaches","limit":20,"no_download": True} #creating list of arguments
paths = response.download(arguments) #passing the arguments to the function
images_paths = []
for k, v in paths[0].items():
    images_paths += v
print(images_paths)
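Once the URLs are in a flat list, the downloading itself can be deferred or handed to workers, which is what the original question was after. The following is only a sketch using the standard library, not part of google-images-download: `fetch` and `download_all` are hypothetical helper names, and the thread pool stands in for whatever worker setup you actually use.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, fetcher=fetch, max_workers=4):
    """Fetch every URL in a thread pool.

    Returns a dict mapping each URL to its bytes, or to None if the fetch failed.
    """
    urls = list(urls)  # allow any iterable, and iterate it only once

    def safe(url):
        # One bad URL should not abort the whole batch.
        try:
            return fetcher(url)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(safe, urls)))
```

Calling `download_all(images_paths)` with the list built above would fetch everything in parallel. To move the work to another machine, the same list could first be written to a file or pushed onto a queue.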
Hello,
I've tried this, but for some reason I'm not getting any URLs. Here's what shows up in my console:
` Item no.: 1 --> Item name = Star Wars V: Empire Strikes Back 1980 Evaluating... Getting URLs without downloading images...
Errors: 0
[] `
What could I be doing wrong? Cheers
Hello, @pedroboga
This repository has not been updated for a long time. You can try using a fork of the project: https://github.com/maxpanakov/google-images-download
Instructions for installing the fork via pip: https://stackoverflow.com/a/24811490
Hope this helps.
I also have a patch that fixes the "URL not found" issue, which you can install by running:
git clone https://github.com/Joeclinton1/google-images-download.git
cd google-images-download && python setup.py install
@eshikvtumane I'll be sure to take a look, thank you!
@Joeclinton1 Yeah, that worked pretty well. Thanks for pointing out your patch.