google-images-download
Fixed issue with links not being found for the new Google response format
The 2020 Google Images update changed where the image information is stored. I found that it now lives in a script on the page, passed to AF_initDataCallback.
This implementation is backward compatible: it first tries the old rg_meta parsing, and if that doesn't work, it falls back to parsing the new data.
This code was tested with both Python 3 and Python 2.
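For readers following along, the fallback order described above could look roughly like the sketch below. It is not the code from this PR: the helper names (`parse_rg_meta`, `parse_af_init_data`, `extract_image_links`), the regular expressions, and the assumption that the old JSON keeps the image URL under the `ou` key are all illustrative, and the real page layout is undocumented and may differ.

```python
import json
import re

# All names and patterns below are illustrative assumptions, not the PR's code.
RG_META_RE = re.compile(r'<div class="rg_meta[^"]*">(.*?)</div>', re.DOTALL)
AF_INIT_RE = re.compile(r"AF_initDataCallback\((.*?)\);", re.DOTALL)
IMAGE_URL_RE = re.compile(
    r'"(https?://[^"]+?\.(?:jpg|jpeg|png|gif|webp))"', re.IGNORECASE
)


def parse_rg_meta(page):
    """Old format: one JSON object per rg_meta div (image URL assumed under 'ou')."""
    links = []
    for blob in RG_META_RE.findall(page):
        try:
            links.append(json.loads(blob)["ou"])
        except (ValueError, KeyError):
            continue  # skip blobs that are not valid JSON or lack the key
    return links


def parse_af_init_data(page):
    """New format: scan each AF_initDataCallback(...) block for quoted image URLs."""
    links = []
    for block in AF_INIT_RE.findall(page):
        links.extend(IMAGE_URL_RE.findall(block))
    return links


def extract_image_links(page):
    """Try the old parser first and fall back to the new one if it finds nothing."""
    return parse_rg_meta(page) or parse_af_init_data(page)
```

The `or` keeps the old path untouched for pages that still contain rg_meta and only runs the new parser when the old one comes back empty. A real parser would also need to handle escaped characters inside the script data, which this sketch ignores.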
can you add bs4 into the requirements.txt?
Hi, I tried to use this PR locally and am getting errors when running. My environment:
beautifulsoup4==4.9.0
bs4==0.0.1
-e git+git@github.com:hardikvasa/google-images-download.git@8d60f981d48ee7b5fb46f9541d427f8e81481706#egg=google_images_download
selenium==3.141.0
soupsieve==2.0
urllib3==1.25.8
The command I am using to download images
googleimagesdownload --keywords "Phyllopertha horticola" --limit 10 --chromedriver '/usr/bin/chromedriver'
The exception raised when running
Item no.: 1 --> Item name = Phyllopertha horticola
Evaluating...
Starting Download...
Traceback (most recent call last):
  File "/home/justin/projects/fastai/homework1/env/bin/googleimagesdownload", line 11, in <module>
    load_entry_point('google-images-download', 'console_scripts', 'googleimagesdownload')()
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 1124, in main
    paths,errors = response.download(arguments) #wrapping response in a variable just for consistency
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 934, in download
    paths, errors = self.download_executor(arguments)
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 1061, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments) #get all image items and download images
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 753, in _get_all_items
    self._parse_AF_initDataCallback(page)
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 901, in _parse_AF_initDataCallback
    metas = get_metas(page)
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 858, in get_metas
    entry = entries[-1]
IndexError: list index out of range
yup, verified this PR doesn't work.
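The traceback above ends at `entries[-1]`, which raises IndexError whenever the list is empty, for example if no matching script block was found in the page. A minimal sketch of a guard that fails with a clearer message; `last_entry` is a hypothetical helper for illustration, not something in the PR:

```python
def last_entry(entries):
    """Return the last element, or raise a descriptive error if the list is empty."""
    if not entries:
        # Hypothetical message; the point is to say what was missing.
        raise ValueError(
            "no AF_initDataCallback entries found; the page format may have changed"
        )
    return entries[-1]
```

This does not fix the underlying parsing problem, but it would make reports like the one above easier to diagnose.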
can you add bs4 into the requirements.txt?
done
I didn't want to add it to the requirements because it is an optional dependency. The code should still run without errors when bs4 is not installed.
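One common way to keep a dependency like bs4 optional is to guard the import and fall back to something simpler when it is missing. The sketch below only illustrates that pattern; `script_contents` and the fallback regex are assumptions, not the code in this PR.

```python
import re

try:
    from bs4 import BeautifulSoup  # optional dependency
except ImportError:
    BeautifulSoup = None  # fall back to a regex below

SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def script_contents(html):
    """Return the text of every <script> element in html."""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        return [tag.string or "" for tag in soup.find_all("script")]
    # Rough fallback: good enough for simple pages, not a full HTML parser.
    return SCRIPT_RE.findall(html)
```

With this shape, installing bs4 gives more robust parsing, while a plain install still runs.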
Hi, I also have a problem after installing bs4.
The command that I am running: googleimagesdownload --keywords "tree" --limit 10 --chromedriver /Users/reza/Downloads/chromedriver/chromedriver
The error that I got:
Item no.: 1 --> Item name = tree
Evaluating...
Starting Download...
WARNING: _parse_AF_initDataCallback failed list index out of range
Traceback (most recent call last):
File "/Users/reza/projects/tmp-test/my_env/bin/googleimagesdownload", line 11, in
This works for me, and it is the right approach to the problem. It might have a few bugs that need sorting out before it works for everyone, but @FarisHijazi is right about AF_initDataCallback. I checked, and the required information is definitely there, so it just needs to be parsed out.
Works, but it thinks every image is a GIF, even when it's not.
At line 776, before:
if arguments['metadata']:
Insert this:
imageURL = object['image_link']
object['image_format'] = imageURL.split(".")[-1]
If you don't, your script will think that all images are in GIF format. This is an easy fix and it will only look at the original file extension.
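One caveat with taking the last dot-separated piece: a URL with a query string, such as `photo.JPG?size=large`, would yield `JPG?size=large`. A slightly more defensive sketch, assuming a hypothetical helper `guess_image_format` that is not part of the script, parses the URL path first:

```python
import os

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def guess_image_format(image_url, default=""):
    """Guess the format from the extension of the URL path, ignoring any query string."""
    path = urlparse(image_url).path
    ext = os.path.splitext(path)[1]  # e.g. ".JPG", or "" if there is no extension
    return ext[1:].lower() if ext else default
```

Like the two-line fix above, this only looks at the file extension in the URL and does not inspect the downloaded bytes.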
This is a duplicate of #298