google-images-download Fixed issue with links not being found for new google response format

Open FarisHijazi opened this issue 4 years ago • 9 comments

The 2020 Google Images update changed where the image information is stored. I found that it is now stored in a script, in the variable AF_initDataCallback.

This implementation is backward compatible: it first tries the old format (using rg_meta) and, if that doesn't work, parses the new data.

This code was tested with both Python 3 and Python 2.
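
For readers following along, here is a minimal sketch of the fallback idea described above. It is not the code from this PR. The function name `extract_image_links`, the regular expressions, the `"ou"` key for the legacy rg_meta JSON, and the image-extension heuristic are all assumptions made for illustration, and the real page layout may differ.

```python
import json
import re

# Assumed legacy layout: one JSON blob per <div class="rg_meta ..."> element.
RG_META_RE = re.compile(r'<div class="rg_meta[^"]*">(.*?)</div>', re.S)
# Assumed new layout: data passed to AF_initDataCallback(...) inside a <script>.
AF_CALLBACK_RE = re.compile(r'AF_initDataCallback\((.*?)\);\s*</script>', re.S)
# Hypothetical heuristic: any quoted http(s) URL ending in a common image extension.
IMAGE_URL_RE = re.compile(r'"(https?://[^"]+?\.(?:jpe?g|png|gif|webp|bmp))"', re.I)


def extract_image_links(page):
    """Return image URLs, trying the legacy format first, then the new one."""
    links = []
    for blob in RG_META_RE.findall(page):
        try:
            # "ou" (original URL) is assumed to be the legacy key name.
            links.append(json.loads(blob)["ou"])
        except (ValueError, KeyError):
            continue  # skip blobs that are not valid JSON or lack the key
    if links:
        return links
    # Fallback: scan every AF_initDataCallback payload for image-like URLs.
    for payload in AF_CALLBACK_RE.findall(page):
        links.extend(IMAGE_URL_RE.findall(payload))
    return links
```

The point of the design is only the ordering: the old format is tried first, so pages that still use it behave as before, and the new parsing runs only when the old one finds nothing.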

Apr 08 '20 23:04 FarisHijazi

Can you add bs4 to requirements.txt?

Apr 09 '20 06:04 hackgoofer

Hi, I tried to use this PR locally and am getting errors when running. My environment:

beautifulsoup4==4.9.0
bs4==0.0.1
-e [email protected]:hardikvasa/google-images-download.git@8d60f981d48ee7b5fb46f9541d427f8e81481706#egg=google_images_download
selenium==3.141.0
soupsieve==2.0
urllib3==1.25.8

The command I am using to download images:

googleimagesdownload --keywords "Phyllopertha horticola" --limit 10 --chromedriver '/usr/bin/chromedriver'

The exception raised when running:

Item no.: 1 --> Item name = Phyllopertha horticola
Evaluating...
Starting Download...
Traceback (most recent call last):
  File "/home/justin/projects/fastai/homework1/env/bin/googleimagesdownload", line 11, in <module>
    load_entry_point('google-images-download', 'console_scripts', 'googleimagesdownload')()
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 1124, in main
    paths,errors = response.download(arguments)  #wrapping response in a variable just for consistency
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 934, in download
    paths, errors = self.download_executor(arguments)
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 1061, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments)    #get all image items and download images
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 753, in _get_all_items
    self._parse_AF_initDataCallback(page)
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 901, in _parse_AF_initDataCallback
    metas = get_metas(page)
  File "/home/justin/projects/fastai/homework1/google-images-download/google_images_download/google_images_download.py", line 858, in get_metas
    entry = entries[-1]
IndexError: list index out of range

Apr 13 '20 02:04 justin-fay

yup, verified this PR doesn't work.

Apr 25 '20 19:04 hackgoofer

can you add bs4 into the requirements.txt?

done

I didn't want to add it to the requirements because it is an optional dependency. The code should still run without errors when bs4 is not installed.

May 12 '20 02:05 FarisHijazi

Hi, I also have a problem after installing bs4.

The command that I am running: googleimagesdownload --keywords "tree" --limit 10 --chromedriver /Users/reza/Downloads/chromedriver/chromedriver

The error that I got:

Item no.: 1 --> Item name = tree
Evaluating...
Starting Download...
WARNING: _parse_AF_initDataCallback failed list index out of range
Traceback (most recent call last):
  File "/Users/reza/projects/tmp-test/my_env/bin/googleimagesdownload", line 11, in <module>
    load_entry_point('google-images-download==2.8.0', 'console_scripts', 'googleimagesdownload')()
  File "/Users/reza/projects/tmp-test/my_env/lib/python3.8/site-packages/google_images_download-2.8.0-py3.8.egg/google_images_download/google_images_download.py", line 1129, in main
  File "/Users/reza/projects/tmp-test/my_env/lib/python3.8/site-packages/google_images_download-2.8.0-py3.8.egg/google_images_download/google_images_download.py", line 939, in download
  File "/Users/reza/projects/tmp-test/my_env/lib/python3.8/site-packages/google_images_download-2.8.0-py3.8.egg/google_images_download/google_images_download.py", line 1066, in download_executor
  File "/Users/reza/projects/tmp-test/my_env/lib/python3.8/site-packages/google_images_download-2.8.0-py3.8.egg/google_images_download/google_images_download.py", line 765, in _get_all_items
  File "/Users/reza/projects/tmp-test/my_env/lib/python3.8/site-packages/google_images_download-2.8.0-py3.8.egg/google_images_download/google_images_download.py", line 722, in _get_next_item
TypeError: 'NoneType' object is not an iterator
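
Both errors reported in this thread (the IndexError from `entries[-1]` and the TypeError from iterating over None) look like the same situation: the parser found nothing. A minimal sketch of how such cases could be guarded follows. The helper names `last_entry` and `iter_items` are hypothetical and do not exist in the project; this is an illustration of the pattern, not a patch for the PR.

```python
def last_entry(entries):
    """Return the final entry, or None when the list is empty.

    Indexing an empty list with [-1] raises
    "IndexError: list index out of range".
    """
    return entries[-1] if entries else None


def iter_items(parsed):
    """Always return an iterator, even when parsing produced None.

    Calling next() on None raises
    "TypeError: 'NoneType' object is not an iterator".
    """
    return iter(parsed or [])


# Usage: an empty result simply yields no items instead of crashing.
for item in iter_items(None):
    print(item)
```

With guards like these, a page in an unexpected format would lead to zero downloads and a warning rather than a traceback.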

May 14 '20 12:05 ghost

This works for me. It is also the right solution to the problem. It might have a few bugs that need to be sorted out before it works for everyone, but @FarisHijazi is right about AF_initDataCallback: I checked, and the required information is certainly there. It just needs to be parsed out.

May 19 '20 19:05 marian-code

Works, but it thinks every image is a GIF, even when it's not.

May 29 '20 23:05 cooperdk

At line 776, before:

if arguments['metadata']:

Insert this:

                imageURL = object['image_link']
                object['image_format'] = imageURL.split(".")[-1]

If you don't, your script will think that all images are in GIF format. This is an easy fix; note that it only looks at the file extension in the original URL.
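
One caveat with splitting on "." is that URLs with a query string or without an extension give odd results. Below is a slightly more defensive sketch of the same idea, assuming Python 3. The function name `guess_image_format` and its `default` parameter are hypothetical and not part of the project.

```python
import os
from urllib.parse import urlparse


def guess_image_format(image_url, default=""):
    """Guess the image format from the extension in the URL path.

    The query string and fragment are ignored, and the result is
    lowercased. Returns `default` when the path has no extension.
    """
    path = urlparse(image_url).path
    ext = os.path.splitext(path)[1]  # e.g. ".JPG", or "" if none
    return ext[1:].lower() if ext else default


url = "https://example.com/a/b.JPG?x=1"
print(url.split(".")[-1])       # naive split keeps the query string
print(guess_image_format(url))  # extension taken from the path only
```

Like the fix above, this still trusts the URL rather than the downloaded bytes, so a mislabeled file would still be reported under the wrong format.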

May 30 '20 00:05 cooperdk

This is a duplicate of #298

Jun 27 '20 13:06 Joeclinton1
