
google cannot parse "Tallest mountain in the world"

Open fanzhuyifan opened this issue 3 years ago • 7 comments

Description The Google engine cannot parse the results returned for the query "Tallest mountain in the world".

To Reproduce Steps to reproduce the behavior:

from search_engine_parser.core.engines.google import Search
searcher = Search()
results = searcher.search("Tallest mountain in the world")

Expected behavior The results are parsed correctly and returned.

Screenshots

Traceback (most recent call last):
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 240, in get_results
    search_results = self.parse_result(results, **kwargs)
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 151, in parse_result
    rdict = self.parse_single_result(each, **kwargs)
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/engines/google.py", line 74, in parse_single_result
    title = r_elem.find('div', class_='BNeawe').text
AttributeError: 'NoneType' object has no attribute 'text'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "XXXXX/temp.py", line 4, in <module>
    results = searcher.search("Tallest mountain in the world")
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 270, in search
    return self.get_results(soup, **kwargs)
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 243, in get_results
    raise NoResultsOrTrafficError(
search_engine_parser.core.exceptions.NoResultsOrTrafficError: The returned results could not be parsed. This might be due to site updates or server errors. Drop an issue at https://github.com/bisoncorps/search-engine-parser if this persists

Desktop:

  • OS: Linux
  • Python version: 3.9.5
  • search-engine-parser version: 0.6.2 (latest)

Additional context The result that cannot be parsed:

<div class="ZINbbc xpd O9g5cc uUPGi"><div><div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&amp;sa=U&amp;ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQCw&amp;usg=AOvVaw1pflhmM0gRBSRK5KlKcTT6"><span></span></a></div><div class="CgE3Ac I9mEQ"><table class="LnMnt"><thead><tr><td class="IxZjcf sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe uEec3 AP7Wnd">Rank</div></div></td><td class="IxZjcf sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe uEec3 AP7Wnd">Mountain</div></div></td><td class="IxZjcf sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe uEec3 AP7Wnd">Country</div></div></td></tr></thead><tbody><tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">1.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Everest</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Nepal/Tibet</div></div></td></tr><tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">2.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">K2 (Mount Godwin Austen)</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Pakistan/China</div></div></td></tr><tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">3.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Kangchenjunga</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">India/Nepal</div></div></td></tr><tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">4.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Lhotse</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd 
AP7Wnd">Nepal/Tibet</div></div></td></tr></tbody></table></div><div class="hwc"><div class="Q0HXG"></div><div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&amp;sa=U&amp;ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQDA&amp;usg=AOvVaw39wAm-G8SzoUzVMu-r2DX6"><div><span><div class="BNeawe vvjwJb AP7Wnd">The Top Ten: The World's Highest Mountains - Infoplease</div></span><span><div class="BNeawe UPmit AP7Wnd">www.infoplease.com &gt; world &gt; geography &gt; top-ten-worlds-highest-mount...</div></span></div></a></div></div></div></div>

The corresponding result of the code at https://github.com/bisoncorps/search-engine-parser/blob/0418867b3529980d5a4eb71899dec37092fe7df1/search_engine_parser/core/engines/google.py#L66 is:

[<div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&amp;sa=U&amp;ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQCw&amp;usg=AOvVaw1pflhmM0gRBSRK5KlKcTT6"><span></span></a></div>,
 <div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&amp;sa=U&amp;ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQDA&amp;usg=AOvVaw39wAm-G8SzoUzVMu-r2DX6"><div><span><div class="BNeawe vvjwJb AP7Wnd">The Top Ten: The World's Highest Mountains - Infoplease</div></span><span><div class="BNeawe UPmit AP7Wnd">www.infoplease.com &gt; world &gt; geography &gt; top-ten-worlds-highest-mount...</div></span></div></a></div>]

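The first div contains only an empty <span> and no title element. That is presumably why r_elem.find('div', class_='BNeawe') returns None in parse_single_result and the .text access raises the AttributeError shown in the traceback.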
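The failure can be reproduced outside the library with BeautifulSoup alone. The sketch below uses a trimmed copy of the two kCrYT blocks quoted above (URLs shortened, the table omitted) and shows one possible guard. The helper name first_title is hypothetical, and this is not the library's code or the change proposed in #168, only an illustration of skipping blocks that lack a BNeawe div.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the result quoted in this issue: URLs are shortened and the
# table is omitted. Only the two `kCrYT` blocks matter for the failure.
HTML = """
<div class="ZINbbc xpd O9g5cc uUPGi">
  <div class="kCrYT"><a href="/url?q=https://www.infoplease.com/"><span></span></a></div>
  <div class="kCrYT"><a href="/url?q=https://www.infoplease.com/"><div>
    <span><div class="BNeawe vvjwJb AP7Wnd">The Top Ten: The World's Highest Mountains - Infoplease</div></span>
    <span><div class="BNeawe UPmit AP7Wnd">www.infoplease.com</div></span>
  </div></a></div>
</div>
"""


def first_title(result_elem):
    """Hypothetical helper: text of the first `BNeawe` div in any `kCrYT` block, else None."""
    for block in result_elem.find_all("div", class_="kCrYT"):
        title_elem = block.find("div", class_="BNeawe")
        if title_elem is not None:  # skip blocks such as the empty first one
            return title_elem.get_text(strip=True)
    return None


soup = BeautifulSoup(HTML, "html.parser")
result = soup.find("div", class_="ZINbbc")

# The first `kCrYT` block has no `BNeawe` div, so `find` returns None;
# calling `.text` on that value is what raises the AttributeError.
first_block = result.find("div", class_="kCrYT")
unguarded = first_block.find("div", class_="BNeawe")

title = first_title(result)
print(unguarded)
print(title)
```

Whether skipping the empty block is the right behavior for the parser is a design decision for the maintainers. The sketch only shows that the title is present in the second block.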

fanzhuyifan avatar Jul 04 '21 03:07 fanzhuyifan

Are you running this on Heroku?

MeNsaaH avatar Sep 20 '21 15:09 MeNsaaH

I have version 0.6.6 installed and I get the same error. I am not running on Heroku.

KennBro avatar Dec 20 '21 03:12 KennBro

Same error on various search queries

GuyKh avatar Feb 08 '22 08:02 GuyKh

Is this on Heroku?

MeNsaaH avatar Feb 08 '22 10:02 MeNsaaH

I am getting the same error on various search queries. I also tried running this locally and not on Heroku, but it is still not working.

icc-sundar avatar Feb 13 '22 00:02 icc-sundar

I am also receiving the same exceptions for all but a few of the simplest single-word search terms.

Specs

  • OS: Windows 10 Pro
    • Version: 21H2
    • Build: 19044.1645
  • Parser Version: 0.6.6

Other

  • Not running on Heroku

GigglePocket avatar Apr 22 '22 00:04 GigglePocket

#168 should fix it

bentsi avatar Jul 13 '22 18:07 bentsi