search-engine-parser
google cannot parse "Tallest mountain in the world"
Description The Google engine cannot parse the results returned for "Tallest mountain in the world".
To Reproduce Steps to reproduce the behavior:
from search_engine_parser.core.engines.google import Search
searcher = Search()
results = searcher.search("Tallest mountain in the world")
Expected behavior Correctly parsed results
Screenshots
Traceback (most recent call last):
File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 240, in get_results
search_results = self.parse_result(results, **kwargs)
File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 151, in parse_result
rdict = self.parse_single_result(each, **kwargs)
File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/engines/google.py", line 74, in parse_single_result
title = r_elem.find('div', class_='BNeawe').text
AttributeError: 'NoneType' object has no attribute 'text'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "XXXXX/temp.py", line 4, in <module>
results = searcher.search("Tallest mountain in the world")
File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 270, in search
return self.get_results(soup, **kwargs)
File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 243, in get_results
raise NoResultsOrTrafficError(
search_engine_parser.core.exceptions.NoResultsOrTrafficError: The returned results could not be parsed. This might be due to site updates or server errors. Drop an issue at https://github.com/bisoncorps/search-engine-parser if this persists
Desktop (please complete the following information):
- OS: [Linux]
- Python Version [3.9.5]
- Search-engine-parser version [0.6.2] (latest)
Additional context The result that cannot be parsed:
<div class="ZINbbc xpd O9g5cc uUPGi"><div><div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&sa=U&ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQCw&usg=AOvVaw1pflhmM0gRBSRK5KlKcTT6"><span></span></a></div><div class="CgE3Ac I9mEQ"><table class="LnMnt"><thead><tr><td class="IxZjcf sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe uEec3 AP7Wnd">Rank</div></div></td><td class="IxZjcf sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe uEec3 AP7Wnd">Mountain</div></div></td><td class="IxZjcf sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe uEec3 AP7Wnd">Country</div></div></td></tr></thead><tbody><tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">1.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Everest</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Nepal/Tibet</div></div></td></tr><tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">2.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">K2 (Mount Godwin Austen)</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Pakistan/China</div></div></td></tr><tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">3.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Kangchenjunga</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">India/Nepal</div></div></td></tr><tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">4.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Lhotse</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd 
AP7Wnd">Nepal/Tibet</div></div></td></tr></tbody></table></div><div class="hwc"><div class="Q0HXG"></div><div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&sa=U&ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQDA&usg=AOvVaw39wAm-G8SzoUzVMu-r2DX6"><div><span><div class="BNeawe vvjwJb AP7Wnd">The Top Ten: The World's Highest Mountains - Infoplease</div></span><span><div class="BNeawe UPmit AP7Wnd">www.infoplease.com > world > geography > top-ten-worlds-highest-mount...</div></span></div></a></div></div></div></div>
The corresponding result of https://github.com/bisoncorps/search-engine-parser/blob/0418867b3529980d5a4eb71899dec37092fe7df1/search_engine_parser/core/engines/google.py#L66
[<div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&sa=U&ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQCw&usg=AOvVaw1pflhmM0gRBSRK5KlKcTT6"><span></span></a></div>,
<div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&sa=U&ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQDA&usg=AOvVaw39wAm-G8SzoUzVMu-r2DX6"><div><span><div class="BNeawe vvjwJb AP7Wnd">The Top Ten: The World's Highest Mountains - Infoplease</div></span><span><div class="BNeawe UPmit AP7Wnd">www.infoplease.com > world > geography > top-ten-worlds-highest-mount...</div></span></div></a></div>]
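The first div does not contain the title, which is consistent with the AttributeError in the traceback above: the lookup for a div with class BNeawe finds nothing there and returns None.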
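One possible way to guard against this, offered as a minimal sketch and not as the library's actual fix: scan every kCrYT block and take the first one that has a BNeawe div, instead of assuming the first block carries the title. The helper name `extract_title` and the `SAMPLE_HTML` constant are hypothetical; the markup is a trimmed copy of the result quoted above, and the sketch assumes Beautiful Soup (`bs4`) is installed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup quoted in this issue: two "kCrYT" blocks,
# where only the second one carries a "BNeawe" title div.
SAMPLE_HTML = (
    '<div class="ZINbbc xpd O9g5cc uUPGi">'
    '<div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/'
    'geography/top-ten-worlds-highest-mountains"><span></span></a></div>'
    '<div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/'
    'geography/top-ten-worlds-highest-mountains"><div><span>'
    '<div class="BNeawe vvjwJb AP7Wnd">The Top Ten: The World\'s Highest '
    'Mountains - Infoplease</div></span></div></a></div>'
    '</div>'
)


def extract_title(result):
    """Hypothetical helper: return the first title found in any kCrYT block.

    Returns None instead of raising AttributeError when no block has a title.
    """
    for block in result.find_all("div", class_="kCrYT"):
        title_elem = block.find("div", class_="BNeawe")
        if title_elem is not None:
            return title_elem.text
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
title = extract_title(soup)
print(title)
```

A caller could then skip any result for which the helper returns None, so that one unusual result block does not fail the whole search.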
Are you running this on Heroku?
I have version 0.6.6 installed and get the same error, and I am not running on Heroku.
Same error on various search queries
Is this on Heroku?
I am getting the same error on various search queries. I also tried running this locally and not on Heroku, but it is still not working.
I am also receiving the same exceptions for all but a few of the simplest single-word search terms.
Specs
- OS: Windows 10 Pro
- Version: 21H2
- Build: 19044.1645
- Parser Version: 0.6.6
Other
- Not running on Heroku
#168 should fix it