search-engine-parser

Bing search is broken

Open bentsi opened this issue 3 years ago • 2 comments

Describe the bug

Running simple code (based on the README) fails with the following output:

ENGINE FAILURE: Bing
Traceback (most recent call last):
  File "/home/bentsi/pycharm-community-2021.2/plugins/python-ce/helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/bentsi/pycharm-community-2021.2/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/bentsi/devel/continueai/backend/src/scraping/search_engine_query.py", line 17, in <module>
    bresults = bsearch.search(**search_args)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 288, in search
    return self.get_results(soup, **kwargs)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 247, in get_results
    raise NoResultsOrTrafficError(
search_engine_parser.core.exceptions.NoResultsOrTrafficError: The result parsing was unsuccessful. It is either your query could not be found or it was flagged as unusual traffic

After digging into the root cause, I found the following:

  1. The HTTP request to Bing returns an HTML response that contains no results.
  2. After adding the cookie that Google Chrome sends in its GET request headers, the code starts working.

So the solution is to add cookie data to the request, but I am not sure exactly what should be added, since the cookie looks complex.
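As a rough illustration of the workaround described above, the sketch below builds (but does not send) a GET request to Bing with a `Cookie` header attached, using only the Python standard library. The helper name `build_bing_request`, the `PLACEHOLDER_COOKIE` value, and the default user agent are hypothetical; the real cookie would have to be copied from a browser session, and this is not how search-engine-parser itself constructs its requests.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder only: the actual cookie string must be copied from a browser session.
PLACEHOLDER_COOKIE = "NAME=VALUE"


def build_bing_request(query, cookie, user_agent="Mozilla/5.0"):
    """Build a GET request for a Bing search that carries a Cookie header.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    url = "https://www.bing.com/search?" + urlencode({"q": query})
    headers = {"User-Agent": user_agent, "Cookie": cookie}
    return Request(url, headers=headers)


request = build_bing_request(
    "samsung electronics corp official website", PLACEHOLDER_COOKIE
)
```

Passing `request` to `urllib.request.urlopen` would perform the actual fetch; whether Bing then returns results depends on the cookie value used.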

To Reproduce

from search_engine_parser.core.engines.bing import Search as BingSearch
company_name = "samsung electronics corp official website"

search_args = {"query": company_name, "page": 1}
bsearch = BingSearch()
bsearch.clear_cache()
bresults = bsearch.search(**search_args)

Expected behavior

Search returns results.

Desktop (please complete the following information):

  • OS: Ubuntu 20.04
  • Python Version: 3.10.5
  • Search-engine-parser version: 0.6.6

bentsi, Jul 13 '22 18:07

I managed to find the correct cookie, but now I am getting a result-parsing error:

Traceback (most recent call last):
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 252, in get_results
    search_results = self.parse_result(results, **kwargs)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 151, in parse_result
    rdict = self.parse_single_result(each, **kwargs)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/engines/bing.py", line 68, in parse_single_result
    rdict["descriptions"] = desc.text
AttributeError: 'NoneType' object has no attribute 'text'
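The `AttributeError` above means `desc` was `None` when `.text` was read, presumably because the expected description element was missing from that result. A minimal sketch of a defensive guard follows; `safe_text` and `FakeTag` are hypothetical names used only for illustration, and how `desc` is actually looked up in `bing.py` is not shown in the traceback, so this is not the library's real fix.

```python
def safe_text(node, default=""):
    """Return node.text, or `default` when the node is missing (None)."""
    return default if node is None else node.text


class FakeTag:
    """Stand-in for a parsed HTML element exposing a .text attribute."""

    def __init__(self, text):
        self.text = text


rdict = {}

# Simulate a result whose description element was not found.
desc = None
rdict["descriptions"] = safe_text(desc)

# Simulate a result whose description element is present.
found = safe_text(FakeTag("Official Samsung site"))
```

With a guard like this, a result lacking a description yields an empty string instead of aborting the whole parse.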

I will work on a fix.

bentsi, Jul 14 '22 08:07

Thanks for the detailed investigation and for working on a fix.

deven96, Jul 23 '22 07:07