
mobile SERP

cristiano74 opened this issue · 1 comment

Hi @ecoron, thanks a lot for your script. I'm wondering if I can scrape the SERP as a mobile device. I looked at user_agent.py and swapped the desktop and mobile user agents, but I guess that's not a good idea, judging by the error below:

2018-07-24 10:25:30,725 - scrapcore.scraper.selenium - ERROR - Skip it, no such element - SeleniumSearchError
Exception in thread [google]SelScrape:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 600, in wait_until_serp_loaded
    str(self.page_number)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 
Screenshot: available via screen


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 606, in wait_until_serp_loaded
    self.webdriver.find_element_by_css_selector(selector).text
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 589, in find_element_by_css_selector
    return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 955, in find_element
    'value': value})['value']
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 237, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with css selector '#navcnt td.cur'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"105","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:58099","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"css selector\", \"value\": \"#navcnt td.cur\", \"sessionId\": \"d1d12330-8f2b-11e8-8bc9-69459efa1ced\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d1d12330-8f2b-11e8-8bc9-69459efa1ced/element"}}
Screenshot: available via screen


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 761, in run
    self.search()
  File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 701, in search
    self.wait_until_serp_loaded()
  File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 610, in wait_until_serp_loaded
    raise SeleniumSearchError('Stop Scraping, seems we are blocked')
scrapcore.scraping.SeleniumSearchError: Stop Scraping, seems we are blocked

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/serpscrap/csv_writer.py", line 11, in write
    w = csv.DictWriter(f, my_dict[0].keys(), dialect='excel')
IndexError: list index out of range
None
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/serpscrap/csv_writer.py", line 11, in write
    w = csv.DictWriter(f, my_dict[0].keys(), dialect='excel')
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/serpscraper1.py", line 21, in <module>
    results = scrap.as_csv('/tmp/output')
  File "/usr/local/lib/python3.6/dist-packages/serpscrap/serpscrap.py", line 134, in as_csv
    writer.write(file_path + '.csv', self.results)
  File "/usr/local/lib/python3.6/dist-packages/serpscrap/csv_writer.py", line 17, in write
    raise Exception
Exception

Is there any other way to get the mobile SERP as well? Any suggestions would be really appreciated. C

cristiano74 · Jul 24 '18 10:07

I would recommend using headless Chrome instead, since Selenium/PhantomJS is detected very quickly. I haven't tested yet whether it's possible to get the mobile results by default.

But it may be possible via mobile emulation (http://chromedriver.chromium.org/mobile-emulation); I need some time to evaluate.

ecoron · Jul 27 '18 12:07