SerpScrap
mobile SERP
Hi @ecoron, thanks a lot for your script. I'm wondering if I can scrape the SERP as a mobile device. I've looked at user_agent.py and swapped the desktop and mobile user agents, but I guess that's not a good approach, given the error below:
2018-07-24 10:25:30,725 - scrapcore.scraper.selenium - ERROR - Skip it, no such element - SeleniumSearchError
Exception in thread [google]SelScrape:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 600, in wait_until_serp_loaded
str(self.page_number)
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/support/wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Screenshot: available via screen
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 606, in wait_until_serp_loaded
self.webdriver.find_element_by_css_selector(selector).text
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 589, in find_element_by_css_selector
return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 955, in find_element
'value': value})['value']
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 237, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with css selector '#navcnt td.cur'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"105","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:58099","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"css selector\", \"value\": \"#navcnt td.cur\", \"sessionId\": \"d1d12330-8f2b-11e8-8bc9-69459efa1ced\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d1d12330-8f2b-11e8-8bc9-69459efa1ced/element"}}
Screenshot: available via screen
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 761, in run
self.search()
File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 701, in search
self.wait_until_serp_loaded()
File "/usr/local/lib/python3.6/dist-packages/scrapcore/scraper/selenium.py", line 610, in wait_until_serp_loaded
raise SeleniumSearchError('Stop Scraping, seems we are blocked')
scrapcore.scraping.SeleniumSearchError: Stop Scraping, seems we are blocked
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/serpscrap/csv_writer.py", line 11, in write
w = csv.DictWriter(f, my_dict[0].keys(), dialect='excel')
IndexError: list index out of range
None
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/serpscrap/csv_writer.py", line 11, in write
w = csv.DictWriter(f, my_dict[0].keys(), dialect='excel')
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/serpscraper1.py", line 21, in <module>
results = scrap.as_csv('/tmp/output')
File "/usr/local/lib/python3.6/dist-packages/serpscrap/serpscrap.py", line 134, in as_csv
writer.write(file_path + '.csv', self.results)
File "/usr/local/lib/python3.6/dist-packages/serpscrap/csv_writer.py", line 17, in write
raise Exception
Exception
Is there any other way to scrape the mobile SERP? Any suggestions would be really appreciated.
C
I would recommend using headless Chrome instead, since Selenium/PhantomJS is detected very quickly. I haven't tested yet whether it's possible to get the mobile results by default, but maybe it is possible (http://chromedriver.chromium.org/mobile-emulation); I need some time to evaluate it.