scrapy-selenium
navigator.webdriver avoidance.
Hi! First of all, thanks for your amazing job!
I need to scrape a site that "doesn't work" with scrapy-selenium.
After a bit of a headache, I've spotted the problem: the site detects that I'm using Selenium (or a similar technology) and does its best to prevent scraping (https://intellipaat.com/community/5490/can-a-website-detect-when-you-are-using-selenium-with-chromedriver).
The only way I've found to avoid this check is to do something like this (code from https://www.nuomiphp.com/eplan/en/421778.html):
from selenium import webdriver

options = webdriver.ChromeOptions()
# Exclude the "enable-automation" switch and disable the automation extension,
# two flags commonly used to detect Selenium-driven Chrome.
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
driver = webdriver.Chrome(options=options, executable_path=r"chromedriver.exe")
driver.get("<site to scrape>")
This snippet solves the problem, but it doesn't work in Scrapy with SeleniumRequest.
How can I avoid the navigator.webdriver check with this trick when using scrapy-selenium? Is it possible in the first place?
Thanks a lot in advance!
EDIT:
Just to clarify a little more: I've also tried adding the --disable-blink-features=AutomationControlled flag to SELENIUM_DRIVER_ARGUMENTS, with no luck.
Something like: SELENIUM_DRIVER_ARGUMENTS = ['--disable-blink-features=AutomationControlled'].
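To illustrate what I'm after, here is a rough, untested sketch of the kind of workaround I imagine: a custom downloader middleware that builds the Chrome driver itself with those experimental options. The class name is made up; SeleniumRequest, HtmlResponse and the SELENIUM_DRIVER_EXECUTABLE_PATH setting come from scrapy-selenium and Scrapy.

```python
# Untested sketch: a minimal replacement middleware that creates the Chrome
# driver with the same experimental options as the plain-Selenium snippet.
from scrapy import signals
from scrapy.http import HtmlResponse
from scrapy_selenium import SeleniumRequest
from selenium import webdriver


class StealthSeleniumMiddleware:  # hypothetical name
    def __init__(self, executable_path):
        options = webdriver.ChromeOptions()
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
        options.add_experimental_option("useAutomationExtension", False)
        options.add_argument("--disable-blink-features=AutomationControlled")
        self.driver = webdriver.Chrome(options=options,
                                       executable_path=executable_path)

    @classmethod
    def from_crawler(cls, crawler):
        middleware = cls(crawler.settings.get("SELENIUM_DRIVER_EXECUTABLE_PATH"))
        crawler.signals.connect(middleware.spider_closed, signals.spider_closed)
        return middleware

    def process_request(self, request, spider):
        if not isinstance(request, SeleniumRequest):
            return None  # non-Selenium requests go through the normal downloader
        self.driver.get(request.url)
        request.meta["driver"] = self.driver
        return HtmlResponse(self.driver.current_url,
                            body=self.driver.page_source.encode("utf-8"),
                            encoding="utf-8",
                            request=request)

    def spider_closed(self, spider):
        self.driver.quit()
```

The idea would be to register it in DOWNLOADER_MIDDLEWARES in place of scrapy_selenium.SeleniumMiddleware, but I haven't verified this end to end.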
I've run into this issue before. There are a couple of ways to handle it, depending on whether you're using Chrome or Firefox, but I found this article pretty helpful: https://piprogramming.org/articles/How-to-make-Selenium-undetectable-and-stealth--7-Ways-to-hide-your-Bot-Automation-from-Detection-0000000017.html
You can basically use driver.execute_script(...) to evaluate the JS code from those examples.
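For instance, something like this rough sketch (untested; it assumes the driver is exposed via response.request.meta['driver'] as described in the scrapy-selenium README, the spider and URL are just placeholders, and execute_cdp_cmd only exists for Chrome):

```python
# Untested sketch: patch navigator.webdriver from a spider callback.
import scrapy
from scrapy_selenium import SeleniumRequest

JS_PATCH = "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"


class StealthSpider(scrapy.Spider):  # hypothetical spider, example URL
    name = "stealth"

    def start_requests(self):
        yield SeleniumRequest(url="https://example.com", callback=self.parse)

    def parse(self, response):
        driver = response.request.meta["driver"]
        # Hide the flag for scripts that run after this point on the current page.
        driver.execute_script(JS_PATCH)
        # Chrome only: inject the patch before any page script runs on
        # subsequent navigations (too late for the page already loaded).
        driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
                               {"source": JS_PATCH})
        yield {"title": response.css("title::text").get()}
```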