scrapy-selenium

navigator.webdriver avoidance.

Open: Anandir opened this issue 3 years ago • 1 comment

Hi! First of all, thanks for your amazing work! I need to scrape a site that "doesn't work" with scrapy-selenium. After a bit of a headache, I've spotted the problem: the site detects that I'm using Selenium, or a similar technology, and does its best to prevent scraping (https://intellipaat.com/community/5490/can-a-website-detect-when-you-are-using-selenium-with-chromedriver).

The only way I've found to avoid this check is to do something like this (code from https://www.nuomiphp.com/eplan/en/421778.html):

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'chromedriver.exe')
driver.get('<site to scrape>')

This snippet solves the problem with plain Selenium, but it doesn't work in Scrapy with SeleniumRequest.

How can I avoid the navigator.webdriver check with this trick using scrapy-selenium? Is it possible in the first place?

Thanks a lot in advance!

EDIT: Just to clarify a little bit more, I've also tried to add the --disable-blink-features=AutomationControlled flag to SELENIUM_DRIVER_ARGUMENTS with no luck. Something like: SELENIUM_DRIVER_ARGUMENTS = ['--disable-blink-features=AutomationControlled'].

Anandir, Nov 17 '20 23:11

I've run into this issue before. There are a couple of ways to handle this depending on whether you're using Chrome or Firefox, but I found this article pretty helpful: https://piprogramming.org/articles/How-to-make-Selenium-undetectable-and-stealth--7-Ways-to-hide-your-Bot-Automation-from-Detection-0000000017.html

You can basically use driver.execute_script(...) to evaluate the JS code from those examples.

Flushot, Apr 23 '21 03:04
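
For anyone landing here, the driver.execute_script(...) suggestion could be sketched roughly as below. This is a minimal sketch, not a confirmed fix for the site in question. The helper names (hide_webdriver_flag, hide_webdriver_flag_on_new_documents, webdriver_flag) and the HIDE_WEBDRIVER_JS constant are made up for illustration, and the JS one-liner is a commonly used override, not code taken from the linked article. The helpers accept any Selenium WebDriver instance. The second helper assumes Chrome, since execute_cdp_cmd is only available on Chromium-based drivers.

```python
# JavaScript that makes navigator.webdriver read as undefined.
# Assumption: a commonly used override, not taken from the linked article.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', "
    "{get: () => undefined});"
)


def hide_webdriver_flag(driver):
    """Patch navigator.webdriver on the page that is currently loaded.

    The patch only affects the current document, so a check that runs
    while the page is loading will still see the original value.
    """
    driver.execute_script(HIDE_WEBDRIVER_JS)


def hide_webdriver_flag_on_new_documents(driver):
    """Chrome only: ask the browser to run the patch on every new document,
    before the page's own scripts."""
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": HIDE_WEBDRIVER_JS},
    )


def webdriver_flag(driver):
    """Return the value the page currently sees for navigator.webdriver."""
    return driver.execute_script("return navigator.webdriver")
```

With scrapy-selenium, the driver should be reachable in a callback as response.request.meta['driver'], and SeleniumRequest accepts a script argument for custom JavaScript, so either could be a place to try this. Both run after the page has loaded, so they may be too late for sites that check the flag during load.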