scrapy-selenium
self.driver = driver_klass(**driver_kwargs) TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'
Chrome driver
I have the same issue:
Unhandled error in Deferred:
Traceback (most recent call last):
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 240, in crawl
return self._crawl(crawler, *args, **kwargs)
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 244, in _crawl
d = crawler.crawl(*args, **kwargs)
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\twisted\internet\defer.py", line 1947, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\twisted\internet\defer.py", line 1857, in _cancellableInlineCallbacks
_inlineCallbacks(None, gen, status, _copy_context())
--- <exception caught here> ---
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\twisted\internet\defer.py", line 1697, in _inlineCallbacks
result = context.run(gen.send, result)
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 129, in crawl
self.engine = self._create_engine()
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 143, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\core\engine.py", line 100, in __init__
self.downloader: Downloader = downloader_cls(crawler)
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\core\downloader\__init__.py", line 97, in __init__
DownloaderMiddlewareManager.from_crawler(crawler)
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\middleware.py", line 68, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\middleware.py", line 44, in from_settings
mw = create_instance(mwcls, settings, crawler)
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\utils\misc.py", line 170, in create_instance
instance = objcls.from_crawler(crawler, *args, **kwargs)
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_selenium\middlewares.py", line 67, in from_crawler
middleware = cls(
File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_selenium\middlewares.py", line 51, in __init__
self.driver = driver_klass(**driver_kwargs)
builtins.TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'
I use chromedriver on Windows.
In Selenium 4, executable_path is deprecated and Service() is used instead. Installing Selenium 3 avoids the error:
pip install 'selenium<4'
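For context, this is the API change that breaks the middleware. A minimal sketch of the two styles, assuming a locally downloaded chromedriver (the path is a placeholder):

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 3 style -- what scrapy-selenium still passes internally;
# the keyword was removed in Selenium 4, hence the TypeError above:
#   driver = webdriver.Chrome(executable_path="/path/to/chromedriver")

# Selenium 4 style: the driver path is wrapped in a Service object.
service = Service(executable_path="/path/to/chromedriver")
driver = webdriver.Chrome(service=service)
```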
Same problem?
It seems the best way is to fork the package and change SeleniumMiddleware.__init__ to the way you'd write it with current Selenium. It's actually just a few lines of code, and you won't end up pinned to ancient Selenium 3.
But how, @ton77v? Do you mean using the Selenium integration in the spider class?
I mean something like this:
- Fork it: https://github.com/clemfromspace/scrapy-selenium/fork
- Clone it in your IDE and modify it, similarly to what I did for myself, for example https://github.com/clemfromspace/scrapy-selenium/commit/5c3fe7b43ab336349ef5fdafe39fc87f6a8a8c34
- Run the tests to make sure it works
- pip uninstall scrapy-selenium
- pip install git+{https://your_repository}
And you have your own scrapy-selenium fork that you can adjust further as you wish while preserving the original scrapy-selenium API.
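To make the shape of that change concrete, here is a minimal sketch of a Selenium-4-compatible __init__, assuming Chrome and webdriver-manager; it mirrors the idea of ton77v's commit rather than reproducing it verbatim, and the class is stripped down to just the driver construction:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


class SeleniumMiddleware:
    def __init__(self, driver_arguments):
        options = webdriver.ChromeOptions()
        for argument in driver_arguments or []:
            options.add_argument(argument)
        # webdriver-manager downloads a chromedriver matching the installed
        # Chrome and caches it, so no executable_path setting is needed.
        service = Service(ChromeDriverManager().install())
        self.driver = webdriver.Chrome(service=service, options=options)
```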
I'm also facing the same issue. I've been looking for a solution for 5 days and there is no video on the internet about this error. Could you please make a short video on it, as you described above? Thanks in advance, I'll wait for it. This would be a huge favor from your side.
Should we use Selenium 3 for the forked package?
Credits to @ton77v for the answer; I can help simplify it:
- go to ton77v's commit 5c3fe7b and copy his code into middlewares.py
- replace the middlewares.py code under the scrapy_selenium package on your local machine (for me, it was in C:/Users/<username>/AppData/Local/anaconda3/Lib/site-packages/scrapy_selenium/middlewares.py)
- [optional]: I also had to pip install webdriver-manager
- for your scrapy spider, you need to modify the settings.py file (this is one of the configuration files that appear when you start a scrapy project, like items.py, middlewares.py, pipelines.py, and settings.py). Add the following lines to settings.py (see the fuller sketch after these steps):
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = None  # not actually necessary; it will work even if you comment this line out
SELENIUM_DRIVER_ARGUMENTS = []  # put '--headless' in the brackets to prevent the browser popup
- then enter scrapy runspider <scraper_name>.py in your terminal and enjoy!
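Putting the pieces together, a sketch of the settings.py additions; the DOWNLOADER_MIDDLEWARES registration is taken from the scrapy-selenium README (800 is the conventional priority, not a requirement):

```python
# settings.py -- sketch of the scrapy-selenium configuration described above.
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = None  # the patched middleware resolves the driver itself
SELENIUM_DRIVER_ARGUMENTS = ['--headless']  # drop '--headless' to see the browser window

# Register the middleware so Scrapy routes SeleniumRequests through it.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800,
}
```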
Quick explanation of what's happening:
- you're getting webdriver-manager to download the browser driver, so you no longer have to specify the driver's location
- the beauty is that after the first run it remembers the download location and reuses the installed driver on subsequent runs
- you can adapt the scraper to open other browsers by modifying the middlewares.py file (get ChatGPT to do it for you XD) and changing SELENIUM_DRIVER_NAME = (browser name)
If this worked for you, be sure to like this message and show @ton77v some love!
@ahmedraxa23 please close the issue if this worked for you
Thanks a lot, much appreciated. I just want to know a little bit more: I want undetected-chromedriver to do the same thing that the Selenium webdriver performs in middlewares.py. How would those changes be made? Note: undetected-chromedriver (UC) is a Python library, a modified version of Selenium, and UC can work with a pre-installed chrome.exe, so it doesn't need a separate chromedriver for execution and can even work with preinstalled Chrome profiles.
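Not an authoritative answer, but the swap would presumably look something like this sketch: replace the driver construction in the patched middleware with undetected-chromedriver's uc.Chrome (assumes pip install undetected-chromedriver; the stripped-down class shape is illustrative, not the package's actual code):

```python
import undetected_chromedriver as uc


class SeleniumMiddleware:
    def __init__(self, driver_arguments):
        options = uc.ChromeOptions()
        for argument in driver_arguments or []:
            options.add_argument(argument)
        # uc.Chrome drives the locally installed Chrome and patches its own
        # driver on first run; no executable_path or Service object needed.
        self.driver = uc.Chrome(options=options)
```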
> should we use Selenium 3 for the forked package?
It's possible, but it makes no sense; it will work just fine with the latest version.
Thank you for your work @jg3wilso, your solution works like a charm. However, it only seems to work when using 'chrome' as the browser, and it's kind of slow, you know. When I try to use it with 'firefox' or 'safari' (by adjusting settings.py), the script doesn't work the way it did with 'chrome' in the settings file:
Traceback (most recent call last):
File "/Users/huynhdailong/opt/anaconda3/lib/python3.9/site-packages/twisted/internet/defer.py", line 857, in _runCallbacks
current.result = callback( # type: ignore[misc]
File "/Users/huynhdailong/Library/CloudStorage/OneDrive-Personal/Desktop/DE/DEP302 - Foundation/Spider/projects/silkdeals/silkdeals/spiders/deals.py", line 20, in parse
img = response.meta['screenshot']
KeyError: 'screenshot'
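For reference, the 'screenshot' key only exists when the request asks the middleware for one; a minimal spider sketch using the SeleniumRequest API from the scrapy-selenium README (URL and spider name are placeholders):

```python
import scrapy
from scrapy_selenium import SeleniumRequest


class DealsSpider(scrapy.Spider):
    name = 'deals'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://example.com',
            callback=self.parse,
            screenshot=True,  # without this, response.meta has no 'screenshot'
        )

    def parse(self, response):
        # The middleware stores the screenshot as PNG bytes in the meta dict.
        with open('screenshot.png', 'wb') as image_file:
            image_file.write(response.meta['screenshot'])
```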
That's because the solution was just for Chrome; it wouldn't work for any other browser. It's likely not very hard to make a universal one, so let's hope someone adds it here 😀
In order to use all the browsers, I'd recommend creating a Service object and passing that into the webdriver. An example of that implementation is https://github.com/clemfromspace/scrapy-selenium/pull/135/files.
NB: the Service object takes additional arguments like log_path and port that I did not consider in this alteration.
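A sketch of what that browser-agnostic construction could look like under Selenium 4; the dynamic getattr/importlib lookup mimics how scrapy-selenium resolves SELENIUM_DRIVER_NAME, and make_driver is a hypothetical helper, not the PR's actual code:

```python
import importlib

from selenium import webdriver


def make_driver(driver_name, executable_path=None, arguments=()):
    # e.g. driver_name='firefox' -> webdriver.FirefoxOptions / webdriver.Firefox
    options_class = getattr(webdriver, f'{driver_name.capitalize()}Options')
    options = options_class()
    for argument in arguments:
        options.add_argument(argument)

    # Each browser has its own Service class, e.g.
    # selenium.webdriver.chrome.service.Service. With Selenium >= 4.6,
    # executable_path may be None and Selenium Manager locates a driver.
    service_module = importlib.import_module(
        f'selenium.webdriver.{driver_name}.service'
    )
    service = service_module.Service(executable_path=executable_path)

    driver_class = getattr(webdriver, driver_name.capitalize())
    return driver_class(service=service, options=options)
```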
@jg3wilso I can't get it working; I can't find the selenium folder you mentioned.
> @ahmedraxa23 please close the issue if this worked for you
I don't think this issue should be closed, since this is a workaround, not a 'proper' solution that works out of the box.
Hi. I've made a naive fix in: https://github.com/clemfromspace/scrapy-selenium/issues/133#issuecomment-2078167476.