
self.driver = driver_klass(**driver_kwargs) TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

Open ahmedraxa23 opened this issue 1 year ago • 17 comments

Chrome driver

ahmedraxa23 avatar Jun 20 '23 08:06 ahmedraxa23

I have the same issue:

Unhandled error in Deferred:

Traceback (most recent call last):
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 240, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 244, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\twisted\internet\defer.py", line 1947, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\twisted\internet\defer.py", line 1857, in _cancellableInlineCallbacks
    _inlineCallbacks(None, gen, status, _copy_context())
--- <exception caught here> ---
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\twisted\internet\defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 129, in crawl
    self.engine = self._create_engine()
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 143, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\core\engine.py", line 100, in __init__
    self.downloader: Downloader = downloader_cls(crawler)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\core\downloader\__init__.py", line 97, in __init__
    DownloaderMiddlewareManager.from_crawler(crawler)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\middleware.py", line 68, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\middleware.py", line 44, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\utils\misc.py", line 170, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_selenium\middlewares.py", line 67, in from_crawler
    middleware = cls(
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_selenium\middlewares.py", line 51, in __init__
    self.driver = driver_klass(**driver_kwargs)
builtins.TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

I use chromedriver on windows.

EdgarGc026 avatar Jun 23 '23 16:06 EdgarGc026

In Selenium 4, executable_path is deprecated and a Service() object is used instead. Install Selenium 3 to work around it:

pip install 'selenium<4' (on Windows cmd, use double quotes: pip install "selenium<4")
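To see why the pin works: scrapy-selenium always forwards executable_path to the WebDriver constructor, which Selenium 3 accepted but Selenium 4 removed in favor of a Service object. The helper below is purely illustrative (it is not part of either library, and the returned "service" value is just a descriptive string) and only shows which keyword shape each major version expects:

```python
# Illustrative only: which constructor kwargs each Selenium major version
# expects for the driver path. scrapy-selenium hardcodes the Selenium 3
# shape, hence the TypeError under Selenium 4.

def webdriver_path_kwargs(selenium_version: str, executable_path: str) -> dict:
    """Return the driver-path kwargs matching a Selenium version string."""
    major = int(selenium_version.split(".")[0])
    if major < 4:
        # Selenium 3: webdriver.Chrome(executable_path="...")
        return {"executable_path": executable_path}
    # Selenium 4: webdriver.Chrome(service=Service(executable_path="...")),
    # where Service comes from selenium.webdriver.chrome.service.
    return {"service": f"Service(executable_path={executable_path!r})"}

print(webdriver_path_kwargs("3.141.0", "/path/to/chromedriver"))
print(webdriver_path_kwargs("4.10.0", "/path/to/chromedriver"))
```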

oamer1 avatar Jun 24 '23 08:06 oamer1

Same problem here.

Shah13079 avatar Jun 27 '23 10:06 Shah13079

It seems the best way is to fork the package and change SeleniumMiddleware.__init__ to work the way you're used to working with Selenium. It's actually just a few lines of code, and you won't end up stuck on ancient Selenium 3.

ton77v avatar Jul 03 '23 10:07 ton77v

It seems the best way is to fork the package and change the SeleniumMiddleware.init the way you used to work with Selenium. Actually it's just a few lines of code and you won't end up with ancient Selenium 3

But how, ton77v? Do you mean to use the Selenium integration in the spider class?

Shah13079 avatar Jul 03 '23 11:07 Shah13079

It seems the best way is to fork the package and change the SeleniumMiddleware.init the way you used to work with Selenium. Actually it's just a few lines of code and you won't end up with ancient Selenium 3

But How ton77v? Do you mean to use selenium integration in spider class ?

I mean something like this:

  1. https://github.com/clemfromspace/scrapy-selenium/fork
  2. Clone in your IDE and modify similarly like I did for myself for example https://github.com/clemfromspace/scrapy-selenium/commit/5c3fe7b43ab336349ef5fdafe39fc87f6a8a8c34
  3. Run the tests to make sure it works
  4. Pip uninstall scrapy-selenium
  5. Pip install git+{https://your_repository}

And you have your own scrapy-selenium fork that you may adjust further as you wish while preserving the original scrapy-selenium API
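For reference, a hedged sketch of the kind of __init__ change such a fork makes (this is my own illustration, not ton77v's exact commit): build the driver via webdriver-manager and Selenium 4's Service object instead of passing executable_path. It assumes Chrome and `pip install webdriver-manager`; imports are kept inside the function so the sketch stays self-contained.

```python
def build_chrome_driver(browser_arguments):
    """Sketch of a Selenium-4-compatible driver factory for the middleware.

    Assumes Chrome and the webdriver-manager package. Imports are local so
    selenium/webdriver-manager are only needed when the function is called.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    for argument in browser_arguments:  # e.g. ["--headless"]
        options.add_argument(argument)

    # ChromeDriverManager().install() downloads a chromedriver matching the
    # installed Chrome (caching it for later runs) and returns its path;
    # Service wraps that path the way Selenium 4 expects.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

In the forked middleware, this factory would replace the `driver_klass(**driver_kwargs)` call that currently raises the TypeError.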

ton77v avatar Jul 03 '23 13:07 ton77v

I'm also facing the same issue. I've been looking for a solution for 5 days and there's no video on the internet about this error. Could you please make a short video on it, as you described above? Thanks in advance; I'll wait for it. This would be a huge favor.

naveedsid avatar Jul 03 '23 14:07 naveedsid

It seems the best way is to fork the package and change the SeleniumMiddleware.init the way you used to work with Selenium. Actually it's just a few lines of code and you won't end up with ancient Selenium 3

Should we use Selenium 3 for the forked package?

naveedsid avatar Jul 03 '23 14:07 naveedsid

Credits @ton77v for the answer, I can help simplify his answer:

  • go to ton77v's commit 5c3fe7b and copy his code in middlewares.py
  • replace the middlewares.py code under the scrapy_selenium package on your local machine (for me, it was in C:/Users//AppData/Local/anaconda3/Lib/site-packages/scrapy_selenium/middlewares.py)
  • [optional]: I had to !pip install webdriver-manager as well
  • for your scrapy spider, you need to modify the settings.py file (this is part of the configuration files that appear when you start a scrapy project, like items.py, middlewares.py, pipelines.py, and settings.py). Add the following lines of code into the settings.py file:
    • SELENIUM_DRIVER_NAME = 'chrome'
    • SELENIUM_DRIVER_EXECUTABLE_PATH = None  # not actually necessary; will work even if you comment this line out
    • SELENIUM_DRIVER_ARGUMENTS = []  # put '--headless' in the brackets to prevent the browser popup
  • then enter scrapy runspider <scraper_name>.py in your terminal and enjoy!

Quick explanation of what's happening:

  • you're having webdriver-manager download the browser driver (e.g. chromedriver) for you, so you no longer have to specify the driver's location
  • the beauty is that after the first download, webdriver-manager remembers where the driver is cached and reuses it on subsequent runs
  • you can adapt the scraper to open other browsers by modifying the middlewares.py file (get ChatGPT to do it for you XD) and changing SELENIUM_DRIVER_NAME = (browser name)

If this worked for you, be sure to like this message and show @ton77v some love!

@ahmedraxa23 please close the issue if this worked for you

jg3wilso avatar Jul 03 '23 23:07 jg3wilso

Thanks a lot, much appreciated. I just want to know a little bit more: I want undetected-chromedriver to do the same thing in middlewares.py that the Selenium webdriver does. How would the changes be made? Note: undetected-chromedriver (UC) is a Python library (a modified take on Selenium's driver handling) that works with a pre-installed chrome.exe, so it doesn't need a separate chromedriver and can even work with pre-installed Chrome profiles.
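Not an authoritative answer, but a hedged sketch of what that swap could look like: undetected-chromedriver exposes its own Chrome class that manages a patched driver binary, so the forked middleware's driver construction could be replaced with something like the function below (assumes `pip install undetected-chromedriver` and a system Chrome that UC can locate).

```python
def build_uc_driver(browser_arguments):
    """Sketch: construct an undetected-chromedriver driver for the middleware."""
    import undetected_chromedriver as uc  # pip install undetected-chromedriver

    options = uc.ChromeOptions()
    for argument in browser_arguments:  # e.g. ["--headless=new"]
        options.add_argument(argument)

    # uc downloads and patches its own chromedriver, so no executable_path
    # or Service object is needed; it drives the locally installed Chrome.
    return uc.Chrome(options=options)
```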

naveedsid avatar Jul 04 '23 12:07 naveedsid

It seems the best way is to fork the package and change the SeleniumMiddleware.init the way you used to work with Selenium. Actually it's just a few lines of code and you won't end up with ancient Selenium 3

should we use selenium 3 for fork package?

It's possible, but it makes no sense; it will work just fine with the latest version.

ton77v avatar Jul 05 '23 02:07 ton77v

Credits @ton77v for the answer, I can help simplify his answer:

  • go to ton77v's commit 5c3fe7b and copy his code in middlewares.py
  • replace the middlewares.py code under the scrapy_selenium package on your local machine (for me, it was in C:/Users//AppData/Local/anaconda3/Lib/site-packages/scrapy_selenium/middlewares.py)
  • [optional]: I had to !pip install webdriver-manager as well
  • for your scrapy spider, you need to modify the settings.py file (this is part of the configuration files that appear when you start a scrapy project like items.py, middlewares.py, pipelines.py, and settings.py). Add the following lines of code into the settings.py file
    • SELENIUM_DRIVER_NAME = 'chrome'
    • SELENIUM_DRIVER_EXECUTABLE_PATH = None #not actually necessary, will work even if you comment this line out
    • SELENIUM_DRIVER_ARGUMENTS=[] #put '--headless' in the brackets to prevent browser popup
  • then enter scrapy runspider <scraper_name>.py in your terminal and enjoy!

Quick explanation of what's happening:

  • you're getting scrapy to install the BrowserDriverManager and don't have to specify the BrowserDriverManager location anymore
  • the beauty is that after the first BrowserDriverManager installation, it remembers the installation location and uses the installed BrowserDriverManager for subsequent runs
  • You can adapt the scraper to open other browsers by modifying middlewares.py file (get ChatGPT to do it for you XD) and changing SELENIUM_DRIVER_NAME = (browser name)

If this worked for you, be sure to like this message and show @ton77v some love!

@ahmedraxa23 please close the issue if this worked for you

Thank you for your work @jg3wilso, your solution works like a charm. However, it only seems to work when using 'chrome' as the browser, and it's kinda slow, you know. When I try to use it with 'firefox' or 'safari' (by adjusting settings.py), the script doesn't work the way it does with 'chrome':

Traceback (most recent call last):
  File "/Users/huynhdailong/opt/anaconda3/lib/python3.9/site-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/Users/huynhdailong/Library/CloudStorage/OneDrive-Personal/Desktop/DE/DEP302 - Foundation/Spider/projects/silkdeals/silkdeals/spiders/deals.py", line 20, in parse
    img = response.meta['screenshot']
KeyError: 'screenshot'

jjerxawp avatar Aug 09 '23 03:08 jjerxawp

Thank you for your work @jg3wilso, your solution work like a charm, however it seems to work when using 'chrome' as the browser and it's kinda slow, you know. Then I try to use it with 'firefox' or 'safari' (by adjusting the setting.py), the script won't work as it used to when using 'chrome' in the setting file


Traceback (most recent call last):
  File "/Users/huynhdailong/opt/anaconda3/lib/python3.9/site-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/Users/huynhdailong/Library/CloudStorage/OneDrive-Personal/Desktop/DE/DEP302 - Foundation/Spider/projects/silkdeals/silkdeals/spiders/deals.py", line 20, in parse
    img = response.meta['screenshot']
KeyError: 'screenshot'

That's because the solution was just for Chrome; it wouldn't work for any other browser. It's likely not very hard to make a universal one, so let's hope someone will add it here 😀

ton77v avatar Aug 09 '23 12:08 ton77v

Thank you for your work @jg3wilso, your solution work like a charm, however it seems to work when using 'chrome' as the browser and it's kinda slow, you know. Then I try to use it with 'firefox' or 'safari' (by adjusting the setting.py), the script won't work as it used to when using 'chrome' in the setting file


Traceback (most recent call last):
  File "/Users/huynhdailong/opt/anaconda3/lib/python3.9/site-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/Users/huynhdailong/Library/CloudStorage/OneDrive-Personal/Desktop/DE/DEP302 - Foundation/Spider/projects/silkdeals/silkdeals/spiders/deals.py", line 20, in parse
    img = response.meta['screenshot']
KeyError: 'screenshot'

That's because the solution was just for Chrome. It wouldn't work for any other browser. It's likely not very hard to make an universal one so let's hope someone will add it here 😀

In order to use all the browsers, I'd recommend creating a service object and passing that into the webdriver. An example of that implementation is https://github.com/clemfromspace/scrapy-selenium/pull/135/files.

NB: The Service object takes additional arguments, like log_path and port, that I did not consider in this alteration.
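A hedged sketch of that browser-generic idea: in Selenium 4 each browser package ships its own Service class under selenium.webdriver.<name>.service, so both the driver class and its Service can be resolved dynamically from the SELENIUM_DRIVER_NAME setting. The helper below is stdlib-only; the commented application assumes Selenium 4's module layout and is not taken from the linked PR verbatim.

```python
from importlib import import_module

def resolve_attr(dotted_module: str, attribute: str):
    """Import `dotted_module` and return `attribute` from it."""
    return getattr(import_module(dotted_module), attribute)

# Applied to the middleware (illustrative; assumes Selenium 4's layout, where
# e.g. selenium.webdriver.firefox.service defines a Service class and
# selenium.webdriver.firefox.webdriver defines WebDriver):
#
#   driver_name = "firefox"  # from the SELENIUM_DRIVER_NAME setting
#   service_cls = resolve_attr(f"selenium.webdriver.{driver_name}.service", "Service")
#   driver_cls = resolve_attr(f"selenium.webdriver.{driver_name}.webdriver", "WebDriver")
#   driver = driver_cls(service=service_cls(executable_path=path), options=options)
```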

malmike avatar Oct 15 '23 06:10 malmike

Credits @ton77v for the answer, I can help simplify his answer:

  • go to ton77v's commit 5c3fe7b and copy his code in middlewares.py
  • replace the middlewares.py code under the scrapy_selenium package on your local machine (for me, it was in C:/Users//AppData/Local/anaconda3/Lib/site-packages/scrapy_selenium/middlewares.py)
  • [optional]: I had to !pip install webdriver-manager as well
  • for your scrapy spider, you need to modify the settings.py file (this is part of the configuration files that appear when you start a scrapy project like items.py, middlewares.py, pipelines.py, and settings.py). Add the following lines of code into the settings.py file
    • SELENIUM_DRIVER_NAME = 'chrome'
    • SELENIUM_DRIVER_EXECUTABLE_PATH = None #not actually necessary, will work even if you comment this line out
    • SELENIUM_DRIVER_ARGUMENTS=[] #put '--headless' in the brackets to prevent browser popup
  • then enter scrapy runspider <scraper_name>.py in your terminal and enjoy!

Quick explanation of what's happening:

  • you're getting scrapy to install the BrowserDriverManager and don't have to specify the BrowserDriverManager location anymore
  • the beauty is that after the first BrowserDriverManager installation, it remembers the installation location and uses the installed BrowserDriverManager for subsequent runs
  • You can adapt the scraper to open other browsers by modifying middlewares.py file (get ChatGPT to do it for you XD) and changing SELENIUM_DRIVER_NAME = (browser name)

If this worked for you, be sure to like this message and show @ton77v some love!

@ahmedraxa23 please close the issue if this worked for you

I can't get it working; I can't find the scrapy_selenium folder you mentioned.

leovizeu avatar Nov 08 '23 13:11 leovizeu

@jg3wilso

@ahmedraxa23 please close the issue if this worked for you

I don't think this issue should be closed, since it requires a workaround rather than a proper solution that works out of the box.

J-Brk avatar Apr 12 '24 22:04 J-Brk

Hi. I've made a naive fix. https://github.com/clemfromspace/scrapy-selenium/issues/133#issuecomment-2078167476

jogobeny avatar Apr 26 '24 18:04 jogobeny