TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

Open GANGHSUN opened this issue 10 months ago • 6 comments

Ubuntu 22.04.3 LTS (Jammy Jellyfish) ARM64
Selenium 4.10.0
scrapy-selenium 0.0.7
Mozilla Firefox 115.0.2
geckodriver 0.33.0 (2023-07-11)

Configured as described in the README, I get the error TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

My spider.py

# spider.py

import scrapy
from quotes_js_scraper.items import QuoteItem
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


class QuotesSpider(scrapy.Spider):
    name = 'quotes'

    def start_requests(self):
        url = 'https://quotes.toscrape.com/js/'

        # Wait up to 10 seconds for a quote element before parsing.
        yield SeleniumRequest(url=url, callback=self.parse,
            wait_time=10,
            wait_until=EC.element_to_be_clickable((By.CLASS_NAME, 'quote'))
            )

    def parse(self, response):
        for quote in response.css('div.quote'):
            # Create a fresh item per quote so yielded items do not share state.
            quote_item = QuoteItem()
            quote_item['text'] = quote.css('span.text::text').get()
            quote_item['author'] = quote.css('small.author::text').get()
            quote_item['tags'] = quote.css('div.tags a.tag::text').getall()
            yield quote_item
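
For context, the message itself is ordinary Python behaviour: a constructor that does not declare `executable_path` rejects it as an unknown keyword. The sketch below reproduces that with a stand-in class. `WebDriver` here is a hypothetical stub, not Selenium's real class, and its two-parameter signature is an assumption made for illustration.

```python
class WebDriver:
    """Hypothetical stand-in: a constructor that has no executable_path parameter."""

    def __init__(self, options=None, service=None):
        self.options = options
        self.service = service


def try_legacy_call():
    """Call the stub the way older code calls a driver; return the error text."""
    try:
        # The path is only an example value; nothing is launched.
        WebDriver(executable_path="/usr/local/bin/geckodriver")
    except TypeError as exc:
        return str(exc)
    return None


message = try_legacy_call()
print(message)
```

The spider code is not at fault here; the keyword is passed by the middleware that builds the driver.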

GANGHSUN avatar Aug 30 '23 01:08 GANGHSUN

same issue...

danyanyam avatar Sep 06 '23 12:09 danyanyam

I encountered the same issue and spent several hours devising a solution. Here's what I did to make it work:

  1. Install Python 3.8.0: I performed a custom installation on Windows, without adding to PATH, unchecking pip and all other options to avoid interfering with my current Python setup.

  2. Create a Virtual Environment (venv): I used the command python -m venv venv in the terminal. In the newly created 'venv' folder, there's a pyvenv.cfg file that needs to be modified with the following paths:

    home = C:\Users\User\AppData\Local\Programs\Python\Python38-32
    include-system-site-packages = false
    version = 3.8.0
    executable = C:\Users\User\AppData\Local\Programs\Python\Python38-32\python.exe
    command =C:\Users\User\AppData\Local\Programs\Python\Python38-32\python.exe -m venv C:\Users\User\Desktop\scraping\venv
    

    Make sure to set your own paths.

  3. Activate the venv: I used the command .\venv\Scripts\activate.

  4. Check Python Version: I ran python --version to ensure it was using Python 3.8.0.

  5. Install Necessary Packages: I installed pip, scrapy, scrapy_selenium, selenium (version 3.141.0), and urllib3 (version 1.25.11) using the following commands:

    python -m ensurepip
    python -m pip install scrapy
    python -m pip install scrapy_selenium
    pip install selenium==3.141.0
    pip install urllib3==1.25.11
    
  6. Download and Set Up Geckodriver: I downloaded geckodriver, created a folder for it, and added it to the environment variable PATH.

  7. Modify settings.py: I added the following lines to my settings.py file:

    from shutil import which
    SELENIUM_DRIVER_NAME = 'firefox'
    SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
    SELENIUM_DRIVER_ARGUMENTS=['-headless']
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_selenium.SeleniumMiddleware': 800
    }
    

After these steps, I was able to successfully execute my Scrapy spider. I hope this can help someone.
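
A quick way to confirm the setup above before running the spider might look like the following. This is a minimal sketch using only the standard library; the function name `describe_environment` and its defaults are made up for illustration, and it only reports what it finds rather than validating anything.

```python
import shutil
import sys
from importlib import metadata


def describe_environment(driver_binary="geckodriver", packages=("selenium", "urllib3")):
    """Collect the interpreter version, driver location and package versions."""
    info = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        # shutil.which returns None when the binary is not on PATH.
        "driver_path": shutil.which(driver_binary),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package is not installed in this environment
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run inside the activated venv, a `driver_path` of None would suggest that geckodriver is not on PATH yet, and a `selenium` value other than 3.141.0 would mean the pinned install did not take effect.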

Sercalod avatar Oct 22 '23 23:10 Sercalod

Works for me.

mileo avatar Oct 31 '23 01:10 mileo

Newer Selenium versions removed the executable_path argument, so the driver path now has to be passed in through a Service object. My solution was to change __init__ in the file opt/conda/lib/python3.11/site-packages/scrapy_selenium/middlewares.py:

    def __init__(self, driver_name, driver_executable_path, driver_arguments,
                 browser_executable_path):
        """Initialize the selenium webdriver

        Parameters
        ----------
        driver_name: str
            The selenium ``WebDriver`` to use
        driver_executable_path: str
            The path of the executable binary of the driver
        driver_arguments: list
            A list of arguments to initialize the driver
        browser_executable_path: str
            The path of the executable binary of the browser
        """
        webdriver_base_path = f'selenium.webdriver.{driver_name}'
        
        driver_klass_module = import_module('selenium.webdriver') 
        driver_klass = getattr(driver_klass_module, driver_name.capitalize())
        
        driver_service_module = import_module(f'{webdriver_base_path}.service')
        driver_service_klass = getattr(driver_service_module, 'Service')
        
        driver_options_module = import_module('selenium.webdriver')
        driver_options_klass = getattr(driver_options_module, driver_name.capitalize()+'Options')
        
        driver_options = driver_options_klass()
        if browser_executable_path:
            driver_options.binary_location = browser_executable_path
        for argument in driver_arguments:
            driver_options.add_argument(argument)
        
        service_kwargs = {
            'executable_path': driver_executable_path,
        }
        service = driver_service_klass(**service_kwargs)
        
        driver_kwargs = {
            'options': driver_options,
            'service': service
        }

        self.driver = driver_klass(**driver_kwargs)
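
A variation on the patch above, for anyone who needs one code path that tolerates both old and new Selenium, is to inspect the driver constructor and only pass what it accepts. This is a rough sketch, not what the middleware does: `build_driver_kwargs` and the three stand-in classes are hypothetical, and whether the real Selenium classes introspect this cleanly on every version is an assumption.

```python
import inspect


def build_driver_kwargs(driver_klass, service_klass, driver_executable_path, driver_options):
    """Return constructor kwargs that match what driver_klass actually accepts."""
    accepted = inspect.signature(driver_klass.__init__).parameters
    kwargs = {"options": driver_options}
    if "service" in accepted:
        # Newer style: wrap the driver path in a Service object.
        kwargs["service"] = service_klass(executable_path=driver_executable_path)
    elif "executable_path" in accepted:
        # Older style: the path is passed to the driver directly.
        kwargs["executable_path"] = driver_executable_path
    return kwargs


# Hypothetical stand-ins so the sketch runs without Selenium installed.
class FakeService:
    def __init__(self, executable_path=None):
        self.executable_path = executable_path


class NewStyleDriver:
    def __init__(self, options=None, service=None):
        self.options, self.service = options, service


class OldStyleDriver:
    def __init__(self, executable_path="geckodriver", options=None):
        self.executable_path, self.options = executable_path, options


new_kwargs = build_driver_kwargs(NewStyleDriver, FakeService, "/path/to/geckodriver", None)
old_kwargs = build_driver_kwargs(OldStyleDriver, FakeService, "/path/to/geckodriver", None)
print(sorted(new_kwargs), sorted(old_kwargs))
```

Checking for `service` first means a constructor that accepts both keywords gets the Service object, which avoids relying on a deprecated argument.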

turkievicz avatar Feb 02 '24 14:02 turkievicz

What @turkievicz said. The change landed in Selenium v4.10.0, so one alternative is to downgrade Selenium to 4.9.1, and it should work again.
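
To check which side of that version boundary an environment is on, something like the sketch below could be used. The helper names are made up for illustration, the 4.10 cutoff is taken from this thread, and the parsing assumes a plain `major.minor.patch` version string.

```python
from importlib import metadata


def accepts_executable_path(version):
    """True if this Selenium version string is older than 4.10."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (4, 10)


def installed_selenium_version():
    """Return the installed Selenium version, or None if it is not installed."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_selenium_version()
    if found is None:
        print("selenium is not installed")
    else:
        print(found, "accepts executable_path:", accepts_executable_path(found))
```

Comparing `(major, minor)` as a tuple of integers matters here, because a plain string comparison would sort "4.10.0" before "4.9.1".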

maaherra86 avatar Feb 14 '24 19:02 maaherra86

Hi. I've made a naive fix in my fork: https://github.com/jogobeny/scrapy-selenium. You can install it with:

pip install git+https://github.com/jogobeny/scrapy-selenium

jogobeny avatar Apr 25 '24 20:04 jogobeny