
[feature] implement the selenium-wire webdriver

Open zodman opened this issue 6 years ago • 6 comments

https://github.com/wkeeling/selenium-wire

Looks awesome: it logs the HTTP(S) requests made by the browser.

zodman avatar Nov 12 '19 18:11 zodman

I would love to have this feature implemented and I'm willing to create a PR if the maintainers accept the idea.

In many web scraping projects I need information about the browser's requests and responses: sometimes to avoid repeating a request (to save an image, for example, since the browser has already downloaded it), and other times to inspect the URLs and headers. selenium-wire is a handy library for this, but it's not directly available in splinter.

Current solution

My current solution requires monkey-patching splinter.driver.webdriver.firefox. After installing requirements with pip install splinter selenium-wire blinker==1.7.0, run:

import time

def start_browser():
    from seleniumwire.webdriver import Firefox as FirefoxWireDriver
    from splinter.browser import get_driver
    from splinter.driver.webdriver import firefox as splinter_firefox

    # Monkey patch: swap the Firefox class that splinter instantiates for selenium-wire's.
    splinter_firefox.Firefox = FirefoxWireDriver
    browser = get_driver(splinter_firefox.WebDriver)
    return browser

browser = start_browser()
browser.visit("https://brasil.io/")
time.sleep(5)  # give the page time to fire its background requests
print(len(browser.driver.requests))  # 48
browser.quit()
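
Once the requests are captured, pulling out the already-downloaded images could look roughly like the sketch below. It assumes the request objects expose `url` and `response` (with `headers` and `body`), as selenium-wire's README describes; the helper name `captured_images` is made up for this example and is not part of either library.

```python
def captured_images(requests):
    """Map URL -> response body for captured requests whose response is an image.

    `requests` is expected to look like selenium-wire's `driver.requests`:
    each item has `.url` and `.response` (None when no response was captured),
    and each response has `.headers` and `.body`.
    """
    images = {}
    for request in requests:
        response = request.response
        if response is None:  # the request never received a response
            continue
        content_type = response.headers.get("Content-Type") or ""
        if content_type.startswith("image/"):
            images[request.url] = response.body
    return images
```

It would be called as `captured_images(browser.driver.requests)` before `browser.quit()`. Note that a body may still be compressed according to its Content-Encoding header, so it may need decoding before being written to disk.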

First proposal

If you're uncomfortable supporting selenium-wire, I'd like to ask whether it's possible to at least change the splinter.driver.webdriver.BaseWebDriver interface: if we add a class method get_driver_class, I could implement a solution without monkey patching:

import time

from splinter.driver.webdriver.firefox import WebDriver

class FirefoxWireDriver(WebDriver):
    @classmethod
    def get_driver_class(cls):
        # This would be called by __init__ and passed to _setup_firefox()
        from seleniumwire.webdriver import Firefox
        return Firefox

def start_browser():
    from splinter.browser import get_driver

    browser = get_driver(FirefoxWireDriver)
    return browser

browser = start_browser()
browser.visit("https://brasil.io/")
time.sleep(5)
print(len(browser.driver.requests))  # 48
browser.quit()

The code would be longer, but less hacky.
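
To make the idea concrete, here is a self-contained sketch of how such a hook could be wired on the splinter side. Everything in it is a stand-in: `BaseWebDriver` is a heavily simplified placeholder rather than splinter's real class, and `FakeFirefox` / `FakeWireFirefox` stand for `selenium.webdriver.Firefox` and `seleniumwire.webdriver.Firefox` so that the example runs without a browser.

```python
class FakeFirefox:
    """Placeholder for selenium.webdriver.Firefox."""


class FakeWireFirefox(FakeFirefox):
    """Placeholder for seleniumwire.webdriver.Firefox."""


class BaseWebDriver:
    """Simplified stand-in for splinter's base driver (hypothetical)."""

    @classmethod
    def get_driver_class(cls):
        # Proposed hook: each concrete driver says which Selenium class to build.
        raise NotImplementedError

    def __init__(self, *args, **kwargs):
        # The base class asks the hook instead of hard-coding the Selenium class.
        driver_class = self.get_driver_class()
        self.driver = driver_class(*args, **kwargs)


class FirefoxDriver(BaseWebDriver):
    @classmethod
    def get_driver_class(cls):
        return FakeFirefox


class FirefoxWireDriver(FirefoxDriver):
    @classmethod
    def get_driver_class(cls):
        # The only thing an extension needs to override.
        return FakeWireFirefox
```

With this shape, a third-party driver only overrides one class method and inherits the rest of the setup unchanged.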

Another good improvement would be to add an official way to register new drivers, something like:

from splinter import Browser
from mymodule import FirefoxWireDriver
Browser.register("firefox-wire", FirefoxWireDriver)

browser = Browser("firefox-wire")
browser.visit(...)
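
A registry like that could be quite small. The sketch below is only an illustration of the idea, not splinter's actual code: `_DRIVERS`, `register`, and this `Browser` function are simplified, hypothetical versions defined locally for the example.

```python
_DRIVERS = {}  # driver name -> driver class


def register(name, driver_class):
    """Make `driver_class` available under `name`."""
    if name in _DRIVERS:
        raise ValueError(f"A driver named {name!r} is already registered")
    _DRIVERS[name] = driver_class


def Browser(driver_name, *args, **kwargs):
    """Instantiate the driver registered under `driver_name`."""
    try:
        driver_class = _DRIVERS[driver_name]
    except KeyError:
        raise ValueError(f"Unknown driver: {driver_name!r}") from None
    return driver_class(*args, **kwargs)


# Expose the function as Browser.register so the call reads like the proposal.
Browser.register = register
```

Attaching `register` as a function attribute keeps the `Browser.register("firefox-wire", ...)` spelling even if `Browser` stays a plain function; a module-level `register` would work just as well.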

Second proposal

The second (and, for me, ideal) proposal would be to add direct support for selenium-wire. The library could be an optional requirement, and the implementation would be the same as above (creating the class FirefoxWireDriver in splinter.driver.webdriver.firefox_wire), plus adding "firefox-wire" to splinter.browser._DRIVERS.
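
Keeping the library optional mostly means importing it lazily and failing with a useful message when it's missing. A minimal sketch of that pattern follows; the helper name `import_optional` and the wording of the error are assumptions for illustration, not existing splinter code.

```python
import importlib


def import_optional(module_name, install_hint):
    """Import `module_name`, or raise an ImportError that says how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this driver. "
            f"Install it with: {install_hint}"
        ) from exc
```

The wire driver could then call something like `import_optional("seleniumwire.webdriver", "pip install selenium-wire")` only when it is actually instantiated, so users of the regular drivers never need the extra dependency.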

What do you think?

turicas avatar Nov 09 '24 02:11 turicas

@turicas what about a monkeypatch for that?

zodman avatar Nov 22 '24 02:11 zodman

@turicas what about a monkeypatch for that?

Sorry, I didn't understand. My current solution is already a monkey patch. It works, but I'm proposing adding official support so people won't need a monkey patch for that.

turicas avatar Nov 22 '24 16:11 turicas

Any thoughts regarding my proposal?

turicas avatar Jan 29 '25 22:01 turicas

I am interested in your proposal; where can I check the code?

By the way, @turicas, selenium-wire is not maintained anymore...

zodman avatar Jan 31 '25 16:01 zodman

I am interested in your proposal; where can I check the code?

The only code I've written is in this comment: https://github.com/cobrateam/splinter/issues/730#issuecomment-2465988149. I was waiting for an "OK" on the proposals before actually starting to work on a PR. But since I needed this for a specific scraping project, I used the monkey patch approach described in that comment.

By the way, @turicas, selenium-wire is not maintained anymore...

Yep, the repository was archived 1 year ago. But the code works anyway (and AFAIK there are no active forks). I used it 3 months ago in a scraping project that I ran daily for more than 2 months, and everything worked fine. If you feel uncomfortable supporting selenium-wire directly because of its maintenance status, consider my first proposal, which would change the Splinter API a little bit to make it easier to extend the current drivers. In that case, we could add docs explaining how to do it, with an example using selenium-wire.

turicas avatar Feb 01 '25 02:02 turicas