[feature] implement the selenium-wire webdriver
https://github.com/wkeeling/selenium-wire
It looks awesome: it logs the HTTP(S) requests made by the browser.
I would love to have this feature implemented and I'm willing to create a PR if the maintainers accept the idea.
In many Web scraping projects I need to get information regarding browser requests and responses. Sometimes to avoid making these requests again (to save an image, for example -- the browser downloaded it already) and other times to inspect the URLs and headers. selenium-wire is a handy library to do it, but it's not directly available in splinter.
Current solution
My current solution requires monkey-patching splinter.driver.webdriver.firefox. After installing requirements with pip install splinter selenium-wire blinker==1.7.0, run:
import time

def start_browser():
    from seleniumwire.webdriver import Firefox as FirefoxWireDriver
    from splinter.browser import get_driver
    from splinter.driver.webdriver import firefox as splinter_firefox

    splinter_firefox.Firefox = FirefoxWireDriver
    browser = get_driver(splinter_firefox.WebDriver)
    return browser

browser = start_browser()
browser.visit("https://brasil.io/")
time.sleep(5)
print(len(browser.driver.requests))  # 48
browser.quit()
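To illustrate the "save an image the browser already downloaded" use case, here is a minimal sketch of what can be done with browser.driver.requests. It assumes the attributes described in the selenium-wire README (request.url, request.response, response.status_code, response.headers, response.body) and that the bodies are not compressed; the helper name collect_images is hypothetical.

```python
def collect_images(requests):
    """Return {url: body_bytes} for captured requests whose response is an image.

    `requests` is any iterable of selenium-wire-style request objects,
    for example `browser.driver.requests`.
    """
    images = {}
    for request in requests:
        response = request.response
        if response is None:  # the request was captured but no response was recorded
            continue
        content_type = response.headers.get("Content-Type") or ""
        if response.status_code == 200 and content_type.startswith("image/"):
            images[request.url] = response.body
    return images
```

With the patched browser above, it would be called as collect_images(browser.driver.requests) before browser.quit().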
First proposal
If you're uncomfortable supporting selenium-wire, I'd like to ask whether it's possible to at least change the splinter.driver.webdriver.BaseWebDriver interface: if we add a class method get_driver_class, I could implement a solution without monkey patching:
import time

from splinter.driver.webdriver.firefox import WebDriver


class FirefoxWireDriver(WebDriver):
    @classmethod
    def get_driver_class(cls):
        # This would be called by __init__ and passed to _setup_firefox()
        from seleniumwire.webdriver import Firefox
        return Firefox


def start_browser():
    from splinter.browser import get_driver
    browser = get_driver(FirefoxWireDriver)
    return browser


browser = start_browser()
browser.visit("https://brasil.io/")
time.sleep(5)
print(len(browser.driver.requests))  # 48
browser.quit()
The code would be longer, but less hacky.
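To make the idea concrete, here is a rough sketch of how the base class could consume such a hook. Everything in it is a toy stand-in: BaseWebDriver, StockFirefox and WireFirefox are placeholders for the real splinter and Selenium classes, and get_driver_class is the hypothetical method proposed above, not an existing splinter API.

```python
class BaseWebDriver:
    """Toy stand-in for splinter's base driver; not the real class."""

    @classmethod
    def get_driver_class(cls):
        # Hypothetical hook: subclasses override it to choose the Selenium class.
        raise NotImplementedError

    def __init__(self, *args, **kwargs):
        # Ask the hook which class to instantiate instead of
        # referencing a module-level name directly.
        driver_class = self.get_driver_class()
        self.driver = driver_class(*args, **kwargs)


class StockFirefox:
    """Placeholder for selenium.webdriver.Firefox."""


class WireFirefox(StockFirefox):
    """Placeholder for seleniumwire.webdriver.Firefox."""


class FirefoxDriver(BaseWebDriver):
    @classmethod
    def get_driver_class(cls):
        return StockFirefox


class FirefoxWireDriver(FirefoxDriver):
    @classmethod
    def get_driver_class(cls):
        # Only this method changes; the rest of the driver is inherited.
        return WireFirefox
```

The point of the design is that swapping the underlying Selenium class becomes a one-method override, so no module attribute has to be replaced at runtime.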
Another good improvement would be to add an official way to register new drivers, something like:
from splinter import Browser
from mymodule import FirefoxWireDriver
Browser.register("firefox-wire", FirefoxWireDriver)
browser = Browser("firefox-wire")
browser.visit(...)
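A registration mechanism like that could be quite small. The following is only a sketch under my own assumptions: BrowserFactory, its register method and the ValueError behaviour are hypothetical and do not reflect how splinter.Browser is implemented today.

```python
class BrowserFactory:
    """Toy stand-in for splinter.Browser that keeps a name -> driver class registry."""

    def __init__(self):
        self._drivers = {}

    def register(self, name, driver_class):
        """Make `driver_class` available under `name`."""
        if name in self._drivers:
            raise ValueError(f"driver {name!r} is already registered")
        self._drivers[name] = driver_class

    def __call__(self, driver_name, *args, **kwargs):
        """Instantiate the driver registered under `driver_name`."""
        try:
            driver_class = self._drivers[driver_name]
        except KeyError:
            raise ValueError(f"no driver registered as {driver_name!r}") from None
        return driver_class(*args, **kwargs)


Browser = BrowserFactory()


class FirefoxWireDriver:
    """Placeholder for the selenium-wire based driver class."""


Browser.register("firefox-wire", FirefoxWireDriver)
browser = Browser("firefox-wire")
```

Keeping the registry on a callable object preserves the Browser("name") call style while giving third-party code a supported extension point.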
Second proposal
The second (and, for me, ideal) proposal would be to add direct support for selenium-wire. The library could be an optional requirement, and the implementation would be the same as above (creating the class FirefoxWireDriver in splinter.driver.webdriver.firefox_wire), plus adding "firefox-wire" to splinter.browser._DRIVERS.
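For the "optional requirement" part, one possible approach is to import the library lazily and fail with a helpful message when it is missing. This is a sketch only; load_optional is a hypothetical helper name, not something splinter provides.

```python
import importlib


def load_optional(module_name, install_hint):
    """Import `module_name`, or raise ImportError explaining how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this driver; "
            f"install it with: {install_hint}"
        ) from exc
```

The new driver module could then call load_optional("seleniumwire", "pip install selenium-wire") when the driver is instantiated, so users who never ask for "firefox-wire" would not need the extra package.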
What do you think?
@turicas what about a monkey patch for that?
Sorry, I didn't understand. My current solution is already a monkey patch. It works, but I'm proposing adding official support so people won't need a monkey patch for that.
Any thoughts regarding my proposal?
I am interested in your proposal; where can I check the code?
By the way, @turicas, selenium-wire is not maintained anymore...
I am interested in your proposal; where can I check the code?
The only code I've written is in this comment: https://github.com/cobrateam/splinter/issues/730#issuecomment-2465988149. I was waiting for an "OK" on the proposals before starting to work on a PR. But since I needed this for a specific scraping project, I used the monkey patch approach described in that comment.
By the way, @turicas, selenium-wire is not maintained anymore...
Yep, the repository was archived 1 year ago. But the code still works (and AFAIK there are no active forks). I used it 3 months ago in a scraping project that ran daily for more than 2 months, and everything worked fine. If you feel uncomfortable supporting selenium-wire directly because of its maintenance status, consider my first proposal, which would change the Splinter API a little to make it easier to extend the current drivers. In that case, we could add docs on how to do it, with an example using selenium-wire.