Remote Webdriver with Auth Proxy connects but not using Proxy IP

Open mirisr opened this issue 2 years ago • 22 comments

I'm running a Docker image locally and using selenium-wire to connect to an authenticated proxy. This works perfectly when I do NOT use a remote WebDriver. I've looked at all the other issues and haven't been able to solve my problem. This is what I have for my selenium-wire options:

selenium_options = {
    'auto_config': False,
    'addr': '0.0.0.0',
    'port': 8087,
    'proxy': {
        'http': 'http://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]),
        'https': 'http://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]),
        'no_proxy': 'localhost,127.0.0.1'  # excludes
    }
}

I set the port to 8087 just because that's what other "GitHub issues" said to do.
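
The proxy URLs above are built by string concatenation, which can produce a malformed URL if the username or password contains characters such as @ or :. A minimal sketch of a helper that percent-encodes the credentials first. build_proxy_url is a hypothetical name, the address is a placeholder, and whether the upstream proxy and selenium-wire accept percent-encoded credentials is an assumption worth verifying:

```python
from urllib.parse import quote


def build_proxy_url(username, password, host, port, scheme="http"):
    """Return scheme://user:pass@host:port with percent-encoded credentials."""
    # safe="" forces reserved characters such as '@', ':' and '/' to be encoded
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Hypothetical values, for illustration only
proxy_url = build_proxy_url("user", "p@ss:word", "203.0.113.10", 4444)
print(proxy_url)
```

The same helper could be used for both the 'http' and 'https' entries of the 'proxy' dict.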

And since I'm running docker locally, I also added this to my Firefox options: options.add_argument(f"--proxy-server=host.docker.internal:8087")

When I create my driver:

driver = webdriver.Remote(
    command_executor=selenium_connection,  # for prod
    # command_executor="http://localhost:4444/wd/hub",  # for local development
    desired_capabilities=DesiredCapabilities.FIREFOX,
    options=options,
    keep_alive=True,
    browser_profile=firefox_profile,
    seleniumwire_options=selenium_options
)

The driver starts successfully and the browser shows up. I get these log messages:

25-May-22 15:30:02 PM UTC | INFO | Using default request storage
25-May-22 15:30:03 PM UTC | INFO | Created proxy listening on 0.0.0.0:8087
25-May-22 15:30:07 PM UTC | INFO | SUCCESS: created firefox using connection

I didn't add the first two log lines myself; I believe they come from the selenium-wire driver code. But it's not using the proxy I manually added. Instead it's using the "addr" and "port" from the selenium driver. When I look in the browser, it doesn't have the proxy set up either.

@wkeeling Do you think you can help me?

mirisr avatar May 25 '22 15:05 mirisr

Are you running Selenium Grid in Docker, or is it running on the host machine where the selenium-wire script's Docker container is running?

sanjeevtrz avatar May 30 '22 13:05 sanjeevtrz

Selenium Grid is running remotely on another machine. The script is in a Docker image that is built and run locally.

mirisr avatar May 30 '22 13:05 mirisr

It looks like your grid is not able to reach selenium-wire.

Can you log in to the machine where Selenium Grid is hosted and ping the IP of the selenium-wire instance running in Docker?

sanjeevtrz avatar May 30 '22 13:05 sanjeevtrz

So my friend was the one who set up the Selenium Grid. I just took a look at it. He has it running on Google Cloud Run. The URL is https://account-refresh-selenium-n3vdi73cpa-uc.a.run.app/ and it looks like it is running in a Docker container. It seems to me that it should be able to ping where my selenium-wire is running. Since Selenium Grid is running in a container on Cloud Run, I'm not sure how to manually run pings.

I put my Docker image running selenium-wire on Google Compute Engine (I mentioned this in another GitHub issue). It has the external IP 35.206.XX.XXX.

Now it's not even trying to use an authenticated proxy. I look at the network settings in Firefox (in the selenium grid) and it has "Use system proxy settings" checked instead of "Manual proxy configuration" (and it doesn't fill in my authenticated proxy information).

selenium-wire proxy settings

selenium_options = {
    'auto_config': False,
    'proxy': {
        'http': 'http://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]), 
        'https': 'https://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]),
        'no_proxy': 'localhost,127.0.0.1' # excludes
    },
    'addr': '35.206.XX.XXX',  # external IP address (the Compute Engine VM is 35.206.XX.XXX)
    'port': 4444,
}

Q1. Is addr supposed to be where selenium-wire is running?

Firefox options

options.add_argument("--proxy-server={}".format('35.206.XX.XXX:4444')) 

Q2. This (above) is supposed to tell the Selenium Grid machine where selenium-wire is running, right?

selenium_connection = RemoteConnectionV2( remote_driver_host, keep_alive=False )
selenium_connection.set_remote_connection_authentication_headers()

driver = webdriver.Remote(
    command_executor=selenium_connection,  # for prod
    desired_capabilities=DesiredCapabilities.FIREFOX,
    options=options,
    keep_alive=True,
    # keep_alive=False,
    browser_profile=firefox_profile,
    seleniumwire_options=selenium_options
)

It's absolutely baffling why I can't get this to work. If you can help me figure this out via Zoom, I'd appreciate that so much!

Also when I ping the compute engine from my machine, it pings successfully!

mirisr avatar May 30 '22 15:05 mirisr

Hello @mirisr

The addr option is correct.

And you did everything correctly in terms of sending the right parameters.

Can you access selenium-wire from anywhere, or are you using a security group that restricts which hosts can access it?

Does Docker map the port to the host machine, i.e. are you using -p 8087:8087 when running the docker command for the selenium-wire container?

sanjeevtrz avatar May 30 '22 16:05 sanjeevtrz

That may be my issue. I just assumed that being able to ping the IP address would be enough. I will respond once I can get the port mapped on Google's Compute Engine, if that's possible.
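
A successful ping only shows that the host answers ICMP; it says nothing about whether a given TCP port is reachable through firewalls and port mappings. A minimal sketch of a TCP-level check that could be run from the Selenium Grid side against the selenium-wire host. can_connect is a hypothetical helper, and the demo below uses a throwaway local listener rather than the real address:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


# Demo against a temporary local listener so the example runs anywhere;
# in practice, pass the selenium-wire host and port instead.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(can_connect("127.0.0.1", demo_port))
listener.close()
```

If this returns False for the selenium-wire address while ping succeeds, the port is likely blocked or not published.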

mirisr avatar May 31 '22 13:05 mirisr

@sanjeevtrz

According to this: Publishing container ports

"Container ports have a one-to-one mapping to the host VM ports. For example, a container port 80 maps to the host VM port 80. Compute Engine does not support the port publishing (-p) flag, and you do not have to specify it for the mapping to work."

"To publish a container's ports, configure firewall rules to enable access to the host VM instance's ports. The corresponding ports of the container are accessible automatically, according to the firewall rules."

So I created a new VM instance that allows for http/https traffic and created a firewall rule that now allows inward traffic through port 8080.

vm_ip = os.environ.get("VM_IP")
vm_port = int(os.environ.get("VM_PORT"))
# selenium-wire proxy settings
selenium_options = {
    'auto_config': False,
    'proxy': {
        'http': 'http://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]), 
        'https': 'https://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]),
        'no_proxy': 'localhost,127.0.0.1' # excludes
    },
    'addr': vm_ip,
    'port': vm_port
}
firefox_options.add_argument(f"--proxy-server={vm_ip}:{vm_port}")
driver = webdriver.Remote(
    command_executor=selenium_connection,
    desired_capabilities=DesiredCapabilities.FIREFOX,
    options=firefox_options,
    keep_alive=True,
    browser_profile=firefox_profile,
    seleniumwire_options=selenium_options
)

According to my logs, the right options are being passed:

Selenium Options: {'auto_config': False, 'proxy': {'http': 'http://<myusername:password>@<proxyip>:4444', 'https': 'https://<myusername:password>@<proxyip>:4444', 'no_proxy': 'localhost,127.0.0.1'}, 'addr': '35.20X.XXX.XX', 'port': 8080}

But then I get this:

Failed to initiate Firefox browser: Error starting proxy server: gaierror(-9, 'Address family for hostname not supported')

So it's still not working.

mirisr avatar Jun 01 '22 13:06 mirisr

If I change 'addr' to 0.0.0.0:

selenium_options = {
    'auto_config': False,
    'proxy': {
        'http': 'http://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]), 
        'https': 'https://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]),
        'no_proxy': 'localhost,127.0.0.1' # excludes
    },
    'addr': '0.0.0.0',
    'port': vm_port
}

and still have the Firefox options as

firefox_options.add_argument(f"--proxy-server={vm_ip}:{vm_port}")

The logs say:

Showing Firefox options: --proxy-server=35.20X.XXX.XX:8080
Created proxy listening on 0.0.0.0:8080

@wkeeling, if this is right, which I'm currently inclined to believe it is, why isn't my authenticated proxy being used? The driver runs successfully, but it does not use my authenticated proxy.

mirisr avatar Jun 01 '22 14:06 mirisr

@wkeeling Do you know why I'm getting 'Address family for hostname not supported' when my external IP address is open to the network (I tested it using an online ping service that pings from different places) and it's IPv4?

The external ip address of my vm-instance running selenium wire is in the correct format: 35.20X.XXX.XX

mirisr avatar Jun 01 '22 17:06 mirisr

@mirisr are you seeing a traceback with that error message?

wkeeling avatar Jun 02 '22 07:06 wkeeling

@wkeeling I don't have it with me right now, but off the top of my head, I traced it back to server.py in selenium-wire's thirdparty server code, I believe, in the __init__ where it grabs addr from the options.

mirisr avatar Jun 02 '22 08:06 mirisr

@wkeeling

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 41, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/site-packages/seleniumwire/thirdparty/mitmproxy/net/tcp.py", line 624, in __init__
    self.socket.bind(self.address)
socket.gaierror: [Errno -9] Address family for hostname not supported

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/libs/main/session_refresher.py", line 45, in get_new_sessions
    amazon_session = AmazonSession(messaging_account, country)
  File "/usr/local/bin/libs/auth/amazon_session_v2.py", line 86, in __init__
    self.driver = BrowserBot().setup_firefox(self._email)
  File "/usr/local/bin/libs/browser/browser_bot.py", line 78, in setup_firefox
    driver = webdriver.Remote(
  File "/usr/local/lib/python3.10/site-packages/seleniumwire/webdriver.py", line 272, in __init__
    config = self._setup_backend(seleniumwire_options)
  File "/usr/local/lib/python3.10/site-packages/seleniumwire/webdriver.py", line 40, in _setup_backend
    self.backend = backend.create(
  File "/usr/local/lib/python3.10/site-packages/seleniumwire/backend.py", line 24, in create
    backend = MitmProxy(addr, port, options)
  File "/usr/local/lib/python3.10/site-packages/seleniumwire/server.py", line 61, in __init__
    self.master.server = ProxyServer(ProxyConfig(mitmproxy_opts))
  File "/usr/local/lib/python3.10/site-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 49, in __init__
    raise exceptions.ServerException(
seleniumwire.thirdparty.mitmproxy.exceptions.ServerException: Error starting proxy server: gaierror(-9, 'Address family for hostname not supported')
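
The traceback shows the failure happening in socket.bind, i.e. while selenium-wire tries to listen on the value given as 'addr'. A socket can generally only be bound to an address configured on a local network interface (or a wildcard such as 0.0.0.0), and on many cloud VMs and inside containers the external IP is translated by the provider and is not present on the machine itself. Whether that is the cause here is an assumption, but it can be checked with a small sketch like the following, where can_bind is a hypothetical helper:

```python
import socket


def can_bind(addr, port=0):
    """Return True if a TCP socket can be bound to addr on this machine."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            # port=0 asks the OS for any free port, so the check does not
            # collide with a proxy that is already listening
            sock.bind((addr, port))
        return True
    except OSError:
        # socket.gaierror is a subclass of OSError, so it is caught here too
        return False


print(can_bind("127.0.0.1"))  # loopback interface
print(can_bind("0.0.0.0"))    # wildcard: all local IPv4 interfaces
```

Running can_bind with the external IP inside the selenium-wire container would show whether that address is usable as 'addr' there.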

mirisr avatar Jun 02 '22 12:06 mirisr

I work in a similar setup. I never pass addr because it defaults to localhost. You only need port.

sanjeevtrz avatar Jun 02 '22 13:06 sanjeevtrz

@sanjeevtrz

I work in a similar setup. I never pass addr because it defaults to localhost. You only need port.

Isn't that the same as this change I made earlier: https://github.com/wkeeling/selenium-wire/issues/550#issuecomment-1143679786

mirisr avatar Jun 02 '22 13:06 mirisr

No. 0.0.0.0 works on Mac only; on Linux it is '127.0.0.1'. So it's better to leave it at the default, which would be 'localhost'.

sanjeevtrz avatar Jun 02 '22 13:06 sanjeevtrz

No. 0.0.0.0 works on Mac only; on Linux it is '127.0.0.1'. So it's better to leave it at the default, which would be 'localhost'.

Interesting then, that it still runs. I'll try it out.

mirisr avatar Jun 02 '22 13:06 mirisr

I work in a similar setup. I never pass addr because it defaults to localhost. You only need port.

@sanjeevtrz That didn't work. It still runs, but not using the authenticated proxy I provided in "proxy" args for selenium-wire options.

When I look at the Firefox network settings, it doesn't even have the radio button selected for "manual proxy configuration". Instead it has selected "Use system proxy settings". So something is still not working.
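
Firefox keeps its proxy mode in the network.proxy.type preference; to my knowledge 1 corresponds to "Manual proxy configuration" and 5 to "Use system proxy settings". A sketch of the preferences that would point the browser at the selenium-wire listener, assuming they are applied one by one with set_preference(name, value) on the Firefox profile or options. manual_proxy_prefs is a hypothetical helper and the address is a placeholder:

```python
def manual_proxy_prefs(host, port, no_proxy="localhost,127.0.0.1"):
    """Return Firefox preferences selecting a manual HTTP/HTTPS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": int(port),
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": int(port),
        "network.proxy.no_proxies_on": no_proxy,
    }


# Placeholder address of the machine where selenium-wire is listening
for name, value in manual_proxy_prefs("203.0.113.10", 8087).items():
    print(name, "=", value)
```

The host and port here are those of the selenium-wire listener, not of the authenticated upstream proxy, which stays in the selenium-wire 'proxy' option.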

mirisr avatar Jun 02 '22 15:06 mirisr

I use Chrome. However, if you want to use a proxy with Firefox, try installing a plugin and entering the proxy there. You can automate the installation of plugins and use proxies with authentication that way.

https://www.lambdatest.com/blog/adding-firefox-extensions-with-selenium-in-python/

I think you don't need selenium-wire for the above approach.

sanjeevtrz avatar Jun 02 '22 16:06 sanjeevtrz

@mirisr apologies for the delayed reply, but are you still having issues with this? I notice that you're using

firefox_options.add_argument(f"--proxy-server={vm_ip}:{vm_port}")

to set the proxy for Firefox, but I believe --proxy-server is a Chrome option - so Firefox will ignore it.

wkeeling avatar Jun 16 '22 10:06 wkeeling

Yes, I never got it to work. How would I set it up for Firefox, then?

mirisr avatar Jun 16 '22 12:06 mirisr

Try passing a Proxy object containing the config:

proxy = webdriver.Proxy()
proxy.http_proxy = f'{vm_ip}:{vm_port}'
proxy.ssl_proxy = f'{vm_ip}:{vm_port}'

firefox_options = webdriver.FirefoxOptions()
firefox_options.proxy = proxy

driver = webdriver.Remote(
    command_executor=selenium_connection,  
    desired_capabilities=DesiredCapabilities.FIREFOX,
    options=firefox_options,
    keep_alive=True,
    browser_profile=firefox_profile, 
    seleniumwire_options=selenium_options,
)
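
For reference, a Proxy object like this ends up in the session's proxy capability. In the W3C WebDriver specification a manual proxy is described with the proxyType, httpProxy and sslProxy keys, and the last two are host:port strings. A minimal sketch of that structure in plain Python, with manual_proxy_capability as a hypothetical helper and a placeholder address; the exact payload a given Selenium version sends is not shown in this thread:

```python
import json


def manual_proxy_capability(host, port):
    """Return a W3C-style 'proxy' capability for a manual HTTP/HTTPS proxy."""
    endpoint = f"{host}:{port}"
    return {
        "proxy": {
            "proxyType": "manual",
            "httpProxy": endpoint,
            "sslProxy": endpoint,
        }
    }


# Placeholder address of the selenium-wire listener
print(json.dumps(manual_proxy_capability("203.0.113.10", 8087), indent=2))
```

Comparing this with the capabilities the grid reports for the session is one way to confirm that the browser actually received the proxy settings.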

wkeeling avatar Jun 16 '22 13:06 wkeeling

Botasaurus Framework supports SSL with an authenticated proxy such as http://username:password@proxy-provider-domain:port.

Installation

pip install botasaurus

Example

from botasaurus import *

@browser(proxy="http://username:password@proxy-provider-domain:port") # TODO: Replace with your own proxy 
def visit_ipinfo(driver: AntiDetectDriver, data):
    driver.get("https://ipinfo.io/")
    driver.prompt()

visit_ipinfo()

You can learn about Botasaurus here.

Chetan11-dev avatar Dec 23 '23 12:12 Chetan11-dev