
Configuring web-requests to use a proxy

Open dtandersen opened this issue 5 years ago • 7 comments

I think there is something preventing CurlAsyncHTTPClient from accepting defaults.

AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", defaults=defaults) didn't work from config.yaml.

I had to do this hack:

  extraConfig: |
    import logging
    import pycurl
    from tornado.httpclient import HTTPRequest

    def configure_proxy(curl):
        logging.error(curl.getinfo(pycurl.EFFECTIVE_URL))
        # we only want google oauth to use the proxy
        if "google" in curl.getinfo(pycurl.EFFECTIVE_URL):
            logging.error("adding proxy")
            curl.setopt(pycurl.PROXY, "proxy.example.com")
            curl.setopt(pycurl.PROXYPORT, 8080)

    # never do this
    HTTPRequest._DEFAULTS['prepare_curl_callback'] = configure_proxy

I don't know why this doesn't work:

    import certifi
    from tornado.httpclient import AsyncHTTPClient

    defaults2 = dict(ca_certs=certifi.where())
    defaults2['proxy_host'] = 'proxy.example.com'
    defaults2['proxy_port'] = 8080
    AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", defaults=defaults2)

Actually, I do have one guess. Maybe this overwrites the defaults?

https://github.com/jupyterhub/jupyterhub/blob/8437f47f361aab42d11801703145ababa7372538/jupyterhub/app.py#L1622

dtandersen avatar Sep 26 '18 19:09 dtandersen

I'm having the same issue with Azure AD. Your hack worked for me as well. It would be nice to have this fixed though.

import logging
import os
import subprocess
import sys

from tornado.httpclient import HTTPRequest

# tell the pycurl build which SSL backend to use, then install it through the proxy
os.environ['PYCURL_SSL_LIBRARY'] = 'nss'
subprocess.call([sys.executable, '-m', 'pip', 'install', '--compile', '--proxy', 'http://www.xxx.yyy.zzz:3128', 'pycurl'])
import pycurl
#defaults = {'proxy_host':'www.xxx.yyy.zzz', 'proxy_port':3128, 'request_timeout':300, 'connect_timeout':60}
#AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")

def configure_proxy(curl):
    logging.error(curl.getinfo(pycurl.EFFECTIVE_URL))
    # we only want Azure AD (microsoftonline) requests to use the proxy
    if "microsoftonline" in curl.getinfo(pycurl.EFFECTIVE_URL):
        logging.error("adding proxy")
        curl.setopt(pycurl.PROXY, "www.xxx.yyy.zzz")
        curl.setopt(pycurl.PROXYPORT, 3128)

# never do this
HTTPRequest._DEFAULTS['prepare_curl_callback'] = configure_proxy

zneudl avatar Jan 23 '19 17:01 zneudl

Hmmm, this is quite advanced and I'm not following it very well. There is an issue that I think may be related; perhaps you could have a look at it, @dtandersen @zneudl?

That issue concerns the use of the Google OAuthenticator from a specific JupyterHub deployment (Zero-to-jupyterhub-k8s, the Helm chart): https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/1185

consideRatio avatar Mar 17 '19 22:03 consideRatio

Regarding this issue, I'd love to learn more about the background, which I don't fully grasp.

One more observation: I think the extraConfig is loaded and executed after the initial jupyterhub_config.py, which is provided as part of the z2jh JupyterHub Docker image, has executed.

Related: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/69ec17c75c950ab11cd09a1315a7f2e93140811f/images/hub/jupyterhub_config.py#L11-L15

Related: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/69ec17c75c950ab11cd09a1315a7f2e93140811f/images/hub/jupyterhub_config.py#L467-L469

consideRatio avatar Mar 17 '19 22:03 consideRatio

Hi. Would be so cool if everybody simply respected the "http_proxy" environment variables...

Here is my hack to make the hub happy with our proxy (our Gitlab instance is behind our proxy). It also supports no_proxy with "*", so that we can have finer proxy tuning:

hub:
  extraEnv:
    GITLAB_HOST: "http://external.gitlab.server"
    http_proxy: http://internalproxy.ourcompany.fr:123
    HTTP_PROXY: http://internalproxy.ourcompany.fr:123
    https_proxy: http://internalproxy.ourcompany.fr:123
    HTTPS_PROXY: http://internalproxy.ourcompany.fr:123
    no_proxy: localhost,127.0.0.1,*.ourcompany.fr,10.*,localdomain,cluster.local
    NO_PROXY: localhost,127.0.0.1,*.ourcompany.fr,10.*,localdomain,cluster.local
 
  extraConfig: |
    # HACK: consume HTTP(S)_PROXY and NO_PROXY environment variables
    # so Hub can connect to external Gitlab.
    # https://github.com/jupyterhub/oauthenticator/issues/217
    import pycurl
    import os
    import logging
    from tornado.httpclient import HTTPRequest
    from urllib.parse import urlparse
    from fnmatch import fnmatch

    def get_proxies_for_url(url):
        http_proxy = os.environ.get("HTTP_PROXY", os.environ.get("http_proxy"))
        https_proxy = os.environ.get("HTTPS_PROXY", os.environ.get("https_proxy"))
        no_proxy = os.environ.get("NO_PROXY", os.environ.get("no_proxy"))
        p = urlparse(url)
        netloc = p.netloc
        _userpass,_, hostport = p.netloc.rpartition("@")
        url_hostname, _,  _port = hostport.partition(":")
        proxies = {}
        if http_proxy:
            proxies["http"] = http_proxy
        if https_proxy:
            proxies["https"] = https_proxy
        if no_proxy:
            for hostname in no_proxy.split(","):
                # Support "*.server.com" and "10.*"
                if fnmatch(url_hostname, hostname.strip()):
                    proxies = {}
                    break
                # Support ".server.com"
                elif hostname.strip().startswith(".") and url_hostname.endswith(hostname.strip()):
                    proxies = {}
                    break
                # TODO: support network mask: 10.0.0.0/8
        return proxies

    def configure_proxy(curl):
        url = curl.getinfo(pycurl.EFFECTIVE_URL)
        logging.error("URL: {0}".format(url))
        # use a proxy unless the URL matches a no_proxy entry
        proxies = get_proxies_for_url(url)
        # prefer the https proxy, fall back to the http one if only that is set
        proxy = proxies.get("https") or proxies.get("http")
        if proxy:
            host, _, port = proxy.rpartition(":")
            logging.error("adding proxy: {0}:{1}".format(host, port))
            curl.setopt(pycurl.PROXY, host)
            if port:
                curl.setopt(pycurl.PROXYPORT, int(port))

    # never do this
    HTTPRequest._DEFAULTS['prepare_curl_callback'] = configure_proxy
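
The TODO about network masks (10.0.0.0/8) could probably be covered with the standard-library ipaddress module. The following is only a sketch under assumptions: host_matches_no_proxy is a hypothetical helper name, not part of the hack above, and it treats entries containing "/" as CIDR networks that apply only when the request host is a literal IP address.

```python
import ipaddress
from fnmatch import fnmatch


def host_matches_no_proxy(hostname, no_proxy):
    """Return True if hostname matches any entry of a comma-separated no_proxy string."""
    for entry in no_proxy.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        # CIDR entries such as "10.0.0.0/8" only apply to literal IP hosts
        if "/" in entry:
            try:
                network = ipaddress.ip_network(entry, strict=False)
                if ipaddress.ip_address(hostname) in network:
                    return True
            except ValueError:
                pass  # hostname is not an IP, or entry is not a valid network
            continue
        # glob entries such as "*.ourcompany.fr" or "10.*"
        if fnmatch(hostname, entry):
            return True
        # leading-dot entries such as ".ourcompany.fr"
        if entry.startswith(".") and hostname.endswith(entry):
            return True
    return False
```

A helper like this could stand in for the loop over no_proxy entries in get_proxies_for_url, clearing proxies whenever it returns True.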

gsemet avatar May 15 '19 09:05 gsemet

This feels a lot like hacking...

It seems the underlying issue is that Tornado's CurlAsyncHTTPClient does not respect the http_proxy environment variable: https://github.com/tornadoweb/tornado/issues/754

The conclusion of that Tornado issue is:

"OK. In that case you must tell tornado that you want to use a proxy by setting the proxy_host and proxy_port arguments."

Even for simple_httpclient, I don't know whether it supports no_proxy.

:(

It looks like hacking into pycurl is the only solution for now...
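
Since Tornado expects proxy_host and proxy_port to be given explicitly, one option is to derive them from the usual environment variables. This is a rough sketch, not something Tornado or OAuthenticator provides: proxy_defaults_from_env is a hypothetical helper, and the order in which the variables are checked is an assumption.

```python
import os
from urllib.parse import urlsplit


def proxy_defaults_from_env(environ=os.environ):
    """Build proxy_host/proxy_port keyword arguments from HTTPS_PROXY-style variables."""
    for name in ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"):
        value = environ.get(name)
        if value:
            break
    else:
        return {}  # no proxy variable is set
    # accept both "http://host:3128" and bare "host:3128"
    parts = urlsplit(value if "://" in value else "//" + value)
    # Tornado's docs say both proxy_host and proxy_port must be set,
    # so return nothing when either one is missing
    if not parts.hostname or parts.port is None:
        return {}
    return {"proxy_host": parts.hostname, "proxy_port": parts.port}
```

The resulting dict has the same shape as the defaults passed to AsyncHTTPClient.configure earlier in this thread, or it could be passed as keyword arguments to an individual HTTPRequest. According to the Tornado documentation, proxies are only supported with curl_httpclient.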

cblomart avatar Oct 16 '19 18:10 cblomart

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/unable-to-get-git-hub-oauth-work-on-a-jupyterhub-server-which-is-behind-a-proxy/6334/5

meeseeksmachine avatar Oct 13 '20 18:10 meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/unable-to-get-git-hub-oauth-work-on-a-jupyterhub-server-which-is-behind-a-proxy/6334/6

meeseeksmachine avatar Oct 14 '20 06:10 meeseeksmachine

I think the best option may be to explicitly set proxy_host for any Tornado requests made by OAuthenticator. https://www.tornadoweb.org/en/stable/httpclient.html#tornado.httpclient.HTTPRequest

Setting this globally may lead to problems. For example, with Z2JH you'd only want to use the proxy for external requests and not for connections to other K8s servers.
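
As a rough illustration of setting the proxy per request rather than globally, the sketch below only adds proxy_host and proxy_port for hosts on an allow-list. Everything named here is an assumption for illustration: the PROXY values, the PROXIED_SUFFIXES list, and the helpers proxy_kwargs_for and build_request are hypothetical, and where OAuthenticator would actually build its requests is not shown.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration only.
PROXY = {"proxy_host": "proxy.example.com", "proxy_port": 8080}
PROXIED_SUFFIXES = ("googleapis.com", "accounts.google.com")


def proxy_kwargs_for(url, proxy=PROXY, suffixes=PROXIED_SUFFIXES):
    """Return proxy keyword arguments only for hosts that should use the proxy."""
    host = urlsplit(url).hostname or ""
    for suffix in suffixes:
        # match the domain itself or any subdomain of it
        if host == suffix or host.endswith("." + suffix):
            return dict(proxy)
    return {}  # in-cluster and other hosts connect directly


def build_request(url, **kwargs):
    """Build a Tornado HTTPRequest, adding proxy settings for listed hosts only."""
    from tornado.httpclient import HTTPRequest  # imported lazily; requires tornado

    # explicit keyword arguments from the caller win over the computed proxy settings
    return HTTPRequest(url, **{**proxy_kwargs_for(url), **kwargs})
```

With this shape, a request to an in-cluster address gets no proxy arguments at all, which avoids the global-setting problem described above.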

manics avatar Mar 12 '23 14:03 manics