undici icon indicating copy to clipboard operation
undici copied to clipboard

Support HTTP_PROXY and NO_PROXY env var

Open shttrstckak opened this issue 3 years ago • 32 comments

A somewhat standard env var used in proxying applications is the HTTP_PROXY and NO_PROXY variables.

Undici should support automatically setting a global dispatcher if HTTP_PROXY env var is used. Currently Undici supports using setGlobalDispatcher() method to set a specific proxy endpoint.

However there is a sister environment variable to HTTP_PROXY that supports contextually non-proxying domains: NO_PROXY

NO_PROXY requires a list of domain names to be NOT proxied and should make a request normally.

If any of the above cannot be supported, then Undici (and ultimately Node's Fetch-Undici) should allow per-request proxy settings or function hook to allow anyone to develop the same effect as HTTP_PROXY and NO_PROXY env var in their application that uses Undici/Node Fetch.

Thoughts?

shttrstckak avatar Sep 14 '22 20:09 shttrstckak

I don't know if we should support HTTP_PROXY. How is widespread the use of HTTP_PROXY in the ecosystem.

mcollina avatar Sep 14 '22 21:09 mcollina

The "ecosystem" as in the JS ecosystem or the larger software ecosystem?

For the latter, HTTP_PROXY is used everywhere.

For JS it's a quite heavily used I imagine for the same reasons. Popular frameworks tend to implement it.

Node.http module already supports it out of the box.

Got, Axios, Node-fetch (npm package) require a module global-agent to utilize. Global-agent does not appear to have any plans to support Undici.

Popular JS based testing framework Cypress supports it.

On the other side, browsers tend to support some sort of proxy configuration.

Overall I think if there is no interest in supporting HTTP_PROXY in Undici, then it would be better to support to being able to implement these env vars. HTTP_PROXY can already be supported easily however NO_PROXY doesn't seem currently possible.

shttrstckak avatar Sep 14 '22 22:09 shttrstckak

Use of these environment variables is in very widespread, see number of upvotes on https://github.com/nodejs/node/issues/8381. Many high-security environments restrict outbound internet access via a allowlist of domains configured on the proxy.

I do maintain a fetch wrapper specifically for the purpose of automatic proxy discovery from environment. The crucial part is that if such proxy discovery is performed, there must be an opt-out option because sometimes one wants to fetch without proxy, irregardless of environment variables.

Parsing of these env vars is tricky, I recommend https://github.com/Rob--W/proxy-from-env for it.

silverwind avatar Sep 15 '22 15:09 silverwind

Then let's do it.

mcollina avatar Sep 16 '22 08:09 mcollina

I offer myself as sacrifice and will probably get started this weekend.

Does anyone have any good starters on developing with this repository? For example, What would be a good place in code to start investigating for this feature?

shttrstckak avatar Oct 13 '22 16:10 shttrstckak

The check for automatically supporting them should likely be in https://github.com/nodejs/undici/blob/main/lib/global.js.

mcollina avatar Oct 14 '22 08:10 mcollina

We probably should cleanly re-implement https://github.com/Rob--W/proxy-from-env because that module includes support for npm_config_* variables which I think has no place in module that is supposed to only support the standard environment variables. See discussion in https://github.com/Rob--W/proxy-from-env/issues/13.

silverwind avatar Oct 14 '22 09:10 silverwind

Also read https://about.gitlab.com/blog/2021/01/27/we-need-to-talk-no-proxy/. It gives an overview of the different implementations and the last two paragraphs give good advice on how to implement.

I plan to create a module that exactly implements such a parser.

silverwind avatar Oct 17 '22 15:10 silverwind

There is my PoC:

import {URL} from 'node:url';
import {getGlobalDispatcher, setGlobalDispatcher, Dispatcher, ProxyAgent} from 'undici';

const proxyAgents = Object.fromEntries(
  ['http', 'https'].map( protocol => {
    const uri = process.env[ `${protocol}_proxy` ];
    if ( uri ){
      return [`${protocol}:`, new ProxyAgent(uri)];
    }
    return [];
  })
);

const noProxyRules = (process.env['no_proxy'] ?? '').split(',').map( rule => rule.trim() );

const defaultDispatcher = getGlobalDispatcher();

setGlobalDispatcher(new class extends Dispatcher {

  dispatch(options, handler) {
    if (options.origin ){
      const {host, protocol} = typeof options.origin === 'string' ? new URL(options.origin) : options.origin;
      if (! noProxyRules.some( rule => rule.startsWith('.') ? host.endsWith(rule) : host === rule )){
        const proxyAgent = proxyAgents[ protocol ];
        if ( proxyAgent ) {
          proxyAgent.dispatch( options, handler );
        }
      }
    }
    return defaultDispatcher.dispatch(options, handler);
  }

});

I think that this feature should be treated as plugin and loaded by node "preload" argument:

node -r undici/register-http-proxy ./script.js

sla100 avatar Dec 07 '22 18:12 sla100

Seems like a start, but it's definitely missing some features like wildcard (*) support in no_proxy.

I think that this feature should be treated as plugin and loaded by node "preload"

I disagree. Numerous environments support these variables by default, like Deno, Python, Golang and many more. It should of course be considered a breaking change, but I don't see why it should not be the default. Some kind of opt-out would be good to have, but I guess users could always overwrite the env with no_proxy=* I guess.

Maybe the implementation here could be "inspired" by Deno's 😉.

silverwind avatar Dec 07 '22 20:12 silverwind

@sla100 my 2 cents on this line:

 return [`${protocol}:`, new ProxyAgent(uri)];
 ...
 const proxyAgent = proxyAgents[ protocol ];

you can also do http over https, that's why most frameworks specify the https option as optional. Meaning that if you haven't specified https_proxy it would do HTTP over the http-proxy :)

that's why I propose smth like this:

const getProxyAgent = (proto: 'http' | 'https') => {
  let agent: ProxyAgent | undefined;
  if (proto === 'http') {
    agent = process.env['https_proxy'] ? new ProxyAgent(process.env['https_proxy']) : undefined;
  }
  if (!agent) {
    agent = process.env['http_proxy'] ? new ProxyAgent(process.env['http_proxy']) : undefined;
  }
  return agent;
};

const noProxyRules = (process.env['no_proxy'] ?? '').split(',').map(rule => rule.trim());

const defaultDispatcher = getGlobalDispatcher();

setGlobalDispatcher(new class extends Dispatcher {

  dispatch(options, handler) {
    if (options.origin) {
      const {
        host,
        protocol,
      } = typeof options.origin === 'string' ? new URL(options.origin) : options.origin;
      if (!noProxyRules.some(rule => rule.startsWith('.') ? host.endsWith(rule) : host === rule)) {
        const proxyAgent = getProxyAgent(protocol);
        if (proxyAgent) {
          proxyAgent.dispatch(options, handler);
        }
      }
    }
    return defaultDispatcher.dispatch(options, handler);
  }
});

jaecktec avatar Dec 12 '22 08:12 jaecktec

I'd be ok to support:

import { setGlobalDispatcher, EnvHttpProxyAgent } from 'undici'

setGlobalDispatcher(new EnvHttpProxyAgent())

mcollina avatar Dec 13 '22 10:12 mcollina

Here is an additional data point in favor of this: Deno supports it by default.

https://deno.land/[email protected]/getting_started/setup_your_environment#environment-variables

We should check if Python, Ruby and other runtimes supports this by default.

mcollina avatar Jan 18 '23 09:01 mcollina

As per https://github.com/nodejs/node/issues/8381, the following environments support these variables by default:

  • Python
  • Go
  • Ruby
  • R

curl and wget also use them by default and I'm sure there are many more examples of applications supporting them.

silverwind avatar Jan 18 '23 09:01 silverwind

@silverwind would you like to send a PR to make this the default here? I would just recommend we create a new Agent and set it as default.

mcollina avatar Jan 18 '23 11:01 mcollina

Possibly, but first I want to create the parser for the env vars. I have a WIP repo for that, will update here once it's in a usable state. I guess we could then incorporate this parser code here, e.g. without a dependency.

silverwind avatar Jan 18 '23 13:01 silverwind

Just adding another data point, bun also now supports these variables out of the box.

silverwind avatar Jan 19 '23 21:01 silverwind

I'll assign this to you, ok?

mcollina avatar Jan 19 '23 21:01 mcollina

Okay

silverwind avatar Jan 19 '23 21:01 silverwind

Is it okay to cache ProxyAgents for each environment variable in-memory? We shouldn't instantiate new agents every request, and I was thinking of doing a LRU cache that holds the ProxyAgent instances, keyed on the proxy server URL.

This above approach has the downside that the cache will become stale if the environment variables change during process runtime, so a completely robust solution could likely not cache the agents at all as long as the env vars are mutable.

In https://github.com/silverwind/fetch-enhanced, I had exposed a method to clear the cache, but that seems like a it too much API surface.

silverwind avatar Jan 20 '23 14:01 silverwind

Actually, thinking again, if we have a agent cache like

{
  "https://proxy1:3128": ProxyAgent,
  "https://proxy2:3128": ProxyAgent,
}

It should not be neccessary to clear this cache ever because even with changing environment variables, the agents will still be valid. The only issue is that when env vars change, unused ProxyAgent instances will not be cleaned by the garbage collector as there are still references to it, but I take that as acceptable.

silverwind avatar Jan 20 '23 15:01 silverwind

That's acceptable.

mcollina avatar Jan 20 '23 16:01 mcollina

Yeah, in a general use case, there would only be two cache entries, one for HTTP and one for HTTPS. If a user wants more dynamic behaviour, they can just deal with ProxyAgent themselves.

silverwind avatar Jan 20 '23 18:01 silverwind

I'll unassign myself for now, I can't find the motivation to complete this, and I don't want to block anyone else from working in it.

silverwind avatar Apr 17 '23 22:04 silverwind

Have you committed your progress anywhere? I'll take a shot at this eventually and I'm kinda hoping I don't need to start from scratch if I don't need to

KhafraDev avatar Apr 17 '23 23:04 KhafraDev

@KhafraDev invited you to my private repo for the env parsing. It's really not much so you could instead just fresh-start with the env parsing based on proxy-from-env. The tasks I see are:

  1. Start with proxy-from-env and remove support for the npm-related vars, ensure their tests pass.
  2. Double check https://about.gitlab.com/blog/2021/01/27/we-need-to-talk-no-proxy/ is honored, or whether there is room for improvement based on it. Would not support CIDR. Would not support any env var not mentioned in this post.
  3. Work out how to cache ProxyAgent per-origin.
  4. (Optionally) provide an opt-out flag, mostly for debug only I suppose as users could just set no_proxy=* as an effective opt-out.

silverwind avatar Apr 20 '23 17:04 silverwind

Just for reference for anybody else who might be in a similar situation: I ran into this issue when using SvelteKit and was able to temporarily workaround by adding this to src/hooks.server.ts:

import type { Handle } from '@sveltejs/kit';
import { env } from '$env/dynamic/private';
import { dev } from '$app/environment';
import { setGlobalDispatcher, ProxyAgent } from 'undici';

if (dev && env.http_proxy) {
	const proxyUrl = new URL(env.http_proxy);
	const token = `Basic ${btoa(`${proxyUrl.username}:${proxyUrl.password}`)}`;

	const proxyAgent = new ProxyAgent({
		uri: proxyUrl.protocol + proxyUrl.host,
		token
	});

	setGlobalDispatcher(proxyAgent);
}

export const handle = (async ({ event, resolve }) => {
	return resolve(event);
}) satisfies Handle;

Since I use deno for production, which supports the proxy out of the box, I only needed to mess with the agent during development.

zicklag avatar May 09 '23 15:05 zicklag

Hi there, what's the latest on this capability? Is there a branch somewhere with this work in progress?

thanks for all your work!

aarlaud avatar Sep 12 '23 08:09 aarlaud

@aarlaud ProxyAgent can be configured for each request like this The disadvantage is that you need to install and introduce the undici package. This package is large and slow to load, taking 100ms (first time).

const { ProxyAgent } = await import ('undici')

let proxy_agents: Record<string, ProxyAgent> = { }

let proxy = 'http://localhost:10080'

let options: RequestInit = {
    dispatcher: (() => {
        if (proxy)
            return proxy_agents[proxy] ??= new ProxyAgent({ uri: proxy })
    })(),

ShenHongFei avatar Sep 12 '23 09:09 ShenHongFei

thanks. In my current case, I'm more interested in a global, which I suppose is more this way, like @zicklag mention above roughly (https://undici.nodejs.org/#/docs/api/ProxyAgent?id=example-basic-proxy-request-with-global-agent-dispatcher)

I was hoping to avoid dealing with HTTP_PROXY vs HTTPS_PROXY, auth, and NO_PROXY, maybe helping to finish something around supporting these, but i'm not clear whether there is something in flight at this point?

aarlaud avatar Sep 12 '23 16:09 aarlaud