undici icon indicating copy to clipboard operation
undici copied to clipboard

fix: handle undefined deref() of WeakRef(socket)

Open hochoy opened this issue 1 year ago • 4 comments

This relates to...

This is an issue introduced in v6.20.0 where an undefined socket value is passed to onConnectTimeout causing a TypeError: Cannot read properties of undefined (reading 'autoSelectFamilyAttemptedAddresses').

Rationale

This Pull request adds optional chaining to the socket input variable that can be undefined.

Changes

This Pull request adds optional chaining to the socket input variable that can be undefined.

Features

N/A

Bug Fixes

This Pull request adds optional chaining to the socket input variable that can be undefined.

Breaking Changes and Deprecations

N/A

Status

  • [X] I have read and agreed to the Developer's Certificate of Origin
  • [X] Tested
  • [S] Benchmarked (optional)
  • [S] Documented
  • [ ] Review ready
  • [ ] In review
  • [ ] Merge ready

hochoy avatar Oct 20 '24 06:10 hochoy

I was looking for the unit-test file for ./lib/core/connect.js, which exports buildConnector, but can't seem to find the test file. Can you point me there?

hochoy avatar Oct 20 '24 06:10 hochoy

As for integration and e2e tests, it is a little difficult to simulate this edge case. This is the environment in which this can happen:

  1. The server is deployed inside an organization's internal network or VPN (GCP, AWS, on-prem)
  2. The internal network has access to internal endpoints
  3. The internal network has access to public endpoints, but only via the HTTP_PROXY

The trigger for this happening is when there are 2 concurrent requests

  1. Request 1 - is trying to reach the internet using ProxyAgent but the proxy is not reachable and thus returns a ERR_TUNNEL_CONNECTION_FAILED
  2. Request 2 - is trying to reach an unreachable internal network endpoint, but cannot be reached within timeout, and thus returns a ERR_CONNECTION_TIMED_OUT

How would this scenario happen in a real setting? What business value is there?

  • I am running a utility javascript script to scan for the availability of a set of predetermined endpoints from within specific clusters
  • Depending on the cluster's networking setup, some internal/external endpoints may be available. Others might not. This is valuable information for security teams to verify cluster/networking administration, but mostly for developers to provide feedback to cluster/network admins.

I have a 100% reproducible e2e scenario that can simulate it, but the endpoints and VPN belong to the company I work for, so not something I can share. I am open however to running any commands/diagnotics to extract any generic information.

This pseudocode is as close as I can get to simulating this:

// ./test/proxy-agent.js
test('ProxyAgent throws ERR_TUNNEL_CONNECTION_FAILED on non-existent proxy', async (t) => {
  t = tspl(t, { plan: 2 })

  // Request 1 setup
  const unreachableProxyUrl = `http://localhost:9999` // Non-existent/reachable proxy server
  const proxyAgent = new ProxyAgent(unreachableProxyUrl)
  const fetchOptions1 = { dispatcher: proxyAgent }
  
  // Request 2 setup
  const timeoutServer = await buildTimeoutServer()
  const timeoutServerUrl = timeoutServer.url
  const fetchOptions2 = {}

  try {
    const response1 = fetch(internetUrl, fetchOptions1) // ERR_TUNNEL_CONNECTION_FAILED
    const response2 = fetch(timeoutServerUrl, fetchOptions2) // ERR_CONNECTION_TIMED_OUT
    
    await Promise.allSettled([
       response1,
       response2
    ])
    
    t.fail('Request should fail')
  } catch (e) {
    t.ok(true, 'Request failed as expected')
    t.ok(e.message.includes('ERR_TUNNEL_CONNECTION_FAILED'), 'Error message should indicate tunnel connection failure')
  }

  unreachableServer.close()
  proxyAgent.close()
})

hochoy avatar Oct 20 '24 06:10 hochoy

I have tested this optional chaining locally on 6.20.0, 6.20.1 and it fixes the TypeError when socket is undefined. 6.19.8 does not have this issue, so I looked through the commits and narrowed it down to either:

  1. https://github.com/nodejs/undici/pull/3673/files
  2. https://github.com/nodejs/undici/pull/3675/files

But am unable to figure why the socket is undefined due to the change in the timers. Potentially, some kind of race condition? Either way, the fact that socket is a WeakRef means that this piece of code should always expect and gracefully handle an undefined.

hochoy avatar Oct 20 '24 07:10 hochoy

I was looking for the unit-test file for ./lib/core/connect.js, which exports buildConnector, but can't seem to find the test file. Can you point me there?

Adding it into the test/client.js would be enough

The testing scenario you shared seems enough for an e2e testing. Feel free to use an HTTP server setup within the same testing scenario that mimics the possible network and server behaviour your are facing.

metcoder95 avatar Oct 20 '24 09:10 metcoder95

Hello. i have a same issue. I hope the team merges quickly, it's already affecting my online products

murrayee avatar Nov 03 '24 17:11 murrayee

We are seeing this in prod as well semi frequently (happened 3 times in past 24 hours)

admangan avatar Nov 11 '24 16:11 admangan