fix: handle undefined deref() of WeakRef(socket)
This relates to...
This is an issue introduced in v6.20.0 where an undefined socket value is passed to onConnectTimeout causing a TypeError: Cannot read properties of undefined (reading 'autoSelectFamilyAttemptedAddresses').
Rationale
This Pull request adds optional chaining to the socket input variable that can be undefined.
Changes
This Pull request adds optional chaining to the socket input variable that can be undefined.
Features
N/A
Bug Fixes
This Pull request adds optional chaining to the socket input variable that can be undefined.
Breaking Changes and Deprecations
N/A
Status
- [X] I have read and agreed to the Developer's Certificate of Origin
- [X] Tested
- [S] Benchmarked (optional)
- [S] Documented
- [ ] Review ready
- [ ] In review
- [ ] Merge ready
I was looking for the unit-test file for ./lib/core/connect.js, which exports buildConnector, but can't seem to find the test file. Can you point me there?
As for integration and e2e tests, it is a little difficult to simulate this edge case. This is the environment in which this can happen:
- The server is deployed inside an organization's internal network or VPN (GCP, AWS, on-prem)
- The internal network has access to internal endpoints
- The internal network has access to public endpoints, but only via the HTTP_PROXY
The trigger for this happening is when there are 2 concurrent requests
- Request 1 - is trying to reach the internet using
ProxyAgentbut the proxy is not reachable and thus returns aERR_TUNNEL_CONNECTION_FAILED - Request 2 - is trying to reach an unreachable internal network endpoint, but cannot be reached within timeout, and thus returns a
ERR_CONNECTION_TIMED_OUT
How would this scenario happen in a real setting? What business value is there?
- I am running a utility javascript script to scan for the availability of a set of predetermined endpoints from within specific clusters
- Depending on the cluster's networking setup, some internal/external endpoints may be available. Others might not. This is valuable information for security teams to verify cluster/networking administration, but mostly for developers to provide feedback to cluster/network admins.
I have a 100% reproducible e2e scenario that can simulate it, but the endpoints and VPN belong to the company I work for, so not something I can share. I am open however to running any commands/diagnotics to extract any generic information.
This pseudocode is as close as I can get to simulating this:
// ./test/proxy-agent.js
test('ProxyAgent throws ERR_TUNNEL_CONNECTION_FAILED on non-existent proxy', async (t) => {
t = tspl(t, { plan: 2 })
// Request 1 setup
const unreachableProxyUrl = `http://localhost:9999` // Non-existent/reachable proxy server
const proxyAgent = new ProxyAgent(unreachableProxyUrl)
const fetchOptions1 = { dispatcher: proxyAgent }
// Request 2 setup
const timeoutServer = await buildTimeoutServer()
const timeoutServerUrl = timeoutServer.url
const fetchOptions2 = {}
try {
const response1 = fetch(internetUrl, fetchOptions1) // ERR_TUNNEL_CONNECTION_FAILED
const response2 = fetch(timeoutServerUrl, fetchOptions2) // ERR_CONNECTION_TIMED_OUT
await Promise.allSettled([
response1,
response2
])
t.fail('Request should fail')
} catch (e) {
t.ok(true, 'Request failed as expected')
t.ok(e.message.includes('ERR_TUNNEL_CONNECTION_FAILED'), 'Error message should indicate tunnel connection failure')
}
unreachableServer.close()
proxyAgent.close()
})
I have tested this optional chaining locally on 6.20.0, 6.20.1 and it fixes the TypeError when socket is undefined. 6.19.8 does not have this issue, so I looked through the commits and narrowed it down to either:
- https://github.com/nodejs/undici/pull/3673/files
- https://github.com/nodejs/undici/pull/3675/files
But am unable to figure why the socket is undefined due to the change in the timers. Potentially, some kind of race condition? Either way, the fact that socket is a WeakRef means that this piece of code should always expect and gracefully handle an undefined.
I was looking for the unit-test file for
./lib/core/connect.js, which exportsbuildConnector, but can't seem to find the test file. Can you point me there?
Adding it into the test/client.js would be enough
The testing scenario you shared seems enough for an e2e testing. Feel free to use an HTTP server setup within the same testing scenario that mimics the possible network and server behaviour your are facing.
Hello. i have a same issue. I hope the team merges quickly, it's already affecting my online products
We are seeing this in prod as well semi frequently (happened 3 times in past 24 hours)