node icon indicating copy to clipboard operation
node copied to clipboard

net: add autoDetectFamily option

Open ShogunPanda opened this issue 2 years ago • 2 comments

This PR implements the Happy Eyeballs algorithm.

A new option autoDetectFamily is added to net.connect. When enabled and when family is still 0, the lookup phase will keep all records by setting all=true. A connection attempt is tried to all A and AAAA records, keeping only the first successful connection, while others are disconnected. Errors are raised only if no connection succeeded.

Fixes #41625.

ShogunPanda avatar Sep 20 '22 12:09 ShogunPanda

Review requested:

  • [ ] @nodejs/net

nodejs-github-bot avatar Sep 20 '22 12:09 nodejs-github-bot

@nodejs/tsc this is a notable change that we should land before v18 goes LTS. We have seen quite a few reports of bugs due to improper configuration of IPv6, making https://github.com/nodejs/node/issues/41625 a necessity. For example, Docker on Mac only exposes ports on the IPv4 address, making all connections to localhost fail.

mcollina avatar Sep 20 '22 12:09 mcollina

No comment (yet at least) one way or the other on the implementation, but +1000 to implementing happy eyeballs in Node.js.

Trott avatar Sep 21 '22 15:09 Trott

The two new tests fail for me (Linux x64) and also in GitHub actions.

=== release test-http-happy-eyeballs ===
Path: parallel/test-http-happy-eyeballs
Error: --- stderr ---
node:events:491
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE: address already in use :::33167
    at Server.setupListenHandle [as _listen2] (node:net:1649:16)
    at listenInCluster (node:net:1697:12)
    at doListen (node:net:1846:7)
    at process.processTicksAndRejections (node:internal/process/task_queues:83:21)
Emitted 'error' event on Server instance at:
    at emitErrorNT (node:net:1676:8)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'EADDRINUSE',
  errno: -98,
  syscall: 'listen',
  address: '::',
  port: 33167
}

Node.js v19.0.0-pre
Command: out/Release/node /home/runner/work/node/node/test/parallel/test-http-happy-eyeballs.js
=== release test-net-happy-eyeballs ===
Path: parallel/test-net-happy-eyeballs
Error: --- stderr ---
node:events:491
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE: address already in use :::37301
    at Server.setupListenHandle [as _listen2] (node:net:1649:16)
    at listenInCluster (node:net:1697:12)
    at doListen (node:net:1846:7)
    at process.processTicksAndRejections (node:internal/process/task_queues:83:21)
Emitted 'error' event on Server instance at:
    at emitErrorNT (node:net:1676:8)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'EADDRINUSE',
  errno: -98,
  syscall: 'listen',
  address: '::',
  port: 37301
}

Node.js v19.0.0-pre
Command: out/Release/node /home/runner/work/node/node/test/parallel/test-net-happy-eyeballs.js

===
=== 2 tests failed
===

richardlau avatar Sep 21 '22 16:09 richardlau

Acknowledged! I'm taking care of them locally.

ShogunPanda avatar Sep 21 '22 16:09 ShogunPanda

No comment (yet at least) one way or the other on the implementation, but +1000 to implementing happy eyeballs in Node.js.

Same goes for me as well.

mhdawson avatar Sep 21 '22 17:09 mhdawson

we should land before v18 goes LTS

@nodejs/tsc ~Two comments.~

  1. ~If I understand correctly, this is a breaking change. It should be semver major.~ EDIT: Oh, wait, this is an option that is off by default? So people would need to opt in to get the fix? Oh, then I can't imagine this is controversial at all. Yes, let's get this working and landed! Woo hoo! 🎉 🚀
  2. ~However...I am in favor of landing this in the 18.x line as a special case of an important bug fix. Specifically:~

~Docker on Mac only exposes ports on the IPv4 address, making all connections to localhost fail.~

Trott avatar Sep 26 '22 20:09 Trott

I'm going to rebase and force push to at least get the tarball job working.

Trott avatar Sep 26 '22 20:09 Trott

Oh, wait, this is an option that is off by default? So people would need to opt in to get the fix?

My case is that we should make it enabled by default. This causes so many headaches to newbie developers that know little of IPv4 vs IPv6 and why localhost is two IPs.

We should make this change before v18 goes LTS.

mcollina avatar Sep 27 '22 06:09 mcollina

@mcollina how about we make it the default in a subsequent PR where only that bit can be discussed?

targos avatar Sep 27 '22 06:09 targos

@targos @mcollina

I put the default for each new socket in Socket.autoDetectFamily (note: a class property that is then reflected to instance options unless the user specifically provides that) and set it to true. This was both to make it easy to fix tests and also to make easy for user to make their choice no matter the default we will agree on.

@aduh95 I will add the test you asked for, but I'm not gonna be able to until this Wednesday. So we can choose to land this and I'll send a specific PR later.

ShogunPanda avatar Sep 27 '22 06:09 ShogunPanda

@mcollina how about we make it the default in a subsequent PR where only that bit can be discussed?

Agreed!

mcollina avatar Sep 27 '22 08:09 mcollina

@mcollina how about we make it the default in a subsequent PR where only that bit can be discussed?

Agreed!

Ok, understood. I'll send an update to this PR shortly.

ShogunPanda avatar Sep 27 '22 09:09 ShogunPanda

Can we add test for dnsPromises.Resolver as well?

I was willing to, but then I realized I'm not sure why we need a test for this. Could you please clarify?

ShogunPanda avatar Sep 27 '22 10:09 ShogunPanda

CI: https://ci.nodejs.org/job/node-test-pull-request/46859/

nodejs-github-bot avatar Sep 27 '22 10:09 nodejs-github-bot

@nodejs/build I see that the new tests introduce failed due to address already in use under IPv6. I do use {port: 0} when listening, so I don't see how that it happens.

Am I missing something about the CI environment?

ShogunPanda avatar Sep 28 '22 11:09 ShogunPanda

@nodejs/build I see that the new tests introduce failed due to address already in use under IPv6. I do use {port: 0} when listening, so I don't see how that it happens.

Am I missing something about the CI environment?

The same tests are failing in Actions (https://github.com/nodejs/node/actions/runs/3134711995/jobs/5089627090) so this is not specific to the Jenkins CI environment.

General advice: tests in parallel will be competing for ports with other tests in parallel. Also maybe you'd need to double check the port is actually 0 (i.e. the code isn't somehow setting it to a non-zero value) when the bind is attempted.

richardlau avatar Sep 28 '22 16:09 richardlau

Ok, I see, makes sense! Tomorrow I will update the test to retry the operation in case of conflicts.

ShogunPanda avatar Sep 28 '22 18:09 ShogunPanda

I rebased and force-pushed to try to fix the CI issues.

Trott avatar Sep 28 '22 21:09 Trott

CI: https://ci.nodejs.org/job/node-test-pull-request/46909/

nodejs-github-bot avatar Sep 28 '22 21:09 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/46936/

nodejs-github-bot avatar Sep 29 '22 13:09 nodejs-github-bot

@tniessen I didn't know a spec exists, TBH. I'll read it and implement properly. Especially the delay thing, which will make the test changes unneeded. Thanks for pointing me there. I'll update this in the next few days.

ShogunPanda avatar Sep 29 '22 14:09 ShogunPanda

In terms of the question to the TSC on wether we'd be ok with enbling by default (assuming we get it right and have enough baking time) my main questions is if there are cases where people would be expecting to get a specific type of connection (IPV4 or IPV6) and enabling this by default would break that.

One that I can think of is that you have set --dns-result-order=ipv4first or --dns-result-order=ipv6first. If if you asked for ipv4 or ipv6 specifically you should get that if it is possible. If if was happy eyeballs was enabled by default, but was disabled if you set --dns-result-order specifically that likely addresses that specific case.

mhdawson avatar Sep 29 '22 15:09 mhdawson

Since people did not specify a family, we can assume they have no preference (and it is optional in the RFC) and therefore we can skip those options for now and eventually add them later (as it would be semver-minor).

In order to make it RFC compliant I just have to try the connection in series (after 250ms as advised) rather than in parallel.

@tniessen Do you agree?

ShogunPanda avatar Sep 29 '22 16:09 ShogunPanda

Since people did not specify a family, we can assume they have no preference (and it is optional in the RFC) and therefore we can skip those options for now and eventually add them later (as it would be semver-minor).

I’m not sure I follow. The --dns-result-order flag already exists, so we should make sure that this PR doesn’t break it (as in, if the user specifies one family or the other via that flag, I would expect that request to be honored and would supersede any attempt at doing happy eyeballs).

GeoffreyBooth avatar Sep 29 '22 22:09 GeoffreyBooth

Since people did not specify a family, we can assume they have no preference (and it is optional in the RFC) and therefore we can skip those options for now and eventually add them later (as it would be semver-minor).

I’m not sure I follow. The --dns-result-order flag already exists, so we should make sure that this PR doesn’t break it (as in, if the user specifies one family or the other via that flag, I would expect that request to be honored and would supersede any attempt at doing happy eyeballs).

Yes, but what I meant is that we can the RFC does not mandate to implement this so we can eventually postpone if we run out of time.

ShogunPanda avatar Sep 30 '22 10:09 ShogunPanda

In order to make it RFC compliant I just have to try the connection in series (after 250ms as advised) rather than in parallel.

I am not quite sure if that completely eliminates the possibility of opening more than one connection.

Side note: we don't have to adhere to the RFCs at all if we don't call it Happy Eyeballs. But it seems that, where this implementation deviates from the RFCs, the spec certainly makes sense.

tniessen avatar Sep 30 '22 11:09 tniessen

Makes sense. I'll implement the connection delay and then review how much we deviate from the spec.

We can eventually rename the feature and try to stay the close to the RFC as possible if needed.

Will update this soon, considering also Node Collab Summit and NodeConf.

ShogunPanda avatar Sep 30 '22 11:09 ShogunPanda

CI: https://ci.nodejs.org/job/node-test-pull-request/47177/

nodejs-github-bot avatar Oct 11 '22 10:10 nodejs-github-bot

Tests are still failing in GH Actions and in Jenkins. e.g. https://github.com/nodejs/node/actions/runs/3222806238/jobs/5272255739

richardlau avatar Oct 11 '22 12:10 richardlau