cli icon indicating copy to clipboard operation
cli copied to clipboard

[BUG] Npm opens many connections when installing

Open Larsjep opened this issue 1 year ago • 10 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

This issue exists in the latest npm version

  • [X] I am using the latest npm

Current Behavior

My project needs around 2000 packages. When running "npm install" it starts out by opening 2000 connections and then when all the connections are open it downloads 1 package on each connection.

Is this intended?, because it causes a quite high load on our package proxy. Is there a way to limit the number of connections?

With npm 9.9 it only uses 18/20 connections.

Expected Behavior

"npm install" only opens a few connetions to the package server.

Steps To Reproduce

  1. Clear cache
  2. npm install or npm ci

Environment

  • npm: 10.2.5
  • Node.js: 18.19.0
  • OS Name: Windows 11
  • System Model Name:
  • npm config:
registry = "http://fosspackages/npm/default-npm/"
save-exact = true
save-prefix = ""

; node bin location = C:\Users\lj\.nvm\versions\node\v18.19.0\bin\node.exe
; node version = v18.19.0
; npm local prefix = C:\work\Dexter\web-client
; npm version = 10.2.5
; cwd = C:\work\Dexter\web-client
; HOME = C:\Users\lj
; Run `npm config ls -l` to show all defaults.

Larsjep avatar Dec 11 '23 09:12 Larsjep

I noticed that there is a config option called maxsockets. But the default is 15 and npm config get maxsockets also returns 15. So if this should limit the number of connections it doesn't seem to work in npm 10.x.

Larsjep avatar Dec 12 '23 13:12 Larsjep

@Larsjep Just curious what npm config get agent shows. Supposedly if that's set, maxsockets will be ignored in npm-registry-fetch.

Could also be that's something that's buried in NPM and is somehow overridden in some other way, but it caught my eye when I searched the org for maxsockets.... might also help narrow down the problem at least.

MartinFalatic avatar Dec 20 '23 02:12 MartinFalatic

@MartinFalatic npm config get agent returns undefined

I have also looked in the code base, but I can't find anywhere it refers to the maxsockets settings. I can find it in the 9.x codebase, but the download part seem to radically changed in 10.x

Larsjep avatar Dec 20 '23 11:12 Larsjep

+1 here.

We are using self-hosted runners(ARC) on our own cluster and when using npm 10 we noticed a lot of dropped packets due to NAT ports exhaustion all coming from the arc runner executing npm install with npm 10. We have dynamic port allocation, which means that npm creates a lot of connections in a burst at the same time before the new ports can be allocated to the worker node running the runner pod.

Npm 9 does not seem to have this issue. Looks an issue with the new @npmcli/agent http client introduced in npm 10.

What is also interesting is that the workflow does not fail. npm seems to use whatever number of connections it managed to establish to install all packages.

rosen-dimitrov avatar Dec 20 '23 13:12 rosen-dimitrov

Hello there,

I would like to add that this is actually a bigger issue for very large projects (3000+ dependencies):

  • CPU and RAM usage is a lot higher than with npm 9
  • It is a lot slower than npm 9
  • npm install randomly fails with ECONNRESET and ETIMEDOUT errors (even more consistently on Windows), and seems to ignore most of .npmrc options (maxsockets, fetch-retries, etc)

mirayashi avatar Dec 20 '23 13:12 mirayashi

We're affected by the same issue... Our Jenkins CI is constantly failing to install private dependencies from GitHub (through our Proxy).

Two potential workarounds are disabling the proxy (can't do that in corporate environment) or downgrading to npm 9

Th3S4mur41 avatar Jan 08 '24 11:01 Th3S4mur41

It seems the maxsockets option (default value 15) has no effect at all in npm@10, at least in situations where there is no proxy involved.

When I run a npm install with around 1k dependencies inside a local VMWare VM, I'm consistently hitting a bug in VMWare where npm exhausts the CPU resources of vmnet-natd, which leads to npm hanging for minutes and finally terminating with random network errors. Downgrading to npm@9 completely resolves it.

silverwind avatar Feb 05 '24 15:02 silverwind

+1 here My company is having to hold back NPM 10 because of this issue when connecting to a private self-hosted registry. To help with our decision making, is there any indication when this might be triaged?

ajones513 avatar Feb 06 '24 12:02 ajones513

Same issue encountered on our vm lima. After upgrading to node v20 and npm v10, npm install freezes the entire vm. Downgrading to v9 fixed the issue.

greg-md avatar Feb 21 '24 11:02 greg-md

We are seeing about half of our npm ci runs with npm version 10 on github actions fail; downgrading to npm version 9 speeds up install by about 45 seconds (2m45 -> 2m) and doesn't seem to fail at all.

I can reproduce easily so please let me know if I can help debug this somehow

llimllib avatar Feb 23 '24 14:02 llimllib

Potentially being looked at under #7272.

dharmeshgordhan avatar Mar 08 '24 02:03 dharmeshgordhan

My gitlab ci/cd is also very unstable due to this behavior. Often causing time outs due to the large amount of connections. Especially since I have some jobs running in parallel as well.

melroy89 avatar Mar 10 '24 13:03 melroy89

So... I went into the rabbit hole... It went very very deep... But I finally found the root cause!

  • npm-reqistry-fetch uses make-fetch-happen, see here.
  • make-fetch-happen uses minipass-fetch, see here.
  • minipass-fetch uses Node's builtin http/https, see here.
  • The maxSockets parameter gets passed through all these layers just fine, no issue here.
  • However, the implementation of http.Agent in Node.js is suboptimal: It checks for maxSockets here, but does not immediately increase the checked value. Instead it calls this.createSocket and assumes the function pushes to this.sockets synchronously. Typical risk for a race condition.
  • Now back up the stack again: make-fetch-happen uses @npmcli/agent, see here
  • @npmcli/agent uses agent-base, see here
  • agent-base subclasses http.Agent and overwrites createSocket with an implementation that is asynchronous, see here.
  • 🫠

neelance avatar Mar 16 '24 12:03 neelance

* However, the implementation of `http.Agent` in Node.js is suboptimal: It checks for `maxSockets` [here](https://github.com/nodejs/node/blob/fec7e505fc6f256522eb0031515757f2e58daa61/lib/_http_agent.js#L284), but does not immediately increase the checked value. Instead it calls `this.createSocket` and assumes the function pushes to `this.sockets` **synchronously**. Typical risk for a race condition.

I didn't look very long at the code.

But it seems that this.freeSockets[name].length + freeSockets.length is used to determine the current length of sockets validated again the this.maxSockets option.

This array is increased via: ArrayPrototypePush(this.sockets[name], s); a bit down in the code.

Then there is also something like "max socket count" as well. Which has an easier implementation, the code uses: this.totalSocketCount to keep track of that value, which is validated against the this.maxTotalSocket option.

Anyway there is a lot of complex state keeping going on in the _http_agent.js file. Keep in mind that http_agent.js file is using a local createSocket method, defined below.

melroy89 avatar Mar 16 '24 14:03 melroy89

I recall similar issue https://github.com/npm/make-fetch-happen/issues/59, but in that case it was only for proxied connections, while the current issue affects definitely affects non-proxied connections.

silverwind avatar Mar 16 '24 15:03 silverwind

Here is a workaround I just applied to our CI system. I am using the request as a placeholder in the this.sockets array.

agent-base.patch

--- a/dist/index.js
+++ b/dist/index.js
@@ -65,6 +65,8 @@
             l.indexOf('node:https:') !== -1);
     }
     createSocket(req, options, cb) {
+        const sockets = this.sockets[this.getName(options)];
+        sockets.push(req);
         const connectOpts = {
             ...options,
             secureEndpoint: this.isSecureEndpoint(options),
\ No newline at end of file
@@ -77,6 +79,7 @@
                 return socket.addRequest(req, connectOpts);
             }
             this[INTERNAL].currentSocket = socket;
+            sockets.splice(sockets.indexOf(req), 1);
             // @ts-expect-error `createSocket()` isn't defined in `@types/node`
             super.createSocket(req, options, cb);
         }, cb);
\ No newline at end of file

We're using Earthly, so I added this to our Earthfile:

  # patch for https://github.com/npm/cli/issues/7072
  COPY agent-base.patch ./
  RUN patch -p 1 -d /usr/local/lib/node_modules/npm/node_modules/agent-base/ < agent-base.patch

neelance avatar Mar 17 '24 09:03 neelance

What is the status of this issue? In practice npm, especially in CI systems, is currently creating dos attacks to repository services. In universal repository services such as Artifactory, this affects to all repositories because the service is busy serving floods from npm clients.

I see issue #7272 has been closed as duplicate. Not sure how labels work in this project, but I see #7272 has "Priority 1" label, and if that makes any difference and adds priority, could you please add that label in this issue?

ipikiiskinen avatar Mar 21 '24 06:03 ipikiiskinen

@lukekarrys. Cuz #7272 is closed now (as a duplicate of the current issue), could you please take a look at this?

m0ps avatar Mar 21 '24 07:03 m0ps

I believe this a big issue in NPM for both individuals, small and large companies alike. Please, consider increasing the priority of this ticket (eg. label as as prio 1: https://github.com/npm/cli/labels/Priority%201 or higher).

melroy89 avatar Mar 24 '24 15:03 melroy89

Luke has created an issue in the upstream package https://github.com/TooTallNate/proxy-agents/issues/299. If this can't be fixed quickly we may also look into mitigating it somehow from the npm codebase.

wraithgar avatar Mar 27 '24 03:03 wraithgar

Luke has created an issue in the upstream package TooTallNate/proxy-agents#299. If this can't be fixed quickly we may also look into mitigating it somehow from the npm codebase.

You could try https://github.com/delvedor/hpagent as a replacement, the only feature that one lacks is Windows PAC support, so I guess on Windows you still want to use proxy-agents.

silverwind avatar Mar 29 '24 20:03 silverwind

I think I've landed on a patch for agent-base that will fix this behavior. If no other issues are found I'm planning to release it as part of the next npm release on 2024-04-03.

lukekarrys avatar Mar 29 '24 23:03 lukekarrys

I think I've landed on a patch for agent-base that will fix this behavior. If no other issues are found I'm planning to release it as part of the next npm release on 2024-04-03.

I see: https://github.com/TooTallNate/proxy-agents/pull/302/files#diff-bd534a5bdb8155fe954a536361c5ae2e59247b5ca942066ea365d4aab955526e

Thanks!

melroy89 avatar Mar 30 '24 01:03 melroy89

And thank you to @neelance whose initial investigation saved a me lot of debugging time.

lukekarrys avatar Mar 30 '24 01:03 lukekarrys

I can see you have a PR up to bring in this change to this repo @lukekarrys 🙌 Thank you!

Once merged into NPM, do you have any idea how long a change like this normally takes to be released alongside Node as a Node/npm pair in the Node docker images?

https://nodejs.org/dist/index.json

dominicfraser avatar Apr 01 '24 08:04 dominicfraser

Once merged into NPM, do you have any idea how long a change like this normally takes to be released alongside Node as a Node/npm pair in the Node docker images?

It usually takes 2-4 weeks. In the most recent case, [email protected] was released on 2024-02-28 and first included in Node v20.12.0 which was released on 2024-03-26.

If you need an npm version sooner than that, I recommend having a step in your Docker images to pin/update to a newer npm version using npm install npm@<VERSION|RANGE> -g.

lukekarrys avatar Apr 01 '24 18:04 lukekarrys

Thank you for the help ❤️

dominicfraser avatar Apr 01 '24 19:04 dominicfraser

Indeed. Thank you so much! ❤️ Appreciate the prio shift, I know you are very busy.

melroy89 avatar Apr 02 '24 01:04 melroy89

Released in https://github.com/npm/cli/releases/tag/v10.5.1

dominicfraser avatar Apr 03 '24 16:04 dominicfraser

Updating in Node in https://github.com/nodejs/node/pull/52351

dominicfraser avatar Apr 03 '24 16:04 dominicfraser