cli
cli copied to clipboard
[BUG] Npm opens many connections when installing
Is there an existing issue for this?
- [X] I have searched the existing issues
This issue exists in the latest npm version
- [X] I am using the latest npm
Current Behavior
My project needs around 2000 packages. When running "npm install" it starts out by opening 2000 connections and then when all the connections are open it downloads 1 package on each connection.
Is this intended?, because it causes a quite high load on our package proxy. Is there a way to limit the number of connections?
With npm 9.9 it only uses 18/20 connections.
Expected Behavior
"npm install" only opens a few connetions to the package server.
Steps To Reproduce
- Clear cache
- npm install or npm ci
Environment
- npm: 10.2.5
- Node.js: 18.19.0
- OS Name: Windows 11
- System Model Name:
- npm config:
registry = "http://fosspackages/npm/default-npm/"
save-exact = true
save-prefix = ""
; node bin location = C:\Users\lj\.nvm\versions\node\v18.19.0\bin\node.exe
; node version = v18.19.0
; npm local prefix = C:\work\Dexter\web-client
; npm version = 10.2.5
; cwd = C:\work\Dexter\web-client
; HOME = C:\Users\lj
; Run `npm config ls -l` to show all defaults.
I noticed that there is a config option called maxsockets. But the default is 15 and
npm config get maxsockets
also returns 15.
So if this should limit the number of connections it doesn't seem to work in npm 10.x.
@Larsjep Just curious what npm config get agent
shows. Supposedly if that's set, maxsockets
will be ignored in npm-registry-fetch
.
Could also be that's something that's buried in NPM and is somehow overridden in some other way, but it caught my eye when I searched the org for maxsockets
.... might also help narrow down the problem at least.
@MartinFalatic npm config get agent
returns undefined
I have also looked in the code base, but I can't find anywhere it refers to the maxsockets
settings.
I can find it in the 9.x codebase, but the download part seem to radically changed in 10.x
+1 here.
We are using self-hosted runners(ARC) on our own cluster and when using npm 10 we noticed a lot of dropped packets due to NAT ports exhaustion all coming from the arc runner executing npm install with npm 10. We have dynamic port allocation, which means that npm creates a lot of connections in a burst at the same time before the new ports can be allocated to the worker node running the runner pod.
Npm 9 does not seem to have this issue. Looks an issue with the new @npmcli/agent
http client introduced in npm 10.
What is also interesting is that the workflow does not fail. npm seems to use whatever number of connections it managed to establish to install all packages.
Hello there,
I would like to add that this is actually a bigger issue for very large projects (3000+ dependencies):
- CPU and RAM usage is a lot higher than with npm 9
- It is a lot slower than npm 9
-
npm install
randomly fails with ECONNRESET and ETIMEDOUT errors (even more consistently on Windows), and seems to ignore most of .npmrc options (maxsockets, fetch-retries, etc)
We're affected by the same issue... Our Jenkins CI is constantly failing to install private dependencies from GitHub (through our Proxy).
Two potential workarounds are disabling the proxy (can't do that in corporate environment) or downgrading to npm 9
It seems the maxsockets
option (default value 15) has no effect at all in npm@10
, at least in situations where there is no proxy involved.
When I run a npm install
with around 1k dependencies inside a local VMWare VM, I'm consistently hitting a bug in VMWare where npm exhausts the CPU resources of vmnet-natd
, which leads to npm hanging for minutes and finally terminating with random network errors. Downgrading to npm@9
completely resolves it.
+1 here My company is having to hold back NPM 10 because of this issue when connecting to a private self-hosted registry. To help with our decision making, is there any indication when this might be triaged?
Same issue encountered on our vm lima. After upgrading to node v20 and npm v10, npm install freezes the entire vm. Downgrading to v9 fixed the issue.
We are seeing about half of our npm ci
runs with npm version 10 on github actions fail; downgrading to npm version 9 speeds up install by about 45 seconds (2m45 -> 2m) and doesn't seem to fail at all.
I can reproduce easily so please let me know if I can help debug this somehow
Potentially being looked at under #7272.
My gitlab ci/cd is also very unstable due to this behavior. Often causing time outs due to the large amount of connections. Especially since I have some jobs running in parallel as well.
So... I went into the rabbit hole... It went very very deep... But I finally found the root cause!
-
npm-reqistry-fetch
usesmake-fetch-happen
, see here. -
make-fetch-happen
usesminipass-fetch
, see here. -
minipass-fetch
uses Node's builtinhttp
/https
, see here. - The
maxSockets
parameter gets passed through all these layers just fine, no issue here. - However, the implementation of
http.Agent
in Node.js is suboptimal: It checks formaxSockets
here, but does not immediately increase the checked value. Instead it callsthis.createSocket
and assumes the function pushes tothis.sockets
synchronously. Typical risk for a race condition. - Now back up the stack again:
make-fetch-happen
uses@npmcli/agent
, see here -
@npmcli/agent
usesagent-base
, see here -
agent-base
subclasseshttp.Agent
and overwritescreateSocket
with an implementation that is asynchronous, see here. - 🫠
* However, the implementation of `http.Agent` in Node.js is suboptimal: It checks for `maxSockets` [here](https://github.com/nodejs/node/blob/fec7e505fc6f256522eb0031515757f2e58daa61/lib/_http_agent.js#L284), but does not immediately increase the checked value. Instead it calls `this.createSocket` and assumes the function pushes to `this.sockets` **synchronously**. Typical risk for a race condition.
I didn't look very long at the code.
But it seems that this.freeSockets[name].length
+ freeSockets.length
is used to determine the current length of sockets validated again the this.maxSockets
option.
This array is increased via: ArrayPrototypePush(this.sockets[name], s);
a bit down in the code.
Then there is also something like "max socket count" as well. Which has an easier implementation, the code uses: this.totalSocketCount
to keep track of that value, which is validated against the this.maxTotalSocket
option.
Anyway there is a lot of complex state keeping going on in the _http_agent.js file. Keep in mind that http_agent.js file is using a local createSocket
method, defined below.
I recall similar issue https://github.com/npm/make-fetch-happen/issues/59, but in that case it was only for proxied connections, while the current issue affects definitely affects non-proxied connections.
Here is a workaround I just applied to our CI system. I am using the request as a placeholder in the this.sockets
array.
agent-base.patch
--- a/dist/index.js
+++ b/dist/index.js
@@ -65,6 +65,8 @@
l.indexOf('node:https:') !== -1);
}
createSocket(req, options, cb) {
+ const sockets = this.sockets[this.getName(options)];
+ sockets.push(req);
const connectOpts = {
...options,
secureEndpoint: this.isSecureEndpoint(options),
\ No newline at end of file
@@ -77,6 +79,7 @@
return socket.addRequest(req, connectOpts);
}
this[INTERNAL].currentSocket = socket;
+ sockets.splice(sockets.indexOf(req), 1);
// @ts-expect-error `createSocket()` isn't defined in `@types/node`
super.createSocket(req, options, cb);
}, cb);
\ No newline at end of file
We're using Earthly, so I added this to our Earthfile:
# patch for https://github.com/npm/cli/issues/7072
COPY agent-base.patch ./
RUN patch -p 1 -d /usr/local/lib/node_modules/npm/node_modules/agent-base/ < agent-base.patch
What is the status of this issue? In practice npm, especially in CI systems, is currently creating dos attacks to repository services. In universal repository services such as Artifactory, this affects to all repositories because the service is busy serving floods from npm clients.
I see issue #7272 has been closed as duplicate. Not sure how labels work in this project, but I see #7272 has "Priority 1" label, and if that makes any difference and adds priority, could you please add that label in this issue?
@lukekarrys. Cuz #7272 is closed now (as a duplicate of the current issue), could you please take a look at this?
I believe this a big issue in NPM for both individuals, small and large companies alike. Please, consider increasing the priority of this ticket (eg. label as as prio 1: https://github.com/npm/cli/labels/Priority%201 or higher).
Luke has created an issue in the upstream package https://github.com/TooTallNate/proxy-agents/issues/299. If this can't be fixed quickly we may also look into mitigating it somehow from the npm codebase.
Luke has created an issue in the upstream package TooTallNate/proxy-agents#299. If this can't be fixed quickly we may also look into mitigating it somehow from the npm codebase.
You could try https://github.com/delvedor/hpagent as a replacement, the only feature that one lacks is Windows PAC support, so I guess on Windows you still want to use proxy-agents
.
I think I've landed on a patch for agent-base
that will fix this behavior. If no other issues are found I'm planning to release it as part of the next npm
release on 2024-04-03
.
I think I've landed on a patch for
agent-base
that will fix this behavior. If no other issues are found I'm planning to release it as part of the nextnpm
release on2024-04-03
.
I see: https://github.com/TooTallNate/proxy-agents/pull/302/files#diff-bd534a5bdb8155fe954a536361c5ae2e59247b5ca942066ea365d4aab955526e
Thanks!
And thank you to @neelance whose initial investigation saved a me lot of debugging time.
I can see you have a PR up to bring in this change to this repo @lukekarrys 🙌 Thank you!
Once merged into NPM, do you have any idea how long a change like this normally takes to be released alongside Node as a Node/npm pair in the Node docker images?
https://nodejs.org/dist/index.json
Once merged into NPM, do you have any idea how long a change like this normally takes to be released alongside Node as a Node/npm pair in the Node docker images?
It usually takes 2-4 weeks. In the most recent case, [email protected]
was released on 2024-02-28
and first included in Node v20.12.0 which was released on 2024-03-26
.
If you need an npm version sooner than that, I recommend having a step in your Docker images to pin/update to a newer npm version using npm install npm@<VERSION|RANGE> -g
.
Thank you for the help ❤️
Indeed. Thank you so much! ❤️ Appreciate the prio shift, I know you are very busy.
Released in https://github.com/npm/cli/releases/tag/v10.5.1
Updating in Node in https://github.com/nodejs/node/pull/52351