package-feeds
Large NPM package response data causes timeout
After the fixes for #139, the remaining packages that still cause timeouts all have very large response data sizes (e.g. 6MB).
Go appears to enable compression (gzip) by default on requests.
Looking into this further, NPM supports HTTP/2. I suspect there is some odd behavior with respect to timeouts and HTTP/2 multiplexing.
Inspecting the network traffic, there is only one connection being opened to https://registry.npmjs.org/, so yes, queries are being multiplexed over a single TCP+TLS connection to NPM.
This means that the large responses are congesting the single multiplexed HTTP/2 connection. While the aggregate of the responses will be received faster than over individual HTTP/1.1 connections, each individual response is slower than if it were not multiplexed.
Some ideas for improving the performance:
- remove the worker limit and just fire all the requests concurrently.
- add a small sleep (e.g. 0.1s) between requests to give the server some space.
- increase the timeout significantly (e.g. to 1m30s) to allow for the slower responses.
Thanks for investigating this @calebbrown!
We still have errors here. Another attempt to reduce this would be to add an LRU cache and ETag "If-None-Match" checking on requests to NPM for a given package.