volta icon indicating copy to clipboard operation
volta copied to clipboard

When trying to install new versions volta hangs indefinitely and uses all available network bandwidth

Open effervescentia opened this issue 1 year ago • 8 comments

Websites will not load and I have not waited long enough for the command to ever resolve or error (I've waited over 10 minutes)

I was able to solve this temporarily for node downloads by configuring the hooks.json file with a mirror

{
    "node": {
        "index": {
            "template": "https://mirrors.dotsrc.org/nodejs/release/index.json"
        },
        "distro": {
            "template": "https://mirrors.dotsrc.org/nodejs/release/v{{version}}/{{filename}}"
        }
    }
}

this is also happening when trying to install new versions of yarn I first encountered this close to 2 or 3 weeks ago and it has not yet been resolved it doesn't looks like a new version of volta has been released, but perhaps some certificates have expired?

if I open Activity Monitor and quit volta-shim my website loads immediately

OS: MacOS 14.4.1

effervescentia avatar Apr 26 '24 13:04 effervescentia

as a follow up I wasn't able to find a relevant mirror for yarn but I followed this comment and used the releases from github as distro template

{
    "node": {
        "index": {
            "template": "https://mirrors.dotsrc.org/nodejs/release/index.json"
        },
        "distro": {
            "template": "https://mirrors.dotsrc.org/nodejs/release/v{{version}}/{{filename}}"
        }
    },
    "yarn": {
        "distro": {
            "template": "https://raw.githubusercontent.com/yarnpkg/berry/%40yarnpkg/cli/{{version}}/packages/yarnpkg-cli/bin/yarn.js"
        }
    }
}

effervescentia avatar Apr 26 '24 14:04 effervescentia

this happens to me as well!

xavidop avatar Apr 26 '24 14:04 xavidop

I am also experiencing this issue @Volta could you please help resolve

edison-vflow avatar Apr 26 '24 21:04 edison-vflow

This is extremely odd, thank you all for reporting! Is this still happening and consistent? And does it happen trying to access the paths directly via curl or the browser (e.g. https://nodejs.org/dist/index.json)?

charlespierce avatar May 13 '24 17:05 charlespierce

Happening to me too..
Trying to reach that url from the browser works for me.

gillyb avatar May 15 '24 10:05 gillyb

this is definitely still happening, and I don't have any issues curling that URL It's also happening with yarn releases, so nothing specific to the NodeJS download servers

effervescentia avatar May 15 '24 14:05 effervescentia

@charlespierce anything else I can do to help find the source of the issue?

effervescentia avatar May 21 '24 20:05 effervescentia

I'm seeing this too, using volta 1.1.1 on macOS 14.5, the verbose output is:

volta install node --verbose              
[verbose] No custom hooks found
[verbose] Found valid cache of Node version index
[verbose] Found newest LTS node version (20.14.0) from https://nodejs.org/dist/index.json
[verbose] Acquiring lock on Volta directory: /Users/me/.volta/volta.lock
[verbose] Downloading [email protected] from https://nodejs.org/dist/v20.14.0/node-v20.14.0-darwin-arm64.tar.gz

This is all that is output before the process hangs.

There doesn't seem to be excessive CPU usage:

...% top -stats pid,command,cpu | grep -i volta
26382  volta            0.0 
26382  volta            0.0 
29336  volta            0.0 
29336  volta            19.3
29336  volta            0.1 
29336  volta            0.0 
29336  volta            0.0 
29336  volta            0.0 

And network traffic is pretty constrained - nettop doesn't show any existing outbound connections

...% nettop | grep -i volta
                                                                                     interface         state        bytes_in       bytes_out   rx_dupe    rx_ooo     re-tx   rtt_avg   rcvsize    tx_win  tc_class    tc_mgt   cc_algo P C R W arch
volta.30865                                                                                                          675 KiB        1033 B      21 KiB     0 B       0 B
   tcp4 127.0.0.1:55871<->localhost:10011                                                  lo0   Established         675 KiB         656 B      21 KiB     0 B       0 B     1.22 ms   630 KiB   398 KiB        BE         -     cubic - - - -   so
   tcp4 127.0.0.1:55873<->localhost:10011                                                  lo0   Established           0 B           377 B       0 B       0 B       0 B     1.22 ms   398 KiB   398 KiB        BE         -     cubic - - - -   so

(but they could have come and gone before i got this snapshot)

I have the same behavior when used for installing as part of a node package.json:

{
 "volta": {
    "node": "20.14.0",
    "npm": "10.7.0"
  }
}

but here the hang eventually resolves with the error:

Volta error: Could not unpack Node v20.14.0
Please ensure the correct version is specified.
Error cause: failed to unpack `/Users/me/Library/Application Support/empirica/volta/tmp/.tmp170xQy/node-v20.14.0-darwin-arm64/lib/node_modules/npm/node_modules/read-package-json-fast/lib/index.js`

Error cause: failed to unpack `/Users/me/Library/Application Support/empirica/volta/tmp/.tmp170xQy/node-v20.14.0-darwin-arm64/lib/node_modules/npm/node_modules/read-package-json-fast/lib/index.js`

Error cause: failed to unpack `node-v20.14.0-darwin-arm64/lib/node_modules/npm/node_modules/read-package-json-fast/lib/index.js` into `/Users/me/Library/Application Support/empirica/volta/tmp/.tmp170xQy/node-v20.14.0-darwin-arm64/lib/node_modules/npm/node_modules/read-package-json-fast/lib/index.js`

Error cause: corrupt deflate stream
...

JamesPHoughton avatar Jun 18 '24 14:06 JamesPHoughton

still getting this behavior, bump

DecathectZero avatar Jul 03 '24 20:07 DecathectZero

@charlespierce is there anything we can do to help you look into this? It's pretty disruptive to our development since it hangs forever without an error message and kills network traffic

effervescentia avatar Jul 03 '24 21:07 effervescentia

Sorry, just saw this (have been busy on other stuff!). From the error messages @JamesPHoughton shared and the description of what people are seeing, it looks like that’s probably coming from some interaction between a bad download and attempting to unpack the result of that.

For others on the Volta team who are spelunking/debugging (or users who want to help), the relevant chunks of code are:

https://github.com/volta-cli/volta/blob/fd9cc3fcf4db08097709e3f2be207be12f2453f2/crates/volta-core/src/tool/node/fetch.rs#L180-L184

https://github.com/volta-cli/volta/blob/fd9cc3fcf4db08097709e3f2be207be12f2453f2/crates/volta-core/src/tool/node/fetch.rs#L67-L72

https://github.com/volta-cli/volta/blob/fd9cc3fcf4db08097709e3f2be207be12f2453f2/crates/volta-core/src/tool/node/fetch.rs#L104-L111

The relevant code paths haven’t been touched in 5 years (there are a couple Clippy fixes that are unreleased, but that’s it).

I suspect something is happening ~~upstream~~ in archive, given that Volta prints a debug message right before calling archive::fetch_native, you all see a very long timeout, and then the next thing that happens after that very long delay is attempting to unpack the resulting archive. Next step would be to go ~~see whether there is an upstream issue about it, and if so if there is a fix!~~ investigating our own code—I totally thought this was upstream!

https://github.com/volta-cli/volta/blob/fd9cc3fcf4db08097709e3f2be207be12f2453f2/crates/archive/src/tarball.rs#L37-L78

I will keep spelunking if and as I have time, but also welcome other folks digging in a bit.

chriskrycho avatar Jul 03 '24 22:07 chriskrycho

For anyone following on the thread, and per my edit (which you would not have gotten notified about), this is all in our own code; there is no upstream. 🤦🏻‍♂️

8vv9b5

A good next step might be for us to get a build up that folks can run to diagnose, with some additional tracing in Volta’s internal archive crate.

chriskrycho avatar Jul 03 '24 22:07 chriskrycho

Request for more info from folks on the thread who are hitting this: what operating system are you using? We have at least one confirmed Mac user, but I’d like to know for the rest as part of debugging!

chriskrycho avatar Jul 03 '24 23:07 chriskrycho

Request for more info from folks on the thread who are hitting this: what operating system are you using? We have at least one confirmed Mac user, but I’d like to know for the rest as part of debugging!

I can confirm that most (but not all) of the other folks that have been commenting / interacting with this PR are all on M1 / M2 MacBook Pros since they work with me at Voiceflow

effervescentia avatar Jul 06 '24 07:07 effervescentia

I have an M2 Pro MBP on Sonoma 14.5 and am experiencing this... Interestingly enough as early as a few weeks ago, my work MBP, a M1 Max also on Sonoma 14.5 wasn't having this issue

BitForger avatar Jul 06 '24 16:07 BitForger

Hey all, I just pushed up a branch and draft PR at #1784, which you should be able to check out and build, and then use locally. The testing workflow I would recommend:

git clone https://github.com/volta-cli/volta.git
cd volta
git switch chriskrycho/debug-1744
cargo build --release
mv $HOME/.volta/bin/volta $HOME/.volta/bin/volta-backup
cp ./target/release/volta $HOME/.volta/bin/volta
volta --verbose --very-verbose

That final line should print the normal help output preceded by this line:

[very verbose] log level: VeryVerbose

(If you have a non-standard installation location, you will want to figure out where it is with which volta and do the same basic thing for the final two steps, but with the appropriate directory changes.)

Once you have that set up, you should be able to get much more info about where in the archive/unpacking process things are going wrong. It still may not be enough, but it should at least be a start, and I don’t know what to try next until we get this much sorted.

chriskrycho avatar Jul 08 '24 22:07 chriskrycho

@chriskrycho here's the output with the debug version when installing yarn

volta --verbose --very-verbose install yarn
[very verbose] log level: VeryVerbose
[verbose] Loaded custom hooks file: /Users/benteichman/.volta/hooks.json
[verbose] Found [email protected] matching tag 'latest' from https://registry.npmjs.org/@yarnpkg/cli-dist
[verbose] Acquiring lock on Volta directory: /Users/benteichman/.volta/volta.lock
[verbose] Downloading [email protected] from https://registry.npmjs.org/@yarnpkg/cli-dist/-/cli-dist-4.3.1.tgz
Screenshot 2024-07-10 at 10 18 09

and here's the output from installing node

volta --verbose --very-verbose install [email protected]
[very verbose] log level: VeryVerbose
[verbose] Loaded custom hooks file: /Users/benteichman/.volta/hooks.json
[verbose] Acquiring lock on Volta directory: /Users/benteichman/.volta/volta.lock
[verbose] Downloading [email protected] from https://nodejs.org/dist/v20.14.0/node-v20.14.0-darwin-arm64.tar.gz
Screenshot 2024-07-10 at 10 27 34

EDIT: updated to include textual output from each command

effervescentia avatar Jul 10 '24 14:07 effervescentia

@effervescentia Thanks! Since it’s truncated, I assume that's where it hangs?

(Also, for reference, if you could copy and paste the actual text into a code snippet rather than screenshotting in cases like this—not just for Volta, but anywhere you’re leaving a bug report—that would be very preferable. As it happens, I am perfectly well-sighted, but if I happened to be temporarily or permanently blind, I would not be able to read those at all!)

chriskrycho avatar Jul 10 '24 17:07 chriskrycho

Also, for reference, if you could copy and paste the actual text into a code snippet rather than screenshotting in cases like this

absolutely, sorry bad habit 😅

effervescentia avatar Jul 10 '24 18:07 effervescentia

Since it’s truncated, I assume that's where it hangs?

Correct, it hangs there with no other output indefinitely

effervescentia avatar Jul 10 '24 18:07 effervescentia

Okay, I have done some more debugging locally and… I need to figure out what I need to change so that trace! prints from all crates, not just the core crate. The reason you’re not seeing further useful log info is because the way I set things up in #1784 only configured the root crate, not all the other crates within Volta’s source tree, so it’s only printing debug messages. I will update here once I have that sorted!

chriskrycho avatar Jul 12 '24 22:07 chriskrycho

All right, folks, I have pushed an update to #1784 which makes the tracing I added actually work. We had a nice little prefix check to filter even trace-level logs down to Volta-only tracing, and that was accidentally filtering out even the internal crates in question. Whoops! You should now be able to follow exactly the same steps as above, and see something more useful and informative for us!

chriskrycho avatar Jul 12 '24 23:07 chriskrycho

@chriskrycho here's the output from a node install

volta --verbose --very-verbose install [email protected]
[very verbose] log level: VeryVerbose
[verbose] Loaded custom hooks file: /Users/benteichman/.volta/hooks.json
[verbose] Acquiring lock on Volta directory: /Users/benteichman/.volta/volta.lock
[verbose] Downloading [email protected] from https://nodejs.org/dist/v20.13.0/node-v20.13.0-darwin-arm64.tar.gz
[very verbose] fetching tarball from https://nodejs.org/dist/v20.13.0/node-v20.13.0-darwin-arm64.tar.gz
[very verbose] detected compressed size of 41914632 for tarball

and from yarn

volta --verbose --very-verbose install yarn
[very verbose] log level: VeryVerbose
[verbose] Loaded custom hooks file: /Users/benteichman/.volta/hooks.json
[verbose] Found [email protected] matching tag 'latest' from https://registry.npmjs.org/@yarnpkg/cli-dist
[verbose] Acquiring lock on Volta directory: /Users/benteichman/.volta/volta.lock
[verbose] Downloading [email protected] from https://registry.npmjs.org/@yarnpkg/cli-dist/-/cli-dist-4.3.1.tgz
[very verbose] fetching tarball from https://registry.npmjs.org/@yarnpkg/cli-dist/-/cli-dist-4.3.1.tgz
[very verbose] detected compressed size of 988600 for tarball

effervescentia avatar Jul 17 '24 18:07 effervescentia

@effervescentia Thanks for sharing that!

@chriskrycho It looks like it's hanging on the fetch_uncompressed_size call, which makes a second HTTP request using the Range headers to fetch the specific block and the end of a gzip file that encode the uncompressed size of the file. We only use the uncompressed size of the file to make the progress bar for downloads more accurate (and we aren't able to get it ahead of time for Zip files on Windows at all), so that request definitely shouldn't be a long one. I wonder if it would make sense to set a fairly short timeout (on the order of 1 second perhaps?) for that 2nd request to make sure that one way or another it finishes quickly and doesn't block the rest of the process.

Alternatively, we could probably get away with skipping the 2nd request entirely and doing some estimation of the uncompressed size based on the compressed size.

A third option, which I haven't totally worked through but might be the easiest & cleanest solution: If we attach the ProgressRead to the underlying HTTP Reader, rather than the GzDecoder, then using the compressed size would be relatively accurate for the progress bar (modulo the actual decompression step for the last bits of data). This would take some testing to see how much that last step affects the accuracy, but hopefully it wouldn't be too much.

charlespierce avatar Jul 18 '24 06:07 charlespierce

@charlespierce @chriskrycho I can confirm from testing locally that setting a timeout on the request for the uncompressed size results in the download succeeding. Would you be willing to make this initial change in the short term at the cost of an inaccurate progress bar? This is an issue my team is still regularly running into and we would appreciate a fix that allows us to resume normal use of volta sooner if possible. Since the code is already built to handle the uncompressed size being None the risk seems pretty minimal

# crates/archive/src/tarball.rs @ line 137

.timeout(Duration::from_secs(1))

I've also tested one of the queries made with attohttpc as a curl request and it resolves without issue instantly, and I don't see any bugs related to this on that library's github page

curl https://nodejs.org/dist/v20.13.0/node-v20.13.0-darwin-arm64.tar.gz -H 'Range: bytes=41914628-41914631'

which I got by debugging the request made when running this volta command

volta --verbose --very-verbose install [email protected]

effervescentia avatar Jul 24 '24 05:07 effervescentia

@effervescentia let me see if I can get a more correct fix done in the next couple days, and if not then yeah, a workaround with a timeout seems totally reasonable. If nothing else, you should feel free to fork it and use the forked version in the meantime—not in a mean “if you don’t like it, fork it and maintain it yourself!” sense but in the sense that (a) our rate of change is pretty slow right now, and we haven’t changed that code in something like five years, so it should be relatively low risk and easy to keep updated; and (b) it should be fairly straightforward to flip back once we do get it fixed, hopefully soon.

chriskrycho avatar Jul 24 '24 22:07 chriskrycho

@charlespierce On the one hand, changing what we’re reading or just not even bothering with it (the same way we do for Windows) could “solve” this, but I’m worried that it could just leave us in a situation where we’re doing something and that it will actually just be that much harder to figure out if it happens even more rarely.

Things we absolutely do not understand at the moment:

  • What is actually causing the hang? It is obviously not happening to everyone, only to a subset of users, which implies (I think!) that it is not a logical bug.
  • Why is it happening to the users it is happening to (and, conversely, not to you or me or anyone else)?
  • Why is it using all available network bandwidth for at least @effervescentia’s folks—but not, or at least not obviously, @JamesPHoughton?

@effervescentia and others who are experiencing this, a few more follow-on questions:

  1. Are you behind a VPN when you see this, or is it happening on “regular” network traffic?

  2. You noted above that “most (but not all) of the other folks that have been commenting / interacting with this PR are all on M1 / M2 MacBook Pros”—what other hardware and operating systems are you seeing this on? (No one has yet reported anything but macOS 14.4 or 14.5.)

  3. If you capture both of the requests made by attohttpc::get—the initial one for the compressed size, and then the second one for the uncompressed size via range—and make them one after another in extremely close succession, e.g. with curl in a shell script, does that trigger this?

(The motivation for both (1) and (3) is to rule out some kind of upstream rate-limiting that is a result of hitting the same URLs in quick succession from a given IP address or something like that. The motivation for (2) is to work out whether it is possibly a bug in macOS 14.4+, since we had never had it reported prior to that.)

Finally, @effervescentia, you note that it’s taking all network bandwidth—do you have data to that effect e.g. via nettop, or just (!) the symptom of being unable to load other websites?


After staring at the implementation and thinking about the unanswered questions I posted at the top, I am actually inclined to ship a timeout but also warn! or possibly info! on the console in the case that it happens. I will open a PR doing just that tomorrow.

chriskrycho avatar Jul 24 '24 23:07 chriskrycho

One slight complication for testing the network setup is that we actually have 2 requests open at the same time: The "actual" download request is started, then we read the response headers to get the total length of the output. Then, with that request still open (we haven't started reading the raw data), we make another request to the same URL, but with the Range header added. We read that request in full, and only then do we actually start streaming the full data.

I'm also curious about the VPN setup, but I suspect there's something with those two requests for the same resource that is part of the root cause.

charlespierce avatar Jul 25 '24 03:07 charlespierce

Yeah, that is my guess as well, but since it is not causing a problem for most users I suspect that is only part of the story. One thing we could do is reproduce just exactly that behavior in an isolated binary, enable maximum tracing with an of the shelf logger, and see if folks who are seeing this can reproduce with that. (I will put up a version of this in a bit if I get a chance.)

chriskrycho avatar Jul 25 '24 12:07 chriskrycho

Here’s what I think is an actual minimal reproduction of all the pieces we do in Volta for this particular set of network calls: https://github.com/chriskrycho/volta-networking-repro

The instructions there should be enough for people to see if that is sufficient to reproduce this issue. 🤞🏼 At a later point, I will also put up a build that works around this as described.

One important note: this replicates what we have on main right now, not what is on 1.1, but I do not think the few small changes should be relevant here. (If they are, we can easily make a version that matches more exactly.)

chriskrycho avatar Jul 25 '24 14:07 chriskrycho