zebra
zebra copied to clipboard
Fix parameter download hangs with Rust 1.64
Scheduling
This is only a release blocker if it is a miscompile or denial of service. If we find out it is something else, we can fix it after the release.
Motivation
Rust 1.64 builds hang when downloading Zcash parameters, in both GitHub Actions and Docker:
- https://github.com/ZcashFoundation/zebra/actions/runs/3123995445/jobs/5067004540
- https://github.com/ZcashFoundation/zebra/actions/runs/3108491580/jobs/5037748856
Background
In PR #5090 we disabled beta Rust tests, because multiple bugs happened:
- disk filled up during builds with beta Rust
- the beta Rust (1.64) builds hung when downloading Zcash parameters
Rust was not fixed before the Rust stable release on 23 September 2022, so now we are using Rust 1.63 to download Zcash parameters in PR #5251.
Tasks
- [x] #5252
- [ ] Optional: Open a rust-lang/rust ticket to ask for a bug fix
- [ ] Copy the
zcash_proofsdownload code into Zebra, and use it instead - [ ] Make GitHub Actions builds use Rust stable in:
- https://github.com/ZcashFoundation/zebra/blob/main/.github/workflows/continous-integration-os.yml
- https://github.com/ZcashFoundation/zebra/blob/main/.github/workflows/continous-integration-os.patch.yml
- [ ] Make Docker builds use Rust stable in:
- https://github.com/ZcashFoundation/zebra/blob/main/docker/Dockerfile
- https://github.com/ZcashFoundation/zebra/blob/main/docker/zcash-params/Dockerfile
- [ ] Re-enable beta Rust on Ubuntu in:
- https://github.com/ZcashFoundation/zebra/blob/main/.github/workflows/continous-integration-os.yml
- [ ] Ask a GitHub admin to update the
mainbranch protection rules in https://github.com/ZcashFoundation/zebra/settings/branches
Open tickets for:
- [ ] Wait for https://github.com/zcash/librustzcash/pull/666 to merge, then wait for a new
librustzcashrelease, update the Zebra dependency, and delete the copiedzcash_proofsdownload code
It turns out this was a GitHub cache issue, not a Rust issue.
We need to do this because the tests still failed after a few days, so we disabled them.
When I run zebrad download locally, I get these logs:
$ mv ~/.zcash-params ~/.zcash-params-old
$ cargo run download
Finished dev [optimized + debuginfo] target(s) in 0.49s
Running `target/debug/zebrad download`
2022-09-26T02:21:28.423433Z INFO zebrad::components::tracing::component: started tracing component filter="info" TRACING_STATIC_MAX_LEVEL=LevelFilter::TRACE LOG_STATIC_MAX_LEVEL=Trace
2022-09-26T02:21:28.423473Z INFO {zebrad="2465f4b" net="Main"}: zebrad::application: initialized rayon thread pool for CPU-bound tasks num_threads=36
2022-09-26T02:21:28.423490Z INFO {zebrad="2465f4b" net="Main"}: zebrad::commands::download: checking if Zcash Sapling and Sprout parameters have been downloaded
2022-09-26T02:21:28.423536Z INFO {zebrad="2465f4b" net="Main"}: zebra_consensus::primitives::groth16::params: downloading Zcash Sapling parameters
2022-09-26T02:35:30.407524Z INFO {zebrad="2465f4b" net="Main"}: zebra_consensus::primitives::groth16::params: downloading Zcash Sprout parameters
2022-09-26T03:35:30.410102Z INFO {zebrad="2465f4b" net="Main"}: zebra_consensus::primitives::groth16::params: error downloading Zcash Sprout parameters, retrying error=IoError(Custom { kind: Other, error: "sprout-groth16.params failed reading:\nexpected: 725523612 bytes,\nactual: 211288196 bytes from \"https://download.z.cash/downloads/sprout-groth16.params.part.1 + https://download.z.cash/downloads/sprout-groth16.params.part.2\",\nerror: Custom { kind: Other, error: \"download response failed: IoError(Custom { kind: TimedOut, error: \\\"the timeout of the request was reached\\\" })\" }" })
2022-09-26T04:35:30.413594Z INFO {zebrad="2465f4b" net="Main"}: zebra_consensus::primitives::groth16::params: error downloading Zcash Sprout parameters, retrying error=IoError(Custom { kind: Other, error: "sprout-groth16.params failed reading:\nexpected: 725523612 bytes,\nactual: 219973083 bytes from \"https://download.z.cash/downloads/sprout-groth16.params.part.1 + https://download.z.cash/downloads/sprout-groth16.params.part.2\",\nerror: Custom { kind: Other, error: \"download response failed: IoError(Custom { kind: TimedOut, error: \\\"the timeout of the request was reached\\\" })\" }" })
The application panicked (crashed).
Message: error downloading Sprout parameter files after 3 retries. IoError(Custom { kind: Other, error: "sprout-groth16.params failed reading:\nexpected: 725523612 bytes,\nactual: 215938783 bytes from \"https://download.z.cash/downloads/sprout-groth16.params.part.1 + https://download.z.cash/downloads/sprout-groth16.params.part.2\",\nerror: Custom { kind: Other, error: \"download response failed: IoError(Custom { kind: TimedOut, error: \\\"the timeout of the request was reached\\\" })\" }" }) Hint: try deleting "/home/dev/.zcash-params", then running 'zebrad download' to re-download the parameters
Location: zebra-consensus/src/primitives/groth16/params.rs:188
Metadata:
version: 1.0.0-beta.15+14.g2465f4b
Zcash network: Mainnet
state version: 25
branch: main
git commit: 2465f4b
commit timestamp: 2022-09-22T20:22:22Z
target triple: x86_64-unknown-linux-gnu
build profile: debug
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SPANTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
0: zebrad::application:: with zebrad="2465f4b" net="Main"
at zebrad/src/application.rs:411
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
<empty backtrace>
When I've previously downloaded the parameters, everything works:
$ cargo run download
Finished dev [optimized + debuginfo] target(s) in 35.59s
Running `target/debug/zebrad download`
2022-09-26T06:32:12.055291Z INFO zebrad::components::tracing::component: started tracing component filter="info" TRACING_STATIC_MAX_LEVEL=LevelFilter::TRACE LOG_STATIC_MAX_LEVEL=Trace
2022-09-26T06:32:12.055332Z INFO {zebrad="eefeeee" net="Main"}: zebrad::application: initialized rayon thread pool for CPU-bound tasks num_threads=36
2022-09-26T06:32:12.055350Z INFO {zebrad="eefeeee" net="Main"}: zebrad::commands::download: checking if Zcash Sapling and Sprout parameters have been downloaded
2022-09-26T06:32:12.055373Z INFO {zebrad="eefeeee" net="Main"}: zebra_consensus::primitives::groth16::params: checking and loading Zcash Sapling and Sprout parameters
This is only a release blocker if it is a miscompile, denial of service, or another kind of security issue. If we find out it is something else, we can fix it after the release.
minreq HTTP library: https://github.com/neonmoe/minreq
Download code that could be very slow: https://github.com/zcash/librustzcash/pull/459/files#diff-7a58c94ff8576ab8b04da546cc73edc2ffa67cb13d1f050ed80bc363ca479193R49
Dev build optimisation for these crates: https://github.com/ZcashFoundation/zebra/blob/main/Cargo.toml#L48
I handed this ticket over to Marek, I'm happy to help out once I've finished the lightwalletd tests.
Here's what I diagnosed on my local machine:
- The downloading doesn't actually hang.
- When I'm on Rust 1.64:
- The download speed is between 10KiB/s and 100KiB/s.
- I was able to download Sapling params since they're under 50MB.
- I didn't bother to download Sprout params with this speed.
- When I'm on Rust 1.63:
- The download speed is around 2MiB/s.
- The process takes 100% of a single core when downloading the params in all cases.
htopshows that about half of the work is done by the kernel for Rust 1.63. This is not the case for Rust 1.64 since all work is green.
I think the downloading is still slow even on Rust 1.63 since I can download the params with ~50MiB/s in Firefox.
I think the most likely reason is https://github.com/zcash/librustzcash/pull/459/files#diff-7a58c94ff8576ab8b04da546cc73edc2ffa67cb13d1f050ed80bc363ca479193R49. I'm trying to find a better way around it.
I haven't found the reason why the speed differs between Rust 1.63 and 1.64.
I haven't found the reason why the speed differs between Rust 1.63 and 1.64.
Possibly Linux system call changes, changes to Vec optimisation, or an accidental performance regression:
https://github.com/rust-lang/rust/blob/stable/RELEASES.md#compatibility-notes
I think the most likely reason is https://github.com/zcash/librustzcash/pull/459/files#diff-7a58c94ff8576ab8b04da546cc73edc2ffa67cb13d1f050ed80bc363ca479193R49. I'm trying to find a better way around it.
We could try following the minreq example code exactly?
https://docs.rs/minreq/2.6.0/minreq/struct.ResponseLazy.html#example
Or we could try using a BufReader, or something like Iterator::array_chunks() - which we'd have to copy into the code, because it is only on nightly right now:
https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html#method.array_chunks
When I'm on Rust 1.64:
- The download speed is between 10KiB/s and 100KiB/s.
- I was able to download Sapling params since they're under 50MB.
- I didn't bother to download Sprout params with this speed.
When I'm on Rust 1.63:
- The download speed is around 2MiB/s.
@mpguerra this isn't a security issue, so I don't think it is a release blocker.
Our current solution (using Rust 1.63) has ~~exactly the same performance as zcashd, because they use the same code.~~
Edit: actually performance is much worse, because zcashd uses a download script.
I'm still not sure it's a release blocker though, because the download only happens once, and it's done by the time it's needed.
because they use the same code.
zcashd does not use the Rust code to download the parameters; we use a Bash script. The code might be used by the mobile SDKs, I'd have to check.
I opened https://github.com/zcash/librustzcash/pull/666.
I updated the list of tasks in the ticket to include a librustzcash update.
Since it seems like we have workarounds and it's not a release blocker, I'm going to unschedule this issue from this sprint now. We can re-schedule for a future sprint later.
We split out the hangs into another ticket, but the CI fixes still need to be done.