zebra icon indicating copy to clipboard operation
zebra copied to clipboard

Fix parameter download hangs with Rust 1.64

Open teor2345 opened this issue 3 years ago • 15 comments

Scheduling

This is only a release blocker if it is a miscompile or denial of service. If we find out it is something else, we can fix it after the release.

Motivation

Rust 1.64 builds hang when downloading Zcash parameters, in both GitHub Actions and Docker:

  • https://github.com/ZcashFoundation/zebra/actions/runs/3123995445/jobs/5067004540
  • https://github.com/ZcashFoundation/zebra/actions/runs/3108491580/jobs/5037748856

Background

In PR #5090 we disabled beta Rust tests, because multiple bugs happened:

  • disk filled up during builds with beta Rust
  • the beta Rust (1.64) builds hung when downloading Zcash parameters

Rust was not fixed before the Rust stable release on 23 September 2022, so now we are using Rust 1.63 to download Zcash parameters in PR #5251.

Tasks

  • [x] #5252
  • [ ] Optional: Open a rust-lang/rust ticket to ask for a bug fix
  • [ ] Copy the zcash_proofs download code into Zebra, and use it instead
  • [ ] Make GitHub Actions builds use Rust stable in:
    • https://github.com/ZcashFoundation/zebra/blob/main/.github/workflows/continous-integration-os.yml
    • https://github.com/ZcashFoundation/zebra/blob/main/.github/workflows/continous-integration-os.patch.yml
  • [ ] Make Docker builds use Rust stable in:
    • https://github.com/ZcashFoundation/zebra/blob/main/docker/Dockerfile
    • https://github.com/ZcashFoundation/zebra/blob/main/docker/zcash-params/Dockerfile
  • [ ] Re-enable beta Rust on Ubuntu in:
    • https://github.com/ZcashFoundation/zebra/blob/main/.github/workflows/continous-integration-os.yml
  • [ ] Ask a GitHub admin to update the main branch protection rules in https://github.com/ZcashFoundation/zebra/settings/branches

Open tickets for:

  • [ ] Wait for https://github.com/zcash/librustzcash/pull/666 to merge, then wait for a new librustzcash release, update the Zebra dependency, and delete the copied zcash_proofs download code

teor2345 avatar Sep 06 '22 06:09 teor2345

It turns out this was a GitHub cache issue, not a Rust issue.

teor2345 avatar Sep 11 '22 22:09 teor2345

We need to do this because the tests still failed after a few days, so we disabled them.

teor2345 avatar Sep 12 '22 22:09 teor2345

When I run zebrad download locally, I get these logs:

$ mv ~/.zcash-params ~/.zcash-params-old                                                                                                                                    
$ cargo run download
    Finished dev [optimized + debuginfo] target(s) in 0.49s
     Running `target/debug/zebrad download`
2022-09-26T02:21:28.423433Z  INFO zebrad::components::tracing::component: started tracing component filter="info" TRACING_STATIC_MAX_LEVEL=LevelFilter::TRACE LOG_STATIC_MAX_LEVEL=Trace
2022-09-26T02:21:28.423473Z  INFO {zebrad="2465f4b" net="Main"}: zebrad::application: initialized rayon thread pool for CPU-bound tasks num_threads=36
2022-09-26T02:21:28.423490Z  INFO {zebrad="2465f4b" net="Main"}: zebrad::commands::download: checking if Zcash Sapling and Sprout parameters have been downloaded
2022-09-26T02:21:28.423536Z  INFO {zebrad="2465f4b" net="Main"}: zebra_consensus::primitives::groth16::params: downloading Zcash Sapling parameters
2022-09-26T02:35:30.407524Z  INFO {zebrad="2465f4b" net="Main"}: zebra_consensus::primitives::groth16::params: downloading Zcash Sprout parameters
2022-09-26T03:35:30.410102Z  INFO {zebrad="2465f4b" net="Main"}: zebra_consensus::primitives::groth16::params: error downloading Zcash Sprout parameters, retrying error=IoError(Custom { kind: Other, error: "sprout-groth16.params failed reading:\nexpected: 725523612 bytes,\nactual:   211288196 bytes from \"https://download.z.cash/downloads/sprout-groth16.params.part.1 + https://download.z.cash/downloads/sprout-groth16.params.part.2\",\nerror: Custom { kind: Other, error: \"download response failed: IoError(Custom { kind: TimedOut, error: \\\"the timeout of the request was reached\\\" })\" }" })
2022-09-26T04:35:30.413594Z  INFO {zebrad="2465f4b" net="Main"}: zebra_consensus::primitives::groth16::params: error downloading Zcash Sprout parameters, retrying error=IoError(Custom { kind: Other, error: "sprout-groth16.params failed reading:\nexpected: 725523612 bytes,\nactual:   219973083 bytes from \"https://download.z.cash/downloads/sprout-groth16.params.part.1 + https://download.z.cash/downloads/sprout-groth16.params.part.2\",\nerror: Custom { kind: Other, error: \"download response failed: IoError(Custom { kind: TimedOut, error: \\\"the timeout of the request was reached\\\" })\" }" })
The application panicked (crashed).
Message:  error downloading Sprout parameter files after 3 retries. IoError(Custom { kind: Other, error: "sprout-groth16.params failed reading:\nexpected: 725523612 bytes,\nactual:   215938783 bytes from \"https://download.z.cash/downloads/sprout-groth16.params.part.1 + https://download.z.cash/downloads/sprout-groth16.params.part.2\",\nerror: Custom { kind: Other, error: \"download response failed: IoError(Custom { kind: TimedOut, error: \\\"the timeout of the request was reached\\\" })\" }" }) Hint: try deleting "/home/dev/.zcash-params", then running 'zebrad download' to re-download the parameters
Location: zebra-consensus/src/primitives/groth16/params.rs:188

Metadata:
version: 1.0.0-beta.15+14.g2465f4b
Zcash network: Mainnet
state version: 25
branch: main
git commit: 2465f4b
commit timestamp: 2022-09-22T20:22:22Z
target triple: x86_64-unknown-linux-gnu
build profile: debug

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SPANTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

   0: zebrad::application:: with zebrad="2465f4b" net="Main"
      at zebrad/src/application.rs:411

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  <empty backtrace>

teor2345 avatar Sep 26 '22 06:09 teor2345

When I've previously downloaded the parameters, everything works:

$ cargo run download
    Finished dev [optimized + debuginfo] target(s) in 35.59s
     Running `target/debug/zebrad download`
2022-09-26T06:32:12.055291Z  INFO zebrad::components::tracing::component: started tracing component filter="info" TRACING_STATIC_MAX_LEVEL=LevelFilter::TRACE LOG_STATIC_MAX_LEVEL=Trace
2022-09-26T06:32:12.055332Z  INFO {zebrad="eefeeee" net="Main"}: zebrad::application: initialized rayon thread pool for CPU-bound tasks num_threads=36
2022-09-26T06:32:12.055350Z  INFO {zebrad="eefeeee" net="Main"}: zebrad::commands::download: checking if Zcash Sapling and Sprout parameters have been downloaded
2022-09-26T06:32:12.055373Z  INFO {zebrad="eefeeee" net="Main"}: zebra_consensus::primitives::groth16::params: checking and loading Zcash Sapling and Sprout parameters

teor2345 avatar Sep 26 '22 06:09 teor2345

This is only a release blocker if it is a miscompile, denial of service, or another kind of security issue. If we find out it is something else, we can fix it after the release.

teor2345 avatar Sep 28 '22 23:09 teor2345

minreq HTTP library: https://github.com/neonmoe/minreq

Download code that could be very slow: https://github.com/zcash/librustzcash/pull/459/files#diff-7a58c94ff8576ab8b04da546cc73edc2ffa67cb13d1f050ed80bc363ca479193R49

Dev build optimisation for these crates: https://github.com/ZcashFoundation/zebra/blob/main/Cargo.toml#L48

teor2345 avatar Sep 29 '22 22:09 teor2345

I handed this ticket over to Marek, I'm happy to help out once I've finished the lightwalletd tests.

teor2345 avatar Oct 02 '22 19:10 teor2345

Here's what I diagnosed on my local machine:

  • The downloading doesn't actually hang.
  • When I'm on Rust 1.64:
    • The download speed is between 10KiB/s and 100KiB/s.
    • I was able to download Sapling params since they're under 50MB.
    • I didn't bother to download Sprout params with this speed.
  • When I'm on Rust 1.63:
    • The download speed is around 2MiB/s.
  • The process takes 100% of a single core when downloading the params in all cases. htop shows that about half of the work is done by the kernel for Rust 1.63. This is not the case for Rust 1.64 since all work is green.

I think the downloading is still slow even on Rust 1.63 since I can download the params with ~50MiB/s in Firefox.

I think the most likely reason is https://github.com/zcash/librustzcash/pull/459/files#diff-7a58c94ff8576ab8b04da546cc73edc2ffa67cb13d1f050ed80bc363ca479193R49. I'm trying to find a better way around it.

I haven't found the reason why the speed differs between Rust 1.63 and 1.64.

upbqdn avatar Oct 03 '22 17:10 upbqdn

I haven't found the reason why the speed differs between Rust 1.63 and 1.64.

Possibly Linux system call changes, changes to Vec optimisation, or an accidental performance regression: https://github.com/rust-lang/rust/blob/stable/RELEASES.md#compatibility-notes

teor2345 avatar Oct 03 '22 22:10 teor2345

I think the most likely reason is https://github.com/zcash/librustzcash/pull/459/files#diff-7a58c94ff8576ab8b04da546cc73edc2ffa67cb13d1f050ed80bc363ca479193R49. I'm trying to find a better way around it.

We could try following the minreq example code exactly? https://docs.rs/minreq/2.6.0/minreq/struct.ResponseLazy.html#example

Or we could try using a BufReader, or something like Iterator::array_chunks() - which we'd have to copy into the code, because it is only on nightly right now: https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html#method.array_chunks

teor2345 avatar Oct 03 '22 22:10 teor2345

  • When I'm on Rust 1.64:

    • The download speed is between 10KiB/s and 100KiB/s.
    • I was able to download Sapling params since they're under 50MB.
    • I didn't bother to download Sprout params with this speed.
  • When I'm on Rust 1.63:

    • The download speed is around 2MiB/s.

@mpguerra this isn't a security issue, so I don't think it is a release blocker.

Our current solution (using Rust 1.63) has ~~exactly the same performance as zcashd, because they use the same code.~~

Edit: actually performance is much worse, because zcashd uses a download script.

I'm still not sure it's a release blocker though, because the download only happens once, and it's done by the time it's needed.

teor2345 avatar Oct 03 '22 22:10 teor2345

because they use the same code.

zcashd does not use the Rust code to download the parameters; we use a Bash script. The code might be used by the mobile SDKs, I'd have to check.

str4d avatar Oct 03 '22 22:10 str4d

I opened https://github.com/zcash/librustzcash/pull/666.

upbqdn avatar Oct 05 '22 20:10 upbqdn

I updated the list of tasks in the ticket to include a librustzcash update.

teor2345 avatar Oct 05 '22 22:10 teor2345

Since it seems like we have workarounds and it's not a release blocker, I'm going to unschedule this issue from this sprint now. We can re-schedule for a future sprint later.

mpguerra avatar Oct 06 '22 09:10 mpguerra

We split out the hangs into another ticket, but the CI fixes still need to be done.

teor2345 avatar Oct 26 '22 22:10 teor2345