bdk ci: move to Nix (simpler, easy to maintain version)

Description

This is a simpler version of #1257. Although not fully reproducible (we are fetching Cargo dependencies instead of declaring all of them). It is WAY easier to maintain and provides most of the benefits and Dev-Experience from Nix.

You can switch flawlessly from Rust stable, MSRV, and nightly using either:

nix develop .: latest Rust stable
nix develop .#stable: latest Rust stable
nix develop .#msrv: Rust MRSV
nix develop .#nightly: Rust Nightly

It handles all the annoyances of Cargo.locks and cargo updates.

Also you don't need to worry about installing binaries from bitcoind or electrs crates, since we fetch these from nixpkgs (with pinned versions) and add them to your local environment. Additionally, all the extra dependencies that you need for WASM are handled as well. Finally, no need to set ENV vars: they are also set all by default to you.

If you are committing typos, or failing to adhere to our guidelines in CONTRIBUTING.md; we won't let you commit or push. This is handled by Nix's automated pre-commit hooks which gives helpful error messages.

Please check the Nix.md file to learn more about Nix and what we are aiming to do with the Nix integration into BDK.

I am taking the liberty of tagging certain people that are Nix enthusiast to help the discussion and review. Cc @notmandatory, @dpc, @casey-bowman, @plebhash, @matthiasdebernardini, @gudnuf, @oleonardolima.

Notes to the reviewers

Pre-commit checks for everything that we enforce:
- Checks for typos in the source code and documentation.
- Checks if all the commits are GPG-signed and follow the conventional commits style. (again, please check CONTRIBUTING.md)
- Runs cargo clippy in all workspace.
- Runs cargo fmt in all workspace.
- Runs nixpgs-fmt in all workspace (for .nix files)
Instructions and rationale in Nix.md

Closes #1162. Superseds #1122, #1156, #1165, #1257.

Pinneds Dependencies:

bitcoind: pinned to 0.27.0 in nixpkgs version 24.05.
electrs: pinned to Blockstream's esplora.

Changelog notice

We moved to Nix-based CI and Local development environment to better onboard new developers and enhance the productivity of developers.

Checklists

All Submissions:

[x] I've signed all my commits
[x] I followed the contribution guidelines
[x] I ran cargo fmt and cargo clippy before committing

New Features:

[x] I've added tests for the new feature
[x] I've added docs for the new feature

Bugfixes:

[ ] This pull request breaks the existing API
[x] I've added tests to reproduce the issue which are now passing
[x] I'm linking the issue being fixed by this PR

Feb 04 '24 14:02 storopoli

This can be fixed with -- --test-threads=2. Shall we add?

Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?

Feb 04 '24 19:02 dpc

So just the dev shells, no crane etc.? Seems reasonable. Dev shells are like 80% of benefits for very little Nix involvment.

Feb 04 '24 19:02 dpc

This can be fixed with -- --test-threads=2. Shall we add?

Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?

Good call I did this in a1997cf324aa5d5c70ce0eaa46a8c27f16d7c4a2

bitcoind_conf.args.push("-rpcworkqueue=1024");
bitcoind_conf.args.push("-rpcthreads=64");

Feb 04 '24 19:02 storopoli

This can be fixed with -- --test-threads=2. Shall we add?

Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?

Good call I did this in a1997cf
bitcoind_conf.args.push("-rpcworkqueue=1024");
bitcoind_conf.args.push("-rpcthreads=64");

64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).

Feb 04 '24 19:02 evanlinjin

This can be fixed with -- --test-threads=2. Shall we add?

Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?

Good call I did this in a1997cf
bitcoind_conf.args.push("-rpcworkqueue=1024");
bitcoind_conf.args.push("-rpcthreads=64");
64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).

4 is the default. So no change then? I can easily delete a1997cf324aa5d5c70ce0eaa46a8c27f16d7c4a2 (that's why I did it in a separate commit 😄)

Feb 04 '24 19:02 storopoli

64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).

Its a io bound workload. Absent async capabilities default 4 is laughable even for one core and I don't even know why bitcoind is doing it.

Feb 04 '24 23:02 dpc

64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).

Its a io bound workload. Absent async capabilities default 4 is laughable even for one core and I don't even know why bitcoind is doing it.

I didn't realize it was io-bounded. However, in our test environment, I doubt there will be 64 concurrent calls. How about we meet in the middle and say 16?

Feb 05 '24 03:02 evanlinjin

I was on my phone before, so let me elaborate.

I didn't realize it was io-bounded. However, in our test environment, I doubt there will be 64 concurrent calls. How about we meet in the middle and say 16?

From a perspective of most workloads e.g. asking for blockchain data, bitcoind (and electrs) acts very much like database software, e.g. posgresql. Postgresql is an interesting case because it utilizes process-based model - meaning it will spawn not just a thread, but a whole process for every client connection - waaay heavier than just a thread.

Even for that heavy model, the recommendations will be to limit it around around 500.

It takes many threads to saturate modern IO device. Just an example from a yt video I had on hand benchamarking filesystems. Each bar is number of workers. And these are dedicated IO-generating workers, generating heavy load in a tight loop. So in this benchmark, it will take like 4 workers. Possibly one worker is even more threads. But an rpc thread in bitcoind by necessity will be spending lots of time reading things and writing stuff to a tcp connection, doing some checks, re-encodings and what not. To actually get a maximum IO performance there will need to be a lot of them. I'd say 64 per core.

On my mostly idle laptop at this very moment there are 2k threads running:

On a modern Linux a thread is just a stack, which due to CoW optimizations will really amount to 1 or 2 pages (4K page size) of physically used memory plus a tiny (1K?) amount to track it all in the kernel.

There's really not much to save on thread counts in a grand scheme of things.

Personally, I like to run as lots and lots of tests in parallel, especially any sort of integration and e2e tests, which tend to have lots of dead time - waiting for stuff (like disk and network IO - latency-bound things). The usual limit to this is the available memory. Additional benefit is also that it tends to expose problems more often, as all operations tend to have more ... timing jitter.

So I'm really surprised that the default in bitcoind is 4. That's too little even for an old generation raspberry pi.

Feb 05 '24 04:02 dpc

Can you have it print at the beginning, "First time takes a good long while..." or something like that. Some parts seem so slow that you'd think the shell crashed or something like that. Would be good for the user to know this.

Also I am having 8 tests fail, is that just me?

running 8 tests
test mempool_re_emits_if_tx_introduction_height_not_reached ... FAILED
test mempool_avoids_re_emission ... FAILED
test no_agreement_point ... FAILED
test mempool_during_reorg ... FAILED
test tx_can_become_unconfirmed_after_reorg ... FAILED
test test_sync_local_chain ... FAILED
test ensure_block_emitted_after_reorg_is_at_reorg_height ... FAILED
test test_into_tx_graph ... FAILED

failures:

---- mempool_re_emits_if_tx_introduction_height_not_reached stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- mempool_avoids_re_emission stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- no_agreement_point stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- mempool_during_reorg stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- tx_can_become_unconfirmed_after_reorg stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- test_sync_local_chain stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- ensure_block_emitted_after_reorg_is_at_reorg_height stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)

---- test_into_tx_graph stdout ----
getting new addresses!
got new addresses!
mining block!
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)


failures:
    ensure_block_emitted_after_reorg_is_at_reorg_height
    mempool_avoids_re_emission
    mempool_during_reorg
    mempool_re_emits_if_tx_introduction_height_not_reached
    no_agreement_point
    test_into_tx_graph
    test_sync_local_chain
    tx_can_become_unconfirmed_after_reorg

test result: FAILED. 0 passed; 8 failed; 0 ignored; 0 measured; 0 filtered out; finished in 24.35s

error: test failed, to rerun pass `-p bdk_bitcoind_rpc --test test_emitter`
(nix:nix-shell-env) matthiasdebernardini@Matthiass-MacBook-Pro-2:~/Projects/bdk_nix$

Feb 06 '24 02:02 matthiasdebernardini

Also I am having 8 tests fail, is that just me?

Check the open questions: "I still get some random failures (at least locally) because of the tests DDoS'ing the poor electrs/bitcoind local binaries. This can be fixed with -- --test-threads=2. Shall we add?"

Can you have it print at the beginning, "First time takes a good long while..." or something like that. Some parts seem so slow that you'd think the shell crashed or something like that. Would be good for the user to know this.

Does cargo install big-huge-thing prints something like that? I don't think it's necessary

Feb 06 '24 08:02 storopoli

@storopoli does this PR substitute #1257? Shall we close that or do we need to leave it open?

Feb 13 '24 13:02 nondiremanuel

@storopoli does this PR substitute #1257? Shall we close that or do we need to leave it open?

We are close but I need some help with https://github.com/storopoli/bdk/pull/2

Feb 13 '24 14:02 storopoli

@storopoli does this PR substitute #1257? Shall we close that or do we need to leave it open?

This is ready to review

Feb 15 '24 08:02 storopoli

@storopoli does this PR substitute #1257? Shall we close that or do we need to leave it open?

Yes it does. Closing #1257

Edit:

This is a simpler version of #1257. Although not fully reproducible (we are fetching Cargo dependencies instead of declaring all of them).

Okay maybe not. It's just an alternative?

Feb 24 '24 09:02 evanlinjin

Hey, This is good, just set the msrv as default and lock your project, then all ci can pass. https://github.com/storopoli/bdk/pull/3

Also, there is more than one way to sign a commit, not only GPG can sign. I use the ssh sign,but the CI seems only work with GPG sign.

Feb 24 '24 09:02 yanganto

I have misunderstood that this pr will be droped, so I try to fix it without comment first.🙏 Looking forward to use nix here.😁

Feb 24 '24 12:02 yanganto

@yanganto amazing work. Thanks for the contribution. Ok this is ready to be merged.

Feb 24 '24 13:02 storopoli

@notmandatory How do you think about this PR? I love this one, and it will be good for me because I use NixOS, I always need a Nix shell when developing something. It is also good for any newcomer to the project, he does not need to worry about the environment after this PR.

Mar 12 '24 10:03 yanganto

I've squashed everything into a single commit so that we're ready to merge.

Mar 12 '24 10:03 storopoli

NOTE: I wasn't able to run it on the Nix installed natively on macOS (as expected I guess)

Not expected, I am able to run natively in MacOS. Can you post a stacktrace?

Mar 12 '24 11:03 storopoli

NOTE: I wasn't able to run it on the Nix installed natively on macOS (as expected I guess)

Not expected, I am able to run natively in MacOS. Can you post a stacktrace?

Sure, the specific error is related to CC linker 🤔

I'll send you the detailed stacktrace.

cc linker failure - stacktrace

error: linking with `cc` failed: exit status: 1
  |
[...]
  = note: ld: framework not found SystemConfiguration
          clang-16: error: linker command failed with exit code 1 (use -v to see invocation)

Mar 12 '24 12:03 oleonardolima

NOTE: I wasn't able to run it on the Nix installed natively on macOS (as expected I guess)

Not expected, I am able to run natively in MacOS. Can you post a stacktrace?

Sure, the specific error is related to CC linker 🤔

I'll send you the detailed stacktrace.
cc linker failure - stacktrace
error: linking with `cc` failed: exit status: 1
  |
[...]
  = note: ld: framework not found SystemConfiguration
          clang-16: error: linker command failed with exit code 1 (use -v to see invocation)

What package is this happened when nix try to build? I do not have a mac machine but I am happy to try fix It if there is much more context from your end.

Mar 13 '24 19:03 yanganto

[...]

What package is this happened when nix try to build? I do not have a mac machine but I am happy to try fix It if there is much more context from your end.

It happens when I try to run some command like: cargo test or cargo clippy, it's not really deterministic on which package it fails during the compilation phase.

Mar 15 '24 00:03 oleonardolima

Hi @oleonardolima Do you go into the nix shell?

You need do

nix develop (After this you will be in the shell. It will be a long run when the first time go in, and it will build up everything you need with a determined version. The second time you go into the shell, it will be quick)
other commands you want to do will in the nix shell, ex cargo fmt

Or your can do it one liner nix develop -c cargo fmt

The error from your end is about clang-16, and I see the dev shell setup clang with a determined version 14, and set up in the shell here. https://github.com/storopoli/bdk/blob/7c81fbd617ad9200f02ca3cda179405156568f04/flake.nix#L76

So if you are not in the nix shell, nix will not setup the environment for you.

Mar 15 '24 09:03 yanganto

Hi @oleonardolima Do you go into the nix shell?

You need do

nix develop (After this you will be in the shell. It will be a long run when the first time go in, and it will build up everything you need with a determined version. The second time you go into the shell, it will be quick)

other commands you want to do will in the nix shell, ex cargo fmt

Or your can do it one liner nix develop -c cargo fmt

The error from your end is about clang-16, and I see the dev shell setup clang with a determined version 14, and set up in the shell here. https://github.com/storopoli/bdk/blob/7c81fbd617ad9200f02ca3cda179405156568f04/flake.nix#L76

So if you are not in the nix shell, nix will not setup the environment for you.

Yes, I'm running it inside the nix shell.

It's weird because for the msrv shell it works, I only get this error on stable shell.

Mar 15 '24 10:03 oleonardolima

Yes, I'm running it inside the nix shell.

It's weird because for the msrv shell it works, I only get this error on the stable shell.

There are two things need to know. The clippy or fmt will be different based on the different versions of Rust.

The first thing is to use msrv, if we design to support it.

The older one will be good, and much more restrictive, so a code that passes the older one is much more passed to the new one, the trade-off is we can not use the new syntax.

When we say we support MSRV, we will not want to pop a lot of errors or warns when a user uses the MSRV version of the crate we provided and do clippy or fmt. If so, we should use msrv.

The other thing is the stable version is not a live thing in a determined build system

It only means the latest stable version when you lock it based on the time, namely based on the flake.lock. So it is still a fixed version of Rust. You can double check if the what is rust version you running locally if the version of rust is not the same as fixed with nix develop .#stable. If not, you are not in the nix shell for stable rust and just fall back to your local system. The thing we call it stable because we will update schedulely in CI in the future.

BTW, @notmandatory, in #1366, the .github/workflows/cont_integration.yml is changed the version to stable. it will run based on the time. There will be a different meaning of the stable version here, and it will not be reproducible in the future. And it will stand on the opposite side of this PR. If you want to do that, you can reject this PR, and I will not take time on the PR after you clearly reject this.

-          - version: 1.65.0 # STABLE
+         - version: stable

Mar 22 '24 09:03 yanganto

Sorry, @notmandatory I misunderstood that. #1366 is doing on the release branch not a main branch. You did not mean to reject this PR. :pray:

Mar 22 '24 09:03 yanganto

@matthiasdebernardini

Can you have it print at the beginning, "First time takes a good long while..." or something like that. Some parts seem so slow that you'd think the shell crashed or something like that. Would be good for the user to know this.

We can not do this print. It will run when the user uses nix develop or other automatic daemons when the user goes into the folder. We can only print after the environment is ready. We can provide a cache on the environment in the future, so the time will based on the downloading not on the compiling.

Mar 22 '24 09:03 yanganto

Hi @ValuedMammal

If you have time, please take a look at this. The current ci failure is a 429 error, which already opened in #1120, and nothing about this PR. I just wanted to let you know that everything is done.

current PR provided a reproducible environment (every dependency and version of the software in the system are locked) for developing, CI
this will be the first step to make things robust, I believe you heard something only works on someone's machine, build server, or CI server in a lot of project. Nix is a mature solution to solve this.

By the way, nix solution also provides a nix shell, that you can work in the same environment as CI server, build machine, or other developer's laptop. This will greatly reduce the cost of team communication.

People love Rust, because it is more robust, but Rust only takes care of the application layer, because it still relies on things in the system, i.g. openssl-sys. These kinds of issues only be solved by Nix.

You possibly wonder or are aware we do not know the nix shell, but actually, we also do not know the bash shell, cat, echo, sed, ..., etc., also Linux kernel. We just use the shell and building application on top of it, nix just exposes these and lets us know, and uses a hash lock on everything we are using and produces a reproducible environment. The thing we don't know in detail is still unknown, but it is locked and becomes reproducible.

We use nix this way. nix develop -c command ... We still can easily take down nix back to a normal shell in the local environment. command ...

I believe introducing nix in the project before it goes too big will be helpful. And I hope to get your agreement on this PR.

Mar 25 '24 12:03 yanganto

@yanganto any idea why the pre-commit-nix clippy hook is complaining now?

last 25 log lines:
       > clippy...................................................................Failed
       > - hook id: clippy
       > - exit code: 101
       >
       >     Updating crates.io index
       > warning: spurious network error (3 tries remaining): [35] SSL connect error (OpenSSL/3.0.13: error:16000069:STORE routines::unregistered scheme)
       > warning: spurious network error (2 tries remaining): [35] SSL connect error (OpenSSL/3.0.13: error:16000069:STORE routines::unregistered scheme)
       > warning: spurious network error (1 tries remaining): [35] SSL connect error (OpenSSL/3.0.13: error:16000069:STORE routines::unregistered scheme)
       > error: failed to get `bip39` as a dependency of package `bdk v1.0.0-alpha.7 (/private/tmp/nix-build-pre-commit-run.drv-0/src/crates/bdk)`
       >
       > Caused by:
       >   failed to query replaced source registry `crates-io`
       >
       > Caused by:
       >   download of config.json failed
       >
       > Caused by:
       >   failed to download from `https://index.crates.io/config.json`
       >
       > Caused by:
       >   [35] SSL connect error (OpenSSL/3.0.13: error:16000069:STORE routines::unregistered scheme)
       >
       > nixpkgs-fmt..............................................................Passed
       > rustfmt..................................................................Passed
       > typos....................................................................Passed
       For full logs, run 'nix log /nix/store/qk9jkq6w1g91wsrzg2dkvq32pn1j9x2b-pre-commit-run.drv'.

Offending config https://github.com/storopoli/bdk/blob/936dc781a3a8356be2b4361a385d08f56ff4b7bf/flake.nix#L139-L142:

clippy = {
  enable = true;
  entry = lib.mkForce "${rust}/bin/cargo-clippy clippy --all-targets --all-features -- -D warnings";
};

Mar 25 '24 13:03 storopoli