ci: move to Nix (simpler, easy to maintain version)
Description
This is a simpler version of #1257. Although not fully reproducible (we are fetching Cargo dependencies instead of declaring all of them). It is WAY easier to maintain and provides most of the benefits and Dev-Experience from Nix.
You can switch flawlessly from Rust stable, MSRV, and nightly using either:
nix develop .: latest Rust stablenix develop .#stable: latest Rust stablenix develop .#msrv: Rust MRSVnix develop .#nightly: Rust Nightly
It handles all the annoyances of Cargo.locks and cargo updates.
Also you don't need to worry about installing binaries from bitcoind or electrs crates,
since we fetch these from nixpkgs (with pinned versions) and add them to your local environment.
Additionally, all the extra dependencies that you need for WASM are handled as well.
Finally, no need to set ENV vars: they are also set all by default to you.
If you are committing typos, or failing to adhere to our guidelines in CONTRIBUTING.md;
we won't let you commit or push.
This is handled by Nix's automated pre-commit hooks which gives helpful error messages.
Please check the Nix.md file to learn more about Nix and what we are aiming to do with the Nix integration into BDK.
I am taking the liberty of tagging certain people that are Nix enthusiast to help the discussion and review. Cc @notmandatory, @dpc, @casey-bowman, @plebhash, @matthiasdebernardini, @gudnuf, @oleonardolima.
Notes to the reviewers
- Pre-commit checks for everything that we enforce:
- Checks for typos in the source code and documentation.
- Checks if all the commits are GPG-signed and follow the conventional commits style. (again, please check CONTRIBUTING.md)
- Runs
cargo clippyin all workspace. - Runs
cargo fmtin all workspace. - Runs
nixpgs-fmtin all workspace (for.nixfiles)
- Instructions and rationale in
Nix.md
Closes #1162. Superseds #1122, #1156, #1165, #1257.
Pinneds Dependencies:
bitcoind: pinned to 0.27.0 innixpkgsversion24.05.electrs: pinned toBlockstream's esplora.
Changelog notice
- We moved to Nix-based CI and Local development environment to better onboard new developers and enhance the productivity of developers.
Checklists
All Submissions:
- [x] I've signed all my commits
- [x] I followed the contribution guidelines
- [x] I ran
cargo fmtandcargo clippybefore committing
New Features:
- [x] I've added tests for the new feature
- [x] I've added docs for the new feature
Bugfixes:
- [ ] This pull request breaks the existing API
- [x] I've added tests to reproduce the issue which are now passing
- [x] I'm linking the issue being fixed by this PR
This can be fixed with -- --test-threads=2. Shall we add?
Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?
So just the dev shells, no crane etc.? Seems reasonable. Dev shells are like 80% of benefits for very little Nix involvment.
This can be fixed with -- --test-threads=2. Shall we add?
Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?
Good call I did this in a1997cf324aa5d5c70ce0eaa46a8c27f16d7c4a2
bitcoind_conf.args.push("-rpcworkqueue=1024");
bitcoind_conf.args.push("-rpcthreads=64");
This can be fixed with -- --test-threads=2. Shall we add?
Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?
Good call I did this in a1997cf
bitcoind_conf.args.push("-rpcworkqueue=1024"); bitcoind_conf.args.push("-rpcthreads=64");
64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).
This can be fixed with -- --test-threads=2. Shall we add?
Why not increase bitcoind's rpc threads and work queue from the potato-like defaults instead?
Good call I did this in a1997cf
bitcoind_conf.args.push("-rpcworkqueue=1024"); bitcoind_conf.args.push("-rpcthreads=64");64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).
4 is the default. So no change then? I can easily delete a1997cf324aa5d5c70ce0eaa46a8c27f16d7c4a2 (that's why I did it in a separate commit 😄)
64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).
Its a io bound workload. Absent async capabilities default 4 is laughable even for one core and I don't even know why bitcoind is doing it.
64 threads is way too much. I think the ideal is to have 1 thread per CPU core. I think a good compromise is 4 (between desktop and CI).
Its a io bound workload. Absent async capabilities default 4 is laughable even for one core and I don't even know why bitcoind is doing it.
I didn't realize it was io-bounded. However, in our test environment, I doubt there will be 64 concurrent calls. How about we meet in the middle and say 16?
I was on my phone before, so let me elaborate.
I didn't realize it was io-bounded. However, in our test environment, I doubt there will be 64 concurrent calls. How about we meet in the middle and say 16?
From a perspective of most workloads e.g. asking for blockchain data, bitcoind (and electrs) acts very much like database software, e.g. posgresql. Postgresql is an interesting case because it utilizes process-based model - meaning it will spawn not just a thread, but a whole process for every client connection - waaay heavier than just a thread.
Even for that heavy model, the recommendations will be to limit it around around 500.
It takes many threads to saturate modern IO device. Just an example from a yt video I had on hand benchamarking filesystems. Each bar is number of workers. And these are dedicated IO-generating workers, generating heavy load in a tight loop. So in this benchmark, it will take like 4 workers. Possibly one worker is even more threads. But an rpc thread in bitcoind by necessity will be spending lots of time reading things and writing stuff to a tcp connection, doing some checks, re-encodings and what not. To actually get a maximum IO performance there will need to be a lot of them. I'd say 64 per core.
On my mostly idle laptop at this very moment there are 2k threads running:
On a modern Linux a thread is just a stack, which due to CoW optimizations will really amount to 1 or 2 pages (4K page size) of physically used memory plus a tiny (1K?) amount to track it all in the kernel.
There's really not much to save on thread counts in a grand scheme of things.
Personally, I like to run as lots and lots of tests in parallel, especially any sort of integration and e2e tests, which tend to have lots of dead time - waiting for stuff (like disk and network IO - latency-bound things). The usual limit to this is the available memory. Additional benefit is also that it tends to expose problems more often, as all operations tend to have more ... timing jitter.
So I'm really surprised that the default in bitcoind is 4. That's too little even for an old generation raspberry pi.
Can you have it print at the beginning, "First time takes a good long while..." or something like that. Some parts seem so slow that you'd think the shell crashed or something like that. Would be good for the user to know this.
Also I am having 8 tests fail, is that just me?
running 8 tests
test mempool_re_emits_if_tx_introduction_height_not_reached ... FAILED
test mempool_avoids_re_emission ... FAILED
test no_agreement_point ... FAILED
test mempool_during_reorg ... FAILED
test tx_can_become_unconfirmed_after_reorg ... FAILED
test test_sync_local_chain ... FAILED
test ensure_block_emitted_after_reorg_is_at_reorg_height ... FAILED
test test_into_tx_graph ... FAILED
failures:
---- mempool_re_emits_if_tx_introduction_height_not_reached stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
---- mempool_avoids_re_emission stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
---- no_agreement_point stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
---- mempool_during_reorg stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
---- tx_can_become_unconfirmed_after_reorg stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
---- test_sync_local_chain stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
---- ensure_block_emitted_after_reorg_is_at_reorg_height stdout ----
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
---- test_into_tx_graph stdout ----
getting new addresses!
got new addresses!
mining block!
Error: JSON-RPC error: transport error: Couldn't connect to host: Resource temporarily unavailable (os error 35)
failures:
ensure_block_emitted_after_reorg_is_at_reorg_height
mempool_avoids_re_emission
mempool_during_reorg
mempool_re_emits_if_tx_introduction_height_not_reached
no_agreement_point
test_into_tx_graph
test_sync_local_chain
tx_can_become_unconfirmed_after_reorg
test result: FAILED. 0 passed; 8 failed; 0 ignored; 0 measured; 0 filtered out; finished in 24.35s
error: test failed, to rerun pass `-p bdk_bitcoind_rpc --test test_emitter`
(nix:nix-shell-env) matthiasdebernardini@Matthiass-MacBook-Pro-2:~/Projects/bdk_nix$
Also I am having 8 tests fail, is that just me?
Check the open questions: "I still get some random failures (at least locally) because of the tests DDoS'ing the poor electrs/bitcoind local binaries. This can be fixed with -- --test-threads=2. Shall we add?"
Can you have it print at the beginning, "First time takes a good long while..." or something like that. Some parts seem so slow that you'd think the shell crashed or something like that. Would be good for the user to know this.
Does cargo install big-huge-thing prints something like that? I don't think it's necessary
@storopoli does this PR substitute #1257? Shall we close that or do we need to leave it open?
@storopoli does this PR substitute #1257? Shall we close that or do we need to leave it open?
We are close but I need some help with https://github.com/storopoli/bdk/pull/2
@storopoli does this PR substitute #1257? Shall we close that or do we need to leave it open?
This is ready to review
@storopoli does this PR substitute #1257? Shall we close that or do we need to leave it open?
Yes it does. Closing #1257
Edit:
This is a simpler version of #1257. Although not fully reproducible (we are fetching Cargo dependencies instead of declaring all of them).
Okay maybe not. It's just an alternative?
Hey, This is good, just set the msrv as default and lock your project, then all ci can pass. https://github.com/storopoli/bdk/pull/3
Also, there is more than one way to sign a commit, not only GPG can sign. I use the ssh sign,but the CI seems only work with GPG sign.
I have misunderstood that this pr will be droped, so I try to fix it without comment first.🙏 Looking forward to use nix here.😁
@yanganto amazing work. Thanks for the contribution. Ok this is ready to be merged.
@notmandatory How do you think about this PR? I love this one, and it will be good for me because I use NixOS, I always need a Nix shell when developing something. It is also good for any newcomer to the project, he does not need to worry about the environment after this PR.
I've squashed everything into a single commit so that we're ready to merge.
NOTE: I wasn't able to run it on the Nix installed natively on macOS (as expected I guess)
Not expected, I am able to run natively in MacOS. Can you post a stacktrace?
NOTE: I wasn't able to run it on the Nix installed natively on macOS (as expected I guess)
Not expected, I am able to run natively in MacOS. Can you post a stacktrace?
Sure, the specific error is related to CC linker 🤔
I'll send you the detailed stacktrace.
cc linker failure - stacktrace
error: linking with `cc` failed: exit status: 1
|
[...]
= note: ld: framework not found SystemConfiguration
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
NOTE: I wasn't able to run it on the Nix installed natively on macOS (as expected I guess)
Not expected, I am able to run natively in MacOS. Can you post a stacktrace?
Sure, the specific error is related to CC linker 🤔
I'll send you the detailed stacktrace.
cc linker failure - stacktrace
error: linking with `cc` failed: exit status: 1 | [...] = note: ld: framework not found SystemConfiguration clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
What package is this happened when nix try to build? I do not have a mac machine but I am happy to try fix It if there is much more context from your end.
[...]
What package is this happened when nix try to build? I do not have a mac machine but I am happy to try fix It if there is much more context from your end.
It happens when I try to run some command like: cargo test or cargo clippy, it's not really deterministic on which package it fails during the compilation phase.
Hi @oleonardolima Do you go into the nix shell?
You need do
nix develop(After this you will be in the shell. It will be a long run when the first time go in, and it will build up everything you need with a determined version. The second time you go into the shell, it will be quick)- other commands you want to do will in the nix shell, ex
cargo fmt
Or your can do it one liner
nix develop -c cargo fmt
The error from your end is about clang-16, and I see the dev shell setup clang with a determined version 14, and set up in the shell here.
https://github.com/storopoli/bdk/blob/7c81fbd617ad9200f02ca3cda179405156568f04/flake.nix#L76
So if you are not in the nix shell, nix will not setup the environment for you.
Hi @oleonardolima Do you go into the nix shell?
You need do
nix develop(After this you will be in the shell. It will be a long run when the first time go in, and it will build up everything you need with a determined version. The second time you go into the shell, it will be quick)- other commands you want to do will in the nix shell, ex
cargo fmtOr your can do it one liner
nix develop -c cargo fmtThe error from your end is about
clang-16, and I see the dev shell setup clang with a determined version 14, and set up in the shell here. https://github.com/storopoli/bdk/blob/7c81fbd617ad9200f02ca3cda179405156568f04/flake.nix#L76So if you are not in the nix shell, nix will not setup the environment for you.
Yes, I'm running it inside the nix shell.
It's weird because for the msrv shell it works, I only get this error on stable shell.
Yes, I'm running it inside the nix shell.
It's weird because for the msrv shell it works, I only get this error on the stable shell.
There are two things need to know. The clippy or fmt will be different based on the different versions of Rust.
The first thing is to use msrv, if we design to support it.
The older one will be good, and much more restrictive, so a code that passes the older one is much more passed to the new one, the trade-off is we can not use the new syntax.
When we say we support MSRV, we will not want to pop a lot of errors or warns when a user uses the MSRV version of the crate we provided and do clippy or fmt. If so, we should use msrv.
The other thing is the stable version is not a live thing in a determined build system
It only means the latest stable version when you lock it based on the time, namely based on the flake.lock. So it is still a fixed version of Rust. You can double check if the what is rust version you running locally if the version of rust is not the same as fixed with nix develop .#stable. If not, you are not in the nix shell for stable rust and just fall back to your local system.
The thing we call it stable because we will update schedulely in CI in the future.
BTW, @notmandatory, in #1366, the .github/workflows/cont_integration.yml is changed the version to stable. it will run based on the time. There will be a different meaning of the stable version here, and it will not be reproducible in the future. And it will stand on the opposite side of this PR. If you want to do that, you can reject this PR, and I will not take time on the PR after you clearly reject this.
- - version: 1.65.0 # STABLE
+ - version: stable
Sorry, @notmandatory I misunderstood that. #1366 is doing on the release branch not a main branch. You did not mean to reject this PR. :pray:
@matthiasdebernardini
Can you have it print at the beginning, "First time takes a good long while..." or something like that. Some parts seem so slow that you'd think the shell crashed or something like that. Would be good for the user to know this.
We can not do this print. It will run when the user uses nix develop or other automatic daemons when the user goes into the folder. We can only print after the environment is ready. We can provide a cache on the environment in the future, so the time will based on the downloading not on the compiling.
Hi @ValuedMammal
If you have time, please take a look at this. The current ci failure is a 429 error, which already opened in #1120, and nothing about this PR. I just wanted to let you know that everything is done.
- current PR provided a reproducible environment (every dependency and version of the software in the system are locked) for developing, CI
- this will be the first step to make things robust, I believe you heard something only works on someone's machine, build server, or CI server in a lot of project. Nix is a mature solution to solve this.
By the way, nix solution also provides a nix shell, that you can work in the same environment as CI server, build machine, or other developer's laptop. This will greatly reduce the cost of team communication.
People love Rust, because it is more robust, but Rust only takes care of the application layer, because it still relies on things in the system, i.g. openssl-sys. These kinds of issues only be solved by Nix.
You possibly wonder or are aware we do not know the nix shell, but actually, we also do not know the bash shell, cat, echo, sed, ..., etc., also Linux kernel. We just use the shell and building application on top of it, nix just exposes these and lets us know, and uses a hash lock on everything we are using and produces a reproducible environment. The thing we don't know in detail is still unknown, but it is locked and becomes reproducible.
We use nix this way.
nix develop -c command ...
We still can easily take down nix back to a normal shell in the local environment.
command ...
I believe introducing nix in the project before it goes too big will be helpful. And I hope to get your agreement on this PR.
@yanganto any idea why the pre-commit-nix clippy hook is complaining now?
last 25 log lines:
> clippy...................................................................Failed
> - hook id: clippy
> - exit code: 101
>
> Updating crates.io index
> warning: spurious network error (3 tries remaining): [35] SSL connect error (OpenSSL/3.0.13: error:16000069:STORE routines::unregistered scheme)
> warning: spurious network error (2 tries remaining): [35] SSL connect error (OpenSSL/3.0.13: error:16000069:STORE routines::unregistered scheme)
> warning: spurious network error (1 tries remaining): [35] SSL connect error (OpenSSL/3.0.13: error:16000069:STORE routines::unregistered scheme)
> error: failed to get `bip39` as a dependency of package `bdk v1.0.0-alpha.7 (/private/tmp/nix-build-pre-commit-run.drv-0/src/crates/bdk)`
>
> Caused by:
> failed to query replaced source registry `crates-io`
>
> Caused by:
> download of config.json failed
>
> Caused by:
> failed to download from `https://index.crates.io/config.json`
>
> Caused by:
> [35] SSL connect error (OpenSSL/3.0.13: error:16000069:STORE routines::unregistered scheme)
>
> nixpkgs-fmt..............................................................Passed
> rustfmt..................................................................Passed
> typos....................................................................Passed
For full logs, run 'nix log /nix/store/qk9jkq6w1g91wsrzg2dkvq32pn1j9x2b-pre-commit-run.drv'.
Offending config https://github.com/storopoli/bdk/blob/936dc781a3a8356be2b4361a385d08f56ff4b7bf/flake.nix#L139-L142:
clippy = {
enable = true;
entry = lib.mkForce "${rust}/bin/cargo-clippy clippy --all-targets --all-features -- -D warnings";
};