kernel: Streamline util library
Remove fees.h, errors.h, and spanparsing.h from the util library. Specifically:
- Move
Splitfunctions fromutil/spanparsing.htoutil/string.h, usingutilnamespace for clarity. - Move remaining spanparsing functions to
script/parsing.hsince they are used for descriptor and miniscript parsing. - Combine
util/fees.handutil/errors.hintocommon/messages.hso there is a place for simple functions that generate user messages to live, and these functions are not part of the util library.
Motivation for this change is that the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications. These changes should also improve code organization and make functions easier to discover. Some of these same moves are (or were) part of #28690, but did not help with code organization, or made it worse, so it is better to move them and clean them up in the same PR so code only has to change one time.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage
For detailed information about the code coverage, see the test coverage report.
Reviews
See the guideline for information on the review process.
| Type | Reviewers |
|---|---|
| ACK | TheCharlatan, hebasto, achow101 |
| Concept ACK | Sjors |
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #bitcoin-core/gui/599 (Translate unit names & fix external signer error by luke-jr)
- #30212 (rename TransactionErrors: MISSING_INPUTS and ALREADY_IN_CHAIN by willcl-ark)
- #30200 (Introduce Mining interface by Sjors)
- #30043 (net: Replace libnatpmp with built-in PCP+NATPMP implementation by laanwj)
- #29473 (refactor: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing by paplorinc)
- #29432 (Stratum v2 Template Provider (take 3) by Sjors)
- #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
- #29365 (Extend signetchallenge to set target block spacing by starius)
- #29346 (Stratum v2 Noise Protocol by Sjors)
- #29278 (Wallet: Add
maxfeeratewallet startup option by ismaelsadeeq) - #28710 (Remove the legacy wallet and BDB dependency by achow101)
- #22838 (descriptors: Be able to specify change and receiving in a single descriptor string by achow101)
- #21283 (Implement BIP 370 PSBTv2 by achow101)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Motivation for this change is that the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications.
Is the idea here that util should be for generic code, and that these bits and pieces (like fees and error) are specific bitcoin features that are only relevant to wallet/p2p, so aren't generic?
Updated 0e647454b49274ee82978c06b606f3fd9c02b779 -> 778c0214d25e5a012c61251d592257ae19805255 (pr/rmutil.1 -> pr/rmutil.2, compare) fixing CI errors from forgitting to add new files to git
re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1844180519
Motivation for this change is that the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications.
Is the idea here that
utilshould be for generic code, and that these bits and pieces (like fees and error) are specific bitcoin features that are only relevant to wallet/p2p, so aren't generic?
I do think that's generally a good thing to do, but it's not really the goal of this PR. The main thing this PR is trying to do is provide a better kernel API and reduce the size of the kernel library.
Because libbitcoin_kernel depends on libbitcoin_util, any external application using the kernel will also depend on util. So we should try to avoid including functions and types in util that are not likely to be useful to external applications, or are unstable and meant for internal use. The common library is usually a better place for these things.
Motivation for this change is that the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications.
Is the idea here that
utilshould be for generic code, and that these bits and pieces (like fees and error) are specific bitcoin features that are only relevant to wallet/p2p, so aren't generic?I do think that's generally a good thing to do, but it's not really the goal of this PR. The main thing this PR is trying to do is provide a better kernel API and reduce the size of the kernel library.
Does that mean that things should only go in util if they're going to be used by the kernel library then? That would presumably mean util/bitdeque.h and util/sock.cpp should also eventually go elsewhere, for example? That seems like a bad approach to me (in that it means code has to move into and out of util based on whether it's relevant to the kernel, rather than due to any changes to the code itself, and users then potentially have to deal with it moving into/outof the util:: namespace).
Does that mean that things should only go in util if they're going to be used by the kernel library then?
No, I don't think so, and I don't think much else is going to move after this PR. I think only code that shouldn't be used by the kernel or kernel applications should be moved out of util, not just code that isn't used by the kernel. I agree just moving any code that isn't used by the kernel would be a bad approach, and wrote basically the same comment as you recently in https://github.com/bitcoin/bitcoin/pull/27989#issuecomment-1613966139, so I think we are in agreement.
Updated 778c0214d25e5a012c61251d592257ae19805255 -> 8b21f41335dc837b36ea863156605d092f47a0ab (pr/rmutil.2 -> pr/rmutil.3, compare) to fix clang-tidy error https://cirrus-ci.com/task/4602567305461760. Also switched to using declarations more places to avoid creating conflicts and make the PR easier to review
Updated 8b21f41335dc837b36ea863156605d092f47a0ab -> 8748d63a07d23237dca41f8a5f91288ce59b1de8 (pr/rmutil.3 -> pr/rmutil.4, compare) to fix another clang-tidy error https://cirrus-ci.com/task/4950555420786688 and making other small cleanups
Updated 8748d63a07d23237dca41f8a5f91288ce59b1de8 -> b23409474f6fe874a25374bb978db9e5682954e6 (pr/rmutil.4 -> pr/rmutil.5, compare) to try to fix another clang-tidy error
@ryanofsky
https://github.com/bitcoin/bitcoin/pull/29015#issue-2029493410:
the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications.
https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1846332843:
only code that shouldn't be used by the kernel or kernel applications should be moved out of util, not just code that isn't used by the kernel.
https://github.com/bitcoin/bitcoin/pull/27989#issuecomment-1613966139:
I think util/ is a good home for general purpose code providing basic data structures and cross platform capabilities, and I think common/ is good home for project-specific code that needs to be shared between bitcoind, bitcoin-cli, and bitcoin-wallet.
Does it make sense to improve https://github.com/bitcoin/bitcoin/blob/master/doc/design/libraries.md by expressing the idea from the quotes above more clearly? As for now, it reads:
In general, if it is ever unclear whether it is better to add code to util or common, it is probably better to add it to common unless it is very generically useful or useful particularly to include in the kernel.
Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1848935768
Does it make sense to improve https://github.com/bitcoin/bitcoin/blob/master/doc/design/libraries.md by expressing the idea from the quotes above more clearly? As for now, it reads:
This seems like the right place to talk about the relationship between the util and kernel library. My work on the kernel library has been focusing on reducing its essence to validation code, pruning external dependencies, and stripping out as much system related functionality as possible. I think if the goal is to have the kernel be capable of running on as many platforms as possible, potentially beyond the number of platforms that Bitcoin Core is capable of today and extending into embedded environments, this last point becomes crucial.
Looking at the content of util it provides roughly two types of functionality: Utilities for basic data structures and their serialization, and cross-platform system related utilities, like file handling, threading, process handling, etc.
In my opinion, making the util library a proper dependency of the kernel should not introduce new cross-platform system-related content into the kernel library. Besides potentially limiting use cases of the kernel, providing cross-platform utilities beyond what the kernel strictly requires seems like a potentially unreasonable future maintenance burden. This would be contrary to our incremental process of creating a minimal kernel library.
The asterisk here is that some of the utilities that are not currently used by the kernel do not introduce new dependencies, since they entirely rely on code that is already used by the kernel. I also generally think that some utilities could remain in the util library even though they are not currently used by validation code. The criteria I propose for evaluating if modules should remain in util after this PR are:
- The module is directly used by kernel code.
- The module contains basic data structure utilities for interacting with kernel code.
- The module does not contain system-related utilities that are not directly used by the kernel.
- The module does not contain project-specific code that is not relevant to the kernel.
- The module does not introduce new dependencies beyond the current dependencies of the kernel.
Looking at @hebasto's comment here, threadinterrupt.cpp and readwritefile.cpp would violate the 3rd criteria in my eyes, while the spanparsing and bytevectorhash modules could potentially remain in util.
Between this PR and #28690 I think it would be nice if common criteria for util would be defined and documented. All this said, I think it would be acceptable too if things were kept purposefully vague as they are now until the need for more concrete structures arises.
Would it make sense to just merge common into util; but have a "stripped" version of util available for the kernel, that excludes stuff that doesn't match the 5 points above? That way the difference is just at the build system level, it doesn't involve moving files around or moving code between namespaces...
Then the advice would be: (a) just put things in util, (b) but only add things to the "util-core" section of the build file when they're needed by the kernel; (c) keep things in the util:: namespace; (d) split stuff into different .h/.cpp files pretty aggressively?
That would imply a different approach for the util: move fees.h and error.h to common/messages.h commit here.
re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1849208455
Would it make sense to just merge common into util; but have a "stripped" version of util available for the kernel
Idea to build util and util-core (or util-lite?) libraries is interesting, because then it would be easy to add/remove things from util library depending on whether the kernel needs them, and it would never require renaming anything. I could see this being useful in the future if util library started growing a lot and accumulating more features of over time.
But for now, IMO, just sticking with the current util/common separation and using the cleanup in this PR would organize things better and be less confusing than having two util versions, and it would require moving fewer things. Even though the libraries.md document is currently ambiguous, I think it should be pretty easy to remove the ambiguity here.
re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1849026409
The criteria I propose for evaluating if modules should remain in util after this PR are:
I think these are good rules if you replace "basic data structure utilities" with "basic utilities that to do not require outside dependencies", and also drop the 3rd rule. It seems like with the 3rd rule you are trying to make a distinction between "data structure utilities" and "system-related utilities", and I don't think it helps to draw a dividing line there.
Added 1 commits b23409474f6fe874a25374bb978db9e5682954e6 -> 6d8cb4d4cd6062ad06cd351439e9139e028995d3 (pr/rmutil.5 -> pr/rmutil.6, compare) updating libraries.md to describe differences between util and common libraries better.
re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1849026409
My work on the kernel library has been focusing on reducing its essence to validation code, pruning external dependencies, and stripping out as much system related functionality as possible. [...] In my opinion, making the
utillibrary a proper dependency of the kernel should not introduce new cross-platform system-related content into the kernel library.
Just to address these points directly, I think the goals of making the kernel portable and working on any platform, and making the kernel minimal working and in constrained environments are great goals. I just don't you need to rm src/util/sock.cpp to achieve these goals. As long as the util library is distributed as a static library, or built as a static library and linked into a dynamic library, the socket functions will not be part of resulting executables if they are not called. And if the kernel is being built for a platform which doesn't support sockets at all, src/util/sock.cpp can just be stubbed or not compiled at all and there should be no problem for portability either. I do think it's good to keep the util library small and lightweight and remove things which don't fit there. But I don't think it needs to be stripped down in a more extreme way.
Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1850626464
And if the kernel is being built for a platform which doesn't support sockets at all, src/util/sock.cpp can just be stubbed or not compiled at all and there should be no problem for portability either.
One of the guiding principles for the kernel project so far has been to prohibit future re-couplings of decoupled modules. The tracking issue says "Any further coupling of our consensus engine with non-consensus modules will result in linker errors, preventing this project from becoming a Sisyphean task of battling coupling regressions.". How could this fit in with leaving unused content in the library?
One of the guiding principles for the kernel project so far has been to prohibit future re-couplings of decoupled modules.
I reread the issue you linked to, but I don't understand what it implies here. I do think you can rely on linker errors to ensure kernel code isn't calling wallet code or GUI code or P2P code or storing state in global variables. But what else do you want beyond that? And how does removing sock.cpp from the util library help, for example?
Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1850970368
But what else do you want beyond that? And how does removing sock.cpp from the util library help, for example?
Maybe removing sock.cpp is a bad example, since it is already impossible to link against it. sock.cpp uses modules that are already not part of the kernel, so trying to link it in will result in errors. Then again, if sock.cpp uses common modules, why is it in util in the first place? To provide another concrete example, there is thread.cpp, which I was hoping to remove from the kernel in #28960. If it is removed, how could the kernel library be guarded from its future re-introduction if it remains in the util library?
re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1851506064
Maybe removing
sock.cppis a bad example, since it is already impossible to link against it.sock.cppuses modules that are already not part of the kernel, so trying to link it in will result in errors. Then again, ifsock.cppuses common modules, why is it inutilin the first place?
I'm actually not sure what is this referring to. Running nm ./src/util/libbitcoin_util_a-sock.o --undefined-only I see sock.cpp is using LogInstance, NodeClock, and SysErrorString which should all be part of the util library.
To provide another concrete example, there is
thread.cpp, which I was hoping to remove from the kernel in #28960. If it is removed, how could the kernel library be guarded from its future re-introduction if it remains in theutillibrary?
thread.h contains TraceThread which seems like a nice utility for the naming threads (which is helpful for monitoring in top) and printing exceptions (which is useful for debugging). I guess kernel code isn't currently using this function (or #28960 removes the last usage), but as long as kernel code is calling threads, what benefit is to preventing kernel code from calling this function? It seems like a useful function, and if anything more code should use it, I would think.
I do agree with goals of making the kernel small and portable, and agree we should organize libraries with that in mind. I also think we should try to trigger link errors to enforce API boundaries. It just seems to me we can do these things without moving a lot around.
Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1852688371
I'm actually not sure what is this referring to. Running nm ./src/util/libbitcoin_util_a-sock.o --undefined-only I see sock.cpp is using LogInstance, NodeClock, and SysErrorString which should all be part of the util library.
Thanks for actually verifying, I just assumed that the common/system.hinclude was actually used :(.
Concept ACK. Moving things out of util that the kernel will / should never need, makes sense to me. I don't have strong feelings on where exactly to move these things to.
Concept ACK
Rebased 486379c10fdce568a64b78cbb2dad1fa52263a79 -> ffcd3068359ea35f3b7c491df866a0d6924e3b37 (pr/rmutil.10 -> pr/rmutil.11, compare) due to conflict with #28960
I've tested 5eeafeb302dcf56dd5c16c1e4785a41a132344c7 on Ubuntu 23.10.
While all changes look correct and reasonable to me, I was expecting that this PR removes some of dependencies mentioned in https://github.com/bitcoin/bitcoin/issues/28548.
But that is not the case. The libbitcoin_util still depends on libbitcoin_crypto, libbitcoin_consensus and libbitcoin_common libraries.
If such an outcome is expected, maybe update the doc/design/libraries.md accordingly?
Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-2083201174
But that is not the case. The libbitcoin_util still depends on libbitcoin_crypto, libbitcoin_consensus and libbitcoin_common libraries. If such an outcome is expected, maybe update the doc/design/libraries.md accordingly?
I understand this PR more as providing clarity of what should be in the util library and #28690 as actually realizing the diagram.
re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-2083201174
The
libbitcoin_utilstill depends onlibbitcoin_crypto,libbitcoin_consensusandlibbitcoin_commonlibraries.
Thanks for checking this. Letting the util library depend on the crypto library seems like the easiest path to take because random.cpp, randomenv.cpp, hasher.cpp, and bytevectorhash.cpp are using the crypto hash functions. But I added new commits to the PR to remove the util dependency on crypt, and remove consensus and common dependencies on util. I also added a script which turned up a few other other unexpected dependencies (not related to util), which could be addressed separately. I think the script should be able to run in CI, too, but I did not set that up.
Rebased 5eeafeb302dcf56dd5c16c1e4785a41a132344c7 -> 6ea5ee5a268cfbb9cd234e0f1381c8a4785ff04c (pr/rmutil.13 -> pr/rmutil.14, compare) adding dependency checking script and removing unexpected dependencies related to util
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
Debug: https://github.com/bitcoin/bitcoin/runs/24838365077
I also added a script which turned up a few other other unexpected dependencies (not related to util), which could be addressed separately.
It would be great to have such a tool at our disposal.
There is an error in commit fd342cb40b63ca00d23946743a038847673f8726, which can be fixed with:
diff --git a/src/warnings.cpp b/src/warnings.cpp
index 5b6ace63dd..40043cd8e7 100644
--- a/src/warnings.cpp
+++ b/src/warnings.cpp
@@ -11,0 +12 @@
+#include <util/string.h>
The check script is very useful, could you add it as the first commit? I think it could make review a bit easier.
There is another CI failure in https://cirrus-ci.com/task/6531594385620992:
crypto/.libs/libbitcoin_crypto_base.a(crypto_libbitcoin_crypto_base_la-cleanse.o): in function `memory_cleanse(void*, unsigned long)':
cleanse.cpp:(.text+0x0): multiple definition of `memory_cleanse(void*, unsigned long)'; /usr/bin/ld: DWARF error: invalid or unhandled FORM value: 0x22
support/.libs/libbitcoinkernel_la-cleanse.o:cleanse.cpp:(.text+0x0): first defined here
clang: error: linker command failed with exit code 1 (use -v to see invocation)
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
Debug: https://github.com/bitcoin/bitcoin/runs/25060059418
There seems to be a conflict with bitcoin/bitcoin#30098. Maybe rebase?
There seems to be a conflict with #30098. Maybe rebase?
Thanks! Rebased now.
If other reviewers or maintainers are not happy with introducing a bash script, I could volunteer some time to translate it.
Yes this could be a good idea. The script started out just being a few lines of bash listing symbols exported by one library and imported by another one. But then it grew as it was extended to support multiple libraries and suppress known errors. So now it's no longer a small script, though it is pretty clean and well commented. Other possible followups besides translating the script could be:
- Running the script as part of CI
- Cleaning up unexpected dependencies listed as suppressions in the script.
Rebased 7689dfd6646054a0be8e62ccbf72d5403e28b548 -> 1c26d10b23bb7a7d0204b4a8b44a50fad89f2a2a (pr/rmutil.18 -> pr/rmutil.19, compare) due to silent conflict with #30098