bitcoin icon indicating copy to clipboard operation
bitcoin copied to clipboard

kernel: Streamline util library

Open ryanofsky opened this issue 2 years ago • 32 comments

Remove fees.h, errors.h, and spanparsing.h from the util library. Specifically:

  • Move Split functions from util/spanparsing.h to util/string.h, using util namespace for clarity.
  • Move remaining spanparsing functions to script/parsing.h since they are used for descriptor and miniscript parsing.
  • Combine util/fees.h and util/errors.h into common/messages.h so there is a place for simple functions that generate user messages to live, and these functions are not part of the util library.

Motivation for this change is that the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications. These changes should also improve code organization and make functions easier to discover. Some of these same moves are (or were) part of #28690, but did not help with code organization, or made it worse, so it is better to move them and clean them up in the same PR so code only has to change one time.

ryanofsky avatar Dec 06 '23 22:12 ryanofsky

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK TheCharlatan, hebasto, achow101
Concept ACK Sjors

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #bitcoin-core/gui/599 (Translate unit names & fix external signer error by luke-jr)
  • #30212 (rename TransactionErrors: MISSING_INPUTS and ALREADY_IN_CHAIN by willcl-ark)
  • #30200 (Introduce Mining interface by Sjors)
  • #30043 (net: Replace libnatpmp with built-in PCP+NATPMP implementation by laanwj)
  • #29473 (refactor: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing by paplorinc)
  • #29432 (Stratum v2 Template Provider (take 3) by Sjors)
  • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
  • #29365 (Extend signetchallenge to set target block spacing by starius)
  • #29346 (Stratum v2 Noise Protocol by Sjors)
  • #29278 (Wallet: Add maxfeerate wallet startup option by ismaelsadeeq)
  • #28710 (Remove the legacy wallet and BDB dependency by achow101)
  • #22838 (descriptors: Be able to specify change and receiving in a single descriptor string by achow101)
  • #21283 (Implement BIP 370 PSBTv2 by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot avatar Dec 06 '23 22:12 DrahtBot

Motivation for this change is that the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications.

Is the idea here that util should be for generic code, and that these bits and pieces (like fees and error) are specific bitcoin features that are only relevant to wallet/p2p, so aren't generic?

ajtowns avatar Dec 07 '23 03:12 ajtowns

Updated 0e647454b49274ee82978c06b606f3fd9c02b779 -> 778c0214d25e5a012c61251d592257ae19805255 (pr/rmutil.1 -> pr/rmutil.2, compare) fixing CI errors from forgitting to add new files to git

re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1844180519

Motivation for this change is that the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications.

Is the idea here that util should be for generic code, and that these bits and pieces (like fees and error) are specific bitcoin features that are only relevant to wallet/p2p, so aren't generic?

I do think that's generally a good thing to do, but it's not really the goal of this PR. The main thing this PR is trying to do is provide a better kernel API and reduce the size of the kernel library.

Because libbitcoin_kernel depends on libbitcoin_util, any external application using the kernel will also depend on util. So we should try to avoid including functions and types in util that are not likely to be useful to external applications, or are unstable and meant for internal use. The common library is usually a better place for these things.

ryanofsky avatar Dec 07 '23 23:12 ryanofsky

Motivation for this change is that the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications.

Is the idea here that util should be for generic code, and that these bits and pieces (like fees and error) are specific bitcoin features that are only relevant to wallet/p2p, so aren't generic?

I do think that's generally a good thing to do, but it's not really the goal of this PR. The main thing this PR is trying to do is provide a better kernel API and reduce the size of the kernel library.

Does that mean that things should only go in util if they're going to be used by the kernel library then? That would presumably mean util/bitdeque.h and util/sock.cpp should also eventually go elsewhere, for example? That seems like a bad approach to me (in that it means code has to move into and out of util based on whether it's relevant to the kernel, rather than due to any changes to the code itself, and users then potentially have to deal with it moving into/outof the util:: namespace).

ajtowns avatar Dec 07 '23 23:12 ajtowns

Does that mean that things should only go in util if they're going to be used by the kernel library then?

No, I don't think so, and I don't think much else is going to move after this PR. I think only code that shouldn't be used by the kernel or kernel applications should be moved out of util, not just code that isn't used by the kernel. I agree just moving any code that isn't used by the kernel would be a bad approach, and wrote basically the same comment as you recently in https://github.com/bitcoin/bitcoin/pull/27989#issuecomment-1613966139, so I think we are in agreement.

ryanofsky avatar Dec 08 '23 00:12 ryanofsky

Updated 778c0214d25e5a012c61251d592257ae19805255 -> 8b21f41335dc837b36ea863156605d092f47a0ab (pr/rmutil.2 -> pr/rmutil.3, compare) to fix clang-tidy error https://cirrus-ci.com/task/4602567305461760. Also switched to using declarations more places to avoid creating conflicts and make the PR easier to review Updated 8b21f41335dc837b36ea863156605d092f47a0ab -> 8748d63a07d23237dca41f8a5f91288ce59b1de8 (pr/rmutil.3 -> pr/rmutil.4, compare) to fix another clang-tidy error https://cirrus-ci.com/task/4950555420786688 and making other small cleanups Updated 8748d63a07d23237dca41f8a5f91288ce59b1de8 -> b23409474f6fe874a25374bb978db9e5682954e6 (pr/rmutil.4 -> pr/rmutil.5, compare) to try to fix another clang-tidy error

ryanofsky avatar Dec 08 '23 18:12 ryanofsky

@ryanofsky

https://github.com/bitcoin/bitcoin/pull/29015#issue-2029493410:

the util library is a dependency of the kernel, and we should remove functionality from util that shouldn't be called by kernel code or kernel applications.

https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1846332843:

only code that shouldn't be used by the kernel or kernel applications should be moved out of util, not just code that isn't used by the kernel.

https://github.com/bitcoin/bitcoin/pull/27989#issuecomment-1613966139:

I think util/ is a good home for general purpose code providing basic data structures and cross platform capabilities, and I think common/ is good home for project-specific code that needs to be shared between bitcoind, bitcoin-cli, and bitcoin-wallet.

Does it make sense to improve https://github.com/bitcoin/bitcoin/blob/master/doc/design/libraries.md by expressing the idea from the quotes above more clearly? As for now, it reads:

In general, if it is ever unclear whether it is better to add code to util or common, it is probably better to add it to common unless it is very generically useful or useful particularly to include in the kernel.

hebasto avatar Dec 10 '23 11:12 hebasto

Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1848935768

Does it make sense to improve https://github.com/bitcoin/bitcoin/blob/master/doc/design/libraries.md by expressing the idea from the quotes above more clearly? As for now, it reads:

This seems like the right place to talk about the relationship between the util and kernel library. My work on the kernel library has been focusing on reducing its essence to validation code, pruning external dependencies, and stripping out as much system related functionality as possible. I think if the goal is to have the kernel be capable of running on as many platforms as possible, potentially beyond the number of platforms that Bitcoin Core is capable of today and extending into embedded environments, this last point becomes crucial.

Looking at the content of util it provides roughly two types of functionality: Utilities for basic data structures and their serialization, and cross-platform system related utilities, like file handling, threading, process handling, etc.

In my opinion, making the util library a proper dependency of the kernel should not introduce new cross-platform system-related content into the kernel library. Besides potentially limiting use cases of the kernel, providing cross-platform utilities beyond what the kernel strictly requires seems like a potentially unreasonable future maintenance burden. This would be contrary to our incremental process of creating a minimal kernel library.

The asterisk here is that some of the utilities that are not currently used by the kernel do not introduce new dependencies, since they entirely rely on code that is already used by the kernel. I also generally think that some utilities could remain in the util library even though they are not currently used by validation code. The criteria I propose for evaluating if modules should remain in util after this PR are:

  1. The module is directly used by kernel code.
  2. The module contains basic data structure utilities for interacting with kernel code.
  3. The module does not contain system-related utilities that are not directly used by the kernel.
  4. The module does not contain project-specific code that is not relevant to the kernel.
  5. The module does not introduce new dependencies beyond the current dependencies of the kernel.

Looking at @hebasto's comment here, threadinterrupt.cpp and readwritefile.cpp would violate the 3rd criteria in my eyes, while the spanparsing and bytevectorhash modules could potentially remain in util.

Between this PR and #28690 I think it would be nice if common criteria for util would be defined and documented. All this said, I think it would be acceptable too if things were kept purposefully vague as they are now until the need for more concrete structures arises.

sedited avatar Dec 10 '23 17:12 sedited

Would it make sense to just merge common into util; but have a "stripped" version of util available for the kernel, that excludes stuff that doesn't match the 5 points above? That way the difference is just at the build system level, it doesn't involve moving files around or moving code between namespaces...

Then the advice would be: (a) just put things in util, (b) but only add things to the "util-core" section of the build file when they're needed by the kernel; (c) keep things in the util:: namespace; (d) split stuff into different .h/.cpp files pretty aggressively?

That would imply a different approach for the util: move fees.h and error.h to common/messages.h commit here.

ajtowns avatar Dec 11 '23 02:12 ajtowns

re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1849208455

Would it make sense to just merge common into util; but have a "stripped" version of util available for the kernel

Idea to build util and util-core (or util-lite?) libraries is interesting, because then it would be easy to add/remove things from util library depending on whether the kernel needs them, and it would never require renaming anything. I could see this being useful in the future if util library started growing a lot and accumulating more features of over time.

But for now, IMO, just sticking with the current util/common separation and using the cleanup in this PR would organize things better and be less confusing than having two util versions, and it would require moving fewer things. Even though the libraries.md document is currently ambiguous, I think it should be pretty easy to remove the ambiguity here.

re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1849026409

The criteria I propose for evaluating if modules should remain in util after this PR are:

I think these are good rules if you replace "basic data structure utilities" with "basic utilities that to do not require outside dependencies", and also drop the 3rd rule. It seems like with the 3rd rule you are trying to make a distinction between "data structure utilities" and "system-related utilities", and I don't think it helps to draw a dividing line there.

ryanofsky avatar Dec 11 '23 16:12 ryanofsky

Added 1 commits b23409474f6fe874a25374bb978db9e5682954e6 -> 6d8cb4d4cd6062ad06cd351439e9139e028995d3 (pr/rmutil.5 -> pr/rmutil.6, compare) updating libraries.md to describe differences between util and common libraries better.

re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1849026409

My work on the kernel library has been focusing on reducing its essence to validation code, pruning external dependencies, and stripping out as much system related functionality as possible. [...] In my opinion, making the util library a proper dependency of the kernel should not introduce new cross-platform system-related content into the kernel library.

Just to address these points directly, I think the goals of making the kernel portable and working on any platform, and making the kernel minimal working and in constrained environments are great goals. I just don't you need to rm src/util/sock.cpp to achieve these goals. As long as the util library is distributed as a static library, or built as a static library and linked into a dynamic library, the socket functions will not be part of resulting executables if they are not called. And if the kernel is being built for a platform which doesn't support sockets at all, src/util/sock.cpp can just be stubbed or not compiled at all and there should be no problem for portability either. I do think it's good to keep the util library small and lightweight and remove things which don't fit there. But I don't think it needs to be stripped down in a more extreme way.

ryanofsky avatar Dec 11 '23 18:12 ryanofsky

Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1850626464

And if the kernel is being built for a platform which doesn't support sockets at all, src/util/sock.cpp can just be stubbed or not compiled at all and there should be no problem for portability either.

One of the guiding principles for the kernel project so far has been to prohibit future re-couplings of decoupled modules. The tracking issue says "Any further coupling of our consensus engine with non-consensus modules will result in linker errors, preventing this project from becoming a Sisyphean task of battling coupling regressions.". How could this fit in with leaving unused content in the library?

sedited avatar Dec 11 '23 20:12 sedited

One of the guiding principles for the kernel project so far has been to prohibit future re-couplings of decoupled modules.

I reread the issue you linked to, but I don't understand what it implies here. I do think you can rely on linker errors to ensure kernel code isn't calling wallet code or GUI code or P2P code or storing state in global variables. But what else do you want beyond that? And how does removing sock.cpp from the util library help, for example?

ryanofsky avatar Dec 11 '23 22:12 ryanofsky

Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1850970368

But what else do you want beyond that? And how does removing sock.cpp from the util library help, for example?

Maybe removing sock.cpp is a bad example, since it is already impossible to link against it. sock.cpp uses modules that are already not part of the kernel, so trying to link it in will result in errors. Then again, if sock.cpp uses common modules, why is it in util in the first place? To provide another concrete example, there is thread.cpp, which I was hoping to remove from the kernel in #28960. If it is removed, how could the kernel library be guarded from its future re-introduction if it remains in the util library?

sedited avatar Dec 12 '23 08:12 sedited

re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1851506064

Maybe removing sock.cpp is a bad example, since it is already impossible to link against it. sock.cpp uses modules that are already not part of the kernel, so trying to link it in will result in errors. Then again, if sock.cpp uses common modules, why is it in util in the first place?

I'm actually not sure what is this referring to. Running nm ./src/util/libbitcoin_util_a-sock.o --undefined-only I see sock.cpp is using LogInstance, NodeClock, and SysErrorString which should all be part of the util library.

To provide another concrete example, there is thread.cpp, which I was hoping to remove from the kernel in #28960. If it is removed, how could the kernel library be guarded from its future re-introduction if it remains in the util library?

thread.h contains TraceThread which seems like a nice utility for the naming threads (which is helpful for monitoring in top) and printing exceptions (which is useful for debugging). I guess kernel code isn't currently using this function (or #28960 removes the last usage), but as long as kernel code is calling threads, what benefit is to preventing kernel code from calling this function? It seems like a useful function, and if anything more code should use it, I would think.

I do agree with goals of making the kernel small and portable, and agree we should organize libraries with that in mind. I also think we should try to trigger link errors to enforce API boundaries. It just seems to me we can do these things without moving a lot around.

ryanofsky avatar Dec 12 '23 19:12 ryanofsky

Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-1852688371

I'm actually not sure what is this referring to. Running nm ./src/util/libbitcoin_util_a-sock.o --undefined-only I see sock.cpp is using LogInstance, NodeClock, and SysErrorString which should all be part of the util library.

Thanks for actually verifying, I just assumed that the common/system.hinclude was actually used :(.

sedited avatar Dec 12 '23 19:12 sedited

Concept ACK. Moving things out of util that the kernel will / should never need, makes sense to me. I don't have strong feelings on where exactly to move these things to.

Sjors avatar Feb 05 '24 13:02 Sjors

Concept ACK

sedited avatar Feb 11 '24 10:02 sedited

Rebased 486379c10fdce568a64b78cbb2dad1fa52263a79 -> ffcd3068359ea35f3b7c491df866a0d6924e3b37 (pr/rmutil.10 -> pr/rmutil.11, compare) due to conflict with #28960

ryanofsky avatar Mar 11 '24 17:03 ryanofsky

I've tested 5eeafeb302dcf56dd5c16c1e4785a41a132344c7 on Ubuntu 23.10.

While all changes look correct and reasonable to me, I was expecting that this PR removes some of dependencies mentioned in https://github.com/bitcoin/bitcoin/issues/28548.

But that is not the case. The libbitcoin_util still depends on libbitcoin_crypto, libbitcoin_consensus and libbitcoin_common libraries.

If such an outcome is expected, maybe update the doc/design/libraries.md accordingly?

hebasto avatar Apr 29 '24 16:04 hebasto

Re https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-2083201174

But that is not the case. The libbitcoin_util still depends on libbitcoin_crypto, libbitcoin_consensus and libbitcoin_common libraries. If such an outcome is expected, maybe update the doc/design/libraries.md accordingly?

I understand this PR more as providing clarity of what should be in the util library and #28690 as actually realizing the diagram.

sedited avatar May 06 '24 09:05 sedited

re: https://github.com/bitcoin/bitcoin/pull/29015#issuecomment-2083201174

The libbitcoin_util still depends on libbitcoin_crypto, libbitcoin_consensus and libbitcoin_common libraries.

Thanks for checking this. Letting the util library depend on the crypto library seems like the easiest path to take because random.cpp, randomenv.cpp, hasher.cpp, and bytevectorhash.cpp are using the crypto hash functions. But I added new commits to the PR to remove the util dependency on crypt, and remove consensus and common dependencies on util. I also added a script which turned up a few other other unexpected dependencies (not related to util), which could be addressed separately. I think the script should be able to run in CI, too, but I did not set that up.


Rebased 5eeafeb302dcf56dd5c16c1e4785a41a132344c7 -> 6ea5ee5a268cfbb9cd234e0f1381c8a4785ff04c (pr/rmutil.13 -> pr/rmutil.14, compare) adding dependency checking script and removing unexpected dependencies related to util

ryanofsky avatar May 10 '24 21:05 ryanofsky

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/24838365077

DrahtBot avatar May 10 '24 21:05 DrahtBot

I also added a script which turned up a few other other unexpected dependencies (not related to util), which could be addressed separately.

It would be great to have such a tool at our disposal.

hebasto avatar May 11 '24 17:05 hebasto

There is an error in commit fd342cb40b63ca00d23946743a038847673f8726, which can be fixed with:

diff --git a/src/warnings.cpp b/src/warnings.cpp
index 5b6ace63dd..40043cd8e7 100644
--- a/src/warnings.cpp
+++ b/src/warnings.cpp
@@ -11,0 +12 @@
+#include <util/string.h>

sedited avatar May 12 '24 12:05 sedited

The check script is very useful, could you add it as the first commit? I think it could make review a bit easier.

sedited avatar May 12 '24 12:05 sedited

There is another CI failure in https://cirrus-ci.com/task/6531594385620992:

crypto/.libs/libbitcoin_crypto_base.a(crypto_libbitcoin_crypto_base_la-cleanse.o): in function `memory_cleanse(void*, unsigned long)':
cleanse.cpp:(.text+0x0): multiple definition of `memory_cleanse(void*, unsigned long)'; /usr/bin/ld: DWARF error: invalid or unhandled FORM value: 0x22
support/.libs/libbitcoinkernel_la-cleanse.o:cleanse.cpp:(.text+0x0): first defined here
clang: error: linker command failed with exit code 1 (use -v to see invocation)

hebasto avatar May 13 '24 18:05 hebasto

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/25060059418

DrahtBot avatar May 16 '24 17:05 DrahtBot

There seems to be a conflict with bitcoin/bitcoin#30098. Maybe rebase?

hebasto avatar May 16 '24 20:05 hebasto

There seems to be a conflict with #30098. Maybe rebase?

Thanks! Rebased now.

If other reviewers or maintainers are not happy with introducing a bash script, I could volunteer some time to translate it.

Yes this could be a good idea. The script started out just being a few lines of bash listing symbols exported by one library and imported by another one. But then it grew as it was extended to support multiple libraries and suppress known errors. So now it's no longer a small script, though it is pretty clean and well commented. Other possible followups besides translating the script could be:

  • Running the script as part of CI
  • Cleaning up unexpected dependencies listed as suppressions in the script.

Rebased 7689dfd6646054a0be8e62ccbf72d5403e28b548 -> 1c26d10b23bb7a7d0204b4a8b44a50fad89f2a2a (pr/rmutil.18 -> pr/rmutil.19, compare) due to silent conflict with #30098

ryanofsky avatar May 17 '24 01:05 ryanofsky