bdk
bdk copied to clipboard
Include the descriptor in `keychain::Changeset`
Fixes #1101
- Moves keychain::ChangeSet inside
keychain/txout_index.rsas now theChangeSetdepends on miniscript - Slightly cleans up tests by introducing some constant descriptors
- The KeychainTxOutIndex's internal SpkIterator now uses DescriptorId instead of K. The DescriptorId -> K translation is made at the KeychainTxOutIndex level.
- The keychain::Changeset is now a struct, which includes a map for last revealed indexes, and one for newly added keychains and their descriptor.
Changelog notice
API changes in bdk:
- Wallet::keychains returns a
impl Iteratorinstead ofBTreeMap - Wallet::load doesn't take descriptors anymore, since they're stored in the db
- Wallet::new_or_load checks if the loaded descriptor from db is the same as the provided one
API changes in bdk_chain:
ChangeSetis now a struct, which includes a map for last revealed indexes, and one for keychains and descriptors.KeychainTxOutIndex::innerreturns aSpkIterator<(DescriptorId, u32)>KeychainTxOutIndex::outpointsreturns aBTreeSetinstead of&BTreeSetKeychainTxOutIndex::keychainsreturns aimpl Iteratorinstead of&BTreeMapKeychainTxOutIndex::txoutsdoesn't return a ExactSizeIterator anymoreKeychainTxOutIndex::last_revealed_indicesreturns aBTreeMapinstead of&BTreeMapKeychainTxOutIndex::add_keychainhas been renamed toKeychainTxOutIndex::insert_descriptor, and now it returns a ChangeSetKeychainTxOutIndex::reveal_next_spkreturns OptionKeychainTxOutIndex::next_unused_spkreturns OptionKeychainTxOutIndex::unbounded_spk_iterreturns OptionKeychainTxOutIndex::next_indexreturns OptionKeychainTxOutIndex::reveal_to_targetreturns OptionKeychainTxOutIndex::revealed_keychain_spksreturns OptionKeychainTxOutIndex::unused_keychain_spksreturns OptionKeychainTxOutIndex::last_revealed_indexreturns OptionKeychainTxOutIndex::keychain_outpointsreturns OptionKeychainTxOutIndex::keychain_outpoints_in_rangereturns OptionKeychainTxOutIndex::last_used_indexreturns None if the keychain has never been used, or if it doesn't exist
Checklists
All Submissions:
- [x] I've signed all my commits
- [x] I followed the contribution guidelines
- [x] I ran
cargo fmtandcargo clippybefore committing
New Features:
- [x] I've added tests for the new feature
- [x] I've added docs for the new feature
Oh and it seems like you need to rebase.
Hey @evanlinjin and @LLFourn, thanks a lot for your reviews. I've given some thoughts about them:
- I agree that having a BTreeMap with everything inside is not optimal, and a structure with both last_revealed and descriptor would be better. That being said, we need to make sure that users don't pass in a (keychain, index) without the correspondent descriptor.
- The more I think about it, the more I convince myself that we shouldn't let the user modify the descriptor - otherwise, what's the point of having a db checksum? I think that changing it silently (as my code is doing right now), or making the changeset monotone as lloyd suggested, might be potential footguns. (I might not have fully comprehended lloyd's proposal though!)
It seems to me that it would be optimal to have append return a Result, and error in the two cases; but the Append trait doesn't specify any return value for append. How should I go about this?
Just quoting @LLFourn's response to adding an error to Append trait methods:
Because we should design things so that there should be no possible error
The errors would have to be programmer errors which should be handled via panic or just carry on with best effort.
I think there would most of the time be ways to re-architect the changeset to avoid errors completely. That's the trade off. Either design it so there can't be errors (may mean bigger more monotone changesets) or just accept the error.
What do we think about using debug_assert! in the Append implementation for now? This will check if we are mutating the descriptor of an already-existing keychain. In release mode, the consecutive changes to a keychain's descriptor should just be ignored.
The add_keychain method should error if a user is trying to modify an already-existing keychain. The changesets are returned by KeychainTxOutIndex methods. The KeychainTxOutIndex should never create consecutive changesets that modify the same keychain's descriptor. So I think my suggestion is fair.
I've been trying to think of how to keep
ChangeSet<K>monotone and I think we should not change the keychain::ChangeSet and instead enforce that descriptors don't change at the bdk::wallet::ChangeSet level. This puts responsibility on the devs who who don't use bdk::wallet to keep their data consistent. They can either follow how we do it inWalletor go their own way if they know what they're doing. I created #1234 to demonstrate this approach, would love some feed back.
This is actually becoming my intuition too but have an idea a bit different #1234. I think the descriptor has to be in the keychain changeset because that's where it's stored in memory and it has to be there to produce new addresses. Having it elsewhere adds friction and removing that is the point of this effort.
We can keep KeychainTxOutIndex simple and keep changesets montone by removing the type parameter K from it. There is a perfectly coherent bit of functionality with its own persistence mechanism that cannot conflict with anything else that we can carve out here and leave labeling to later. For now we can let Wallet handle its own labeling with the KeychainKind enum in its own addition to its changeset.
So here's my suggestion:
- Use descriptor hash to identify descriptor (as @notmandatory does in his approach). Let's call it
DescriptorIdand make ittype DescriptorId = sha256::Hashfor now. - Rename
KeychainTxOutIndextoDescriptorTxOutIndex - Define it as follows:
#[derive(Clone, Debug)]
pub struct DescriptorTxOutIndex {
inner: SpkTxOutIndex<(DescriptorId, u32)>,
// descriptors of each keychain
descriptors: BTreeMap<DescriptorId, Descriptor<DescriptorPublicKey>>,
// last revealed indexes
last_revealed: BTreeMap<DescriptorId, u32>,
// lookahead settings for each descriptor
lookahead: BTreeMap<DescriptorId, u32>,
}
#[derive(Clone, Debug, PartialEq, serde::Deserialize, serde::Serialize)]
pub struct ChangeSet {
revealed: BTreeMap<DescriptorId, u32>
added: BTreeSet<Descriptor<DescriptorPublicKey>
}
Methods in DescriptorTxOutIndex should take a DescriptorId.
This changeset is monotone and error free and gives us most of the useful things that KeychainTxOutIndex did. We could of course keep KeychainTxOutIndex and implement with a DescriptorTxOutIndex internally but I think we should just do Wallet first and add that later if we want it. I guess the examples won't be too hard to change (it might even simplify them a bit!).
So here's what Wallet changeset would become:
/// The changes made to a wallet by applying an [`Update`].
#[derive(Debug, Clone, PartialEq, serde::Serialize, serde::Deserialize, Default)]
pub struct ChangeSet {
/// Changes to the [`LocalChain`].
///
/// [`LocalChain`]: local_chain::LocalChain
pub chain: local_chain::ChangeSet,
/// Changes to [`IndexedTxGraph`].
///
/// [`IndexedTxGraph`]: bdk_chain::indexed_tx_graph::IndexedTxGraph
pub indexed_tx_graph: indexed_tx_graph::ChangeSet<
ConfirmationTimeHeightAnchor,
descriptor::ChangeSet,
>,
/// Stores they keychains of the wallet and their descriptors.
/// This should only be non-empty in the first changeset of the wallet.
keychains: BTreeMap<KeychainKind, DescriptorId>.
/// Stores the network type of the wallet.
pub network: Option<Network>,
}
Of course this is non-monotone but it's better to have these hacks up here than down in bdk_chain. Since our API doesn't allow you to mutate descriptors it could also be fixed by #1193 i.e. setting keychains and network in the header. There is a question about whether keychains should have a DescriptorId or a Descriptor in it. If you use DescriptorId you always have to make sure the Descriptor is in the index and have an error case where a DescriptorId gets added to keychains but not indexed_tx_graph. Using the Descriptor in keychains redundantly stores the descriptors but remove the error case. I think I lean to just have DescriptorId since I don't see it causing too much trouble.
We'll probably want to add a descriptor_id() to DescriptorExt in bdk_chain.
Thoughts?
Thoughts?
So after sleeping on it I have changed my view a little bit. The design above is the most in line with the overall design philosophy and removes all possible invalid representations at the bdk_chain level. But it is a very nice feature to label descriptors and we use it in examples and almost every user will want to attach some kind of application semantics to the descriptor. So I want to modify the above proposal to keep KeychainTxOutIndex but have it internally function mostly like what was suggested above so that we remove most of the surface area for footguns.
#[derive(Clone, Debug)]
pub struct KeychainTxOutIndex<K> {
inner: SpkTxOutIndex<(DescriptorId, u32)>,
// descriptors of each keychain
descriptors: BTreeMap<DescriptorId, Descriptor<DescriptorPublicKey>>,
// last revealed indexes
last_revealed: BTreeMap<DescriptorId, u32>,
// lookahead settings for each descriptor
lookahead: BTreeMap<DescriptorId, u32>,
keychains: BTreeMap<K, DescriptorId>
}
#[derive(Clone, Debug, PartialEq, serde::Deserialize, serde::Serialize)]
pub struct ChangeSet<K> {
revealed: BTreeMap<DescriptorId, u32>,
keychains_added: BTreeMap<K, Descriptor<DescriptorPublicKey>
}
This means the non-montonicity is restricted to keychains_added field where we can implement keychains_added by just replacing descriptors with the most recent one (i.e. just use BTreeMap::append). Replacing a keychain will not be a permanent break of anything -- You could just re-replace it with the correct one and all the other values like last_revealed would return to their correct values. In fact this even makes it possible to replace a keychain with another descriptor while not losing the old coins. Since the last revealed is associated with the descriptor rather than keychain it will also always load all the old spks. So this turns a bug into a feature in a relatively safe way -- it's probably not a feature that you should ever use though!
@LLFourn Thanks for the detailed explanation. It helped clarify for me why persisting descriptor data at the wallet level is perhaps not enough. I think this direction is well aligned with the related effort of separating out a "header" changeset.
~~I'll take a crack at implementing your new KeychainTxOutIndex<K> design, but if @danielabrozzoni gets to it first in this PR that's fine too. At least I'll be a better reviewer if I play with it a bit.~~ @danielabrozzoni has this one, I'll still help review.
Please rebase now that #1183 has been merged.
@LLFourn very nice!
I fixed some new CI issues with #1247, it just needs to be ACKd and merged and then this PR rebased.
@LLFourn @evanlinjin do you think it's ok if KeychainTxOutIndex::keychains returns a BTreeMap<&K, &Descriptor<DescriptorPublicKey>> instead of &BTreeMap<K, Descriptor<DescriptorPublicKey>>?
The thing is, after refactoring the KeychainTxOutIndex, I don't have a keychain -> desc map that I can return a reference to. Returning a BTreeMap<&..., &...> is the best idea that come to my mind, other possibilities are returning a BTreeMap<K, Desc> (seems wasteful) or keeping in the keychain index the BTreeMap<K, Desc> (unclean, but maybe it's the same case as TxGraph::empty_outspends?)
@danielabrozzoni personally I would prefer returning an impl Iterator<Item = (K, &Desc)>. What do you think?
@evanlinjin I see how Iter<...> is better than BTreeMap<...> since you don't need to collect, but why returning K instead of &K?
&K is fine to me
Pushed an update. It still needs a rebase, but while I work on it, can you let me know what you think?
@evanlinjin I see how Iter<...> is better than BTreeMap<...> since you don't need to collect, but why returning
Kinstead of&K?
I would assume for the majority of cases, K is small (as an identifier) so cloning/copying would be cheaper than working with references. Therefore I prefer not having references on `K.
In my last push:
- Rebase on master
- Fix (almost all the) review comments - I'm still missing a couple, will take a look soon
revealed_keychain_spks,unused_keychain_spkskeychain_outpoints_in_rangeandkeychain_outpointsreturn an iterator, and would previously return an empty iterator if the keychain passed in didn't exist. The code is updated to instead panic if the keychain passed in doesn't exist. I was hoping to keep the same functionality in this PR, and maybe change the behavior in a new PR if we think it's the right choice (more here: #1340), but I can't really find a way to do so. Any suggestions? Should we take a decision on #1340 first, and then implement it here, or is it ok to merge and fix in a follow-up PR (hopefully before the release)?- The new
KeychainTxOutIndex::add_keychainalways adds keychains, overwriting old descriptor/keychains when needed. This causes the (rather strange) behavior shown intest_insert_different_desc_same_keychain:- The index contains (K1, D1) and (K2, D2)
- (K1, D2) is insterted
- The index now contains (K1, D2) and (K2, D2) I'm not sure if this is the behavior we want, or if instead, the index should only contain (K1, D2). From my understanding, since APIs never take in a DescriptorId, but always a Keychain, it shouldn't be a problem in any way.
Maybe KeychainTxOutIndex should have its own Error enum, to catch mistakes early instead of returning results that don't make sense
@ValuedMammal I agree, but it seems to me that there aren't many methods in bdk_chain that return Result, and this seems a deliberate design choice? Also, "having an error-free API" was explicitly stated in the docs (which I had to remove in #1188, when I realized that some methods did return Results)
@LLFourn can you shed some light on this?
While returning Option is effectively the same as calling next on an empty iter, it would do a better job of informing the caller that the keychain doesn't exist - without introducing errors. You could use an if-let in places where descriptor_of_keychain is called and simply return None if the keychain isn't there.
FYI I'm not including this PR in #1343, but looks like it will be ready for the next alpha release.
I would assume for the majority of cases,
Kis small (as an identifier) so cloning/copying would be cheaper than working with references. Therefore I prefer not having references on `K.
I think the rust-centric policy is if we don't want to enforce K: Copy then it should be &K must of the time. Unfortunately, I don't think we do want K: Copy so you can use String etc as keychains.
BTW I had a go at doing an emoji review with this one. Let me know if it helps.
This is ready for another round :tada:
BTW I had a go at doing an emoji review with this one. Let me know if it helps.
That was actually really helpful!
Flesh out DescriptorId type. I think it should be DescriptorId([u8;32]). It should encode/decode to hex in Display/FromStr/Debug etc and human readable serde impls.
@LLFourn I went with using https://docs.rs/bitcoin_hashes/0.13.0/bitcoin_hashes/macro.hash_newtype.html, so DescriptorId still contains Hash instead of [u32;8], I guess that's also acceptable?
In my last push:
- Rebase
- Fix review comments
- Removed panics (we return
Optionbasically everywhere now) - Add
map_keychaininWallet::unbounded_spk_iterfor consistency with other methods
Reviewers please specify for any outstanding suggestions if they are a nit ⛏️ , can be fixed later 📌 or must fix 🔧 concern. This PR is five months old and we need to wrap it up!
Everything that I have not specified as a nit, or asked as a question needs to be fixed now.
@danielabrozzoni let me know if you need a hand with this
@evanlinjin thanks! Right now I'm going through comments and updating/replying, I'll let you know if I need anything :)
My last push includes a rebase and a fix for almost all the review comments.
Left to do:
- https://github.com/bitcoindevkit/bdk/pull/1203#discussion_r1510281852 I need some input from @evanlinjin, I don't understand why this is needed/better
- https://github.com/bitcoindevkit/bdk/pull/1203#discussion_r1510207238 @evanlinjin I ported your tests from https://github.com/danielabrozzoni/bdk/pull/6/commits/8229a082c9de879abc3ba5b1e3540d34f0641471, but I realized the only way for them to pass is to add a
desc_id -> descmap, which I know you're not a fan of. The reason for this is thatreveal_to_target_with_idneeds to know the descriptor given the desc_id, which we don't know if the keychain isn't present (we cleanself.descriptor_ids_to_keychainininsert_descriptor). Let me know if you find any other possible solution! - Address https://github.com/bitcoindevkit/bdk/pull/1203#discussion_r1566263753 - thanks @notmandatory for the fix! I want to cherry pick it + add some docs to the load method