bdk Include the descriptor in `keychain::Changeset`

trafficstars

Fixes #1101

Moves keychain::ChangeSet inside keychain/txout_index.rs as now the ChangeSet depends on miniscript
Slightly cleans up tests by introducing some constant descriptors
The KeychainTxOutIndex's internal SpkIterator now uses DescriptorId instead of K. The DescriptorId -> K translation is made at the KeychainTxOutIndex level.
The keychain::Changeset is now a struct, which includes a map for last revealed indexes, and one for newly added keychains and their descriptor.

Changelog notice

API changes in bdk:

Wallet::keychains returns a impl Iterator instead of BTreeMap
Wallet::load doesn't take descriptors anymore, since they're stored in the db
Wallet::new_or_load checks if the loaded descriptor from db is the same as the provided one

API changes in bdk_chain:

ChangeSet is now a struct, which includes a map for last revealed indexes, and one for keychains and descriptors.
KeychainTxOutIndex::inner returns a SpkIterator<(DescriptorId, u32)>
KeychainTxOutIndex::outpoints returns a BTreeSet instead of &BTreeSet
KeychainTxOutIndex::keychains returns a impl Iterator instead of &BTreeMap
KeychainTxOutIndex::txouts doesn't return a ExactSizeIterator anymore
KeychainTxOutIndex::last_revealed_indices returns a BTreeMap instead of &BTreeMap
KeychainTxOutIndex::add_keychain has been renamed to KeychainTxOutIndex::insert_descriptor, and now it returns a ChangeSet
KeychainTxOutIndex::reveal_next_spk returns Option
KeychainTxOutIndex::next_unused_spk returns Option
KeychainTxOutIndex::unbounded_spk_iter returns Option
KeychainTxOutIndex::next_index returns Option
KeychainTxOutIndex::reveal_to_target returns Option
KeychainTxOutIndex::revealed_keychain_spks returns Option
KeychainTxOutIndex::unused_keychain_spks returns Option
KeychainTxOutIndex::last_revealed_index returns Option
KeychainTxOutIndex::keychain_outpoints returns Option
KeychainTxOutIndex::keychain_outpoints_in_range returns Option
KeychainTxOutIndex::last_used_index returns None if the keychain has never been used, or if it doesn't exist

Checklists

All Submissions:

[x] I've signed all my commits
[x] I followed the contribution guidelines
[x] I ran cargo fmt and cargo clippy before committing

New Features:

[x] I've added tests for the new feature
[x] I've added docs for the new feature

Nov 10 '23 16:11 danielabrozzoni

Oh and it seems like you need to rebase.

Nov 14 '23 23:11 evanlinjin

Hey @evanlinjin and @LLFourn, thanks a lot for your reviews. I've given some thoughts about them:

I agree that having a BTreeMap with everything inside is not optimal, and a structure with both last_revealed and descriptor would be better. That being said, we need to make sure that users don't pass in a (keychain, index) without the correspondent descriptor.
The more I think about it, the more I convince myself that we shouldn't let the user modify the descriptor - otherwise, what's the point of having a db checksum? I think that changing it silently (as my code is doing right now), or making the changeset monotone as lloyd suggested, might be potential footguns. (I might not have fully comprehended lloyd's proposal though!)

It seems to me that it would be optimal to have append return a Result, and error in the two cases; but the Append trait doesn't specify any return value for append. How should I go about this?

Nov 29 '23 18:11 danielabrozzoni

Just quoting @LLFourn's response to adding an error to Append trait methods:

Because we should design things so that there should be no possible error

The errors would have to be programmer errors which should be handled via panic or just carry on with best effort.

I think there would most of the time be ways to re-architect the changeset to avoid errors completely. That's the trade off. Either design it so there can't be errors (may mean bigger more monotone changesets) or just accept the error.

Dec 05 '23 06:12 evanlinjin

What do we think about using debug_assert! in the Append implementation for now? This will check if we are mutating the descriptor of an already-existing keychain. In release mode, the consecutive changes to a keychain's descriptor should just be ignored.

The add_keychain method should error if a user is trying to modify an already-existing keychain. The changesets are returned by KeychainTxOutIndex methods. The KeychainTxOutIndex should never create consecutive changesets that modify the same keychain's descriptor. So I think my suggestion is fair.

Dec 05 '23 06:12 evanlinjin

I've been trying to think of how to keep ChangeSet<K> monotone and I think we should not change the keychain::ChangeSet and instead enforce that descriptors don't change at the bdk::wallet::ChangeSet level. This puts responsibility on the devs who who don't use bdk::wallet to keep their data consistent. They can either follow how we do it in Wallet or go their own way if they know what they're doing. I created #1234 to demonstrate this approach, would love some feed back.

This is actually becoming my intuition too but have an idea a bit different #1234. I think the descriptor has to be in the keychain changeset because that's where it's stored in memory and it has to be there to produce new addresses. Having it elsewhere adds friction and removing that is the point of this effort.

We can keep KeychainTxOutIndex simple and keep changesets montone by removing the type parameter K from it. There is a perfectly coherent bit of functionality with its own persistence mechanism that cannot conflict with anything else that we can carve out here and leave labeling to later. For now we can let Wallet handle its own labeling with the KeychainKind enum in its own addition to its changeset.

So here's my suggestion:

Use descriptor hash to identify descriptor (as @notmandatory does in his approach). Let's call it DescriptorId and make it type DescriptorId = sha256::Hash for now.
Rename KeychainTxOutIndex to DescriptorTxOutIndex
Define it as follows:

#[derive(Clone, Debug)]
pub struct DescriptorTxOutIndex {
    inner: SpkTxOutIndex<(DescriptorId, u32)>,
    // descriptors of each keychain
    descriptors: BTreeMap<DescriptorId, Descriptor<DescriptorPublicKey>>,
    // last revealed indexes
    last_revealed: BTreeMap<DescriptorId, u32>,
    // lookahead settings for each descriptor
    lookahead: BTreeMap<DescriptorId, u32>,
}

#[derive(Clone, Debug, PartialEq, serde::Deserialize, serde::Serialize)]
pub struct ChangeSet {
    revealed: BTreeMap<DescriptorId, u32>
    added: BTreeSet<Descriptor<DescriptorPublicKey>
}

Methods in DescriptorTxOutIndex should take a DescriptorId.

This changeset is monotone and error free and gives us most of the useful things that KeychainTxOutIndex did. We could of course keep KeychainTxOutIndex and implement with a DescriptorTxOutIndex internally but I think we should just do Wallet first and add that later if we want it. I guess the examples won't be too hard to change (it might even simplify them a bit!).

So here's what Wallet changeset would become:

/// The changes made to a wallet by applying an [`Update`].
#[derive(Debug, Clone, PartialEq, serde::Serialize, serde::Deserialize, Default)]
pub struct ChangeSet {
    /// Changes to the [`LocalChain`].
    ///
    /// [`LocalChain`]: local_chain::LocalChain
    pub chain: local_chain::ChangeSet,

    /// Changes to [`IndexedTxGraph`].
    ///
    /// [`IndexedTxGraph`]: bdk_chain::indexed_tx_graph::IndexedTxGraph
    pub indexed_tx_graph: indexed_tx_graph::ChangeSet<
        ConfirmationTimeHeightAnchor,
        descriptor::ChangeSet,
    >,

    /// Stores they keychains of the wallet and their descriptors.
    /// This should only be non-empty in the first changeset of the wallet.
    keychains: BTreeMap<KeychainKind, DescriptorId>.

    /// Stores the network type of the wallet.
    pub network: Option<Network>,   
}

Of course this is non-monotone but it's better to have these hacks up here than down in bdk_chain. Since our API doesn't allow you to mutate descriptors it could also be fixed by #1193 i.e. setting keychains and network in the header. There is a question about whether keychains should have a DescriptorId or a Descriptor in it. If you use DescriptorId you always have to make sure the Descriptor is in the index and have an error case where a DescriptorId gets added to keychains but not indexed_tx_graph. Using the Descriptor in keychains redundantly stores the descriptors but remove the error case. I think I lean to just have DescriptorId since I don't see it causing too much trouble.

We'll probably want to add a descriptor_id() to DescriptorExt in bdk_chain.

Thoughts?

Dec 07 '23 07:12 LLFourn

Thoughts?

So after sleeping on it I have changed my view a little bit. The design above is the most in line with the overall design philosophy and removes all possible invalid representations at the bdk_chain level. But it is a very nice feature to label descriptors and we use it in examples and almost every user will want to attach some kind of application semantics to the descriptor. So I want to modify the above proposal to keep KeychainTxOutIndex but have it internally function mostly like what was suggested above so that we remove most of the surface area for footguns.

#[derive(Clone, Debug)]
pub struct KeychainTxOutIndex<K> {
    inner: SpkTxOutIndex<(DescriptorId, u32)>,
    // descriptors of each keychain
    descriptors: BTreeMap<DescriptorId, Descriptor<DescriptorPublicKey>>,
    // last revealed indexes
    last_revealed: BTreeMap<DescriptorId, u32>,
    // lookahead settings for each descriptor
    lookahead: BTreeMap<DescriptorId, u32>,

    keychains: BTreeMap<K, DescriptorId> 
}

#[derive(Clone, Debug, PartialEq, serde::Deserialize, serde::Serialize)]
pub struct ChangeSet<K> {
    revealed: BTreeMap<DescriptorId, u32>,
    keychains_added: BTreeMap<K, Descriptor<DescriptorPublicKey>
}

This means the non-montonicity is restricted to keychains_added field where we can implement keychains_added by just replacing descriptors with the most recent one (i.e. just use BTreeMap::append). Replacing a keychain will not be a permanent break of anything -- You could just re-replace it with the correct one and all the other values like last_revealed would return to their correct values. In fact this even makes it possible to replace a keychain with another descriptor while not losing the old coins. Since the last revealed is associated with the descriptor rather than keychain it will also always load all the old spks. So this turns a bug into a feature in a relatively safe way -- it's probably not a feature that you should ever use though!

Dec 08 '23 00:12 LLFourn

@LLFourn Thanks for the detailed explanation. It helped clarify for me why persisting descriptor data at the wallet level is perhaps not enough. I think this direction is well aligned with the related effort of separating out a "header" changeset.

Dec 08 '23 02:12 ValuedMammal

~~I'll take a crack at implementing your new KeychainTxOutIndex<K> design, but if @danielabrozzoni gets to it first in this PR that's fine too. At least I'll be a better reviewer if I play with it a bit.~~ @danielabrozzoni has this one, I'll still help review.

Dec 08 '23 05:12 notmandatory

Please rebase now that #1183 has been merged.

Dec 12 '23 03:12 notmandatory

@LLFourn very nice!

Dec 15 '23 11:12 evanlinjin

I fixed some new CI issues with #1247, it just needs to be ACKd and merged and then this PR rebased.

Dec 20 '23 17:12 notmandatory

@LLFourn @evanlinjin do you think it's ok if KeychainTxOutIndex::keychains returns a BTreeMap<&K, &Descriptor<DescriptorPublicKey>> instead of &BTreeMap<K, Descriptor<DescriptorPublicKey>>?

The thing is, after refactoring the KeychainTxOutIndex, I don't have a keychain -> desc map that I can return a reference to. Returning a BTreeMap<&..., &...> is the best idea that come to my mind, other possibilities are returning a BTreeMap<K, Desc> (seems wasteful) or keeping in the keychain index the BTreeMap<K, Desc> (unclean, but maybe it's the same case as TxGraph::empty_outspends?)

Jan 22 '24 17:01 danielabrozzoni

@danielabrozzoni personally I would prefer returning an impl Iterator<Item = (K, &Desc)>. What do you think?

Jan 22 '24 22:01 evanlinjin

@evanlinjin I see how Iter<...> is better than BTreeMap<...> since you don't need to collect, but why returning K instead of &K?

Jan 24 '24 16:01 danielabrozzoni

&K is fine to me

Jan 29 '24 02:01 LLFourn

Pushed an update. It still needs a rebase, but while I work on it, can you let me know what you think?

Jan 31 '24 18:01 danielabrozzoni

@evanlinjin I see how Iter<...> is better than BTreeMap<...> since you don't need to collect, but why returning K instead of &K?

I would assume for the majority of cases, K is small (as an identifier) so cloning/copying would be cheaper than working with references. Therefore I prefer not having references on `K.

Feb 01 '24 17:02 evanlinjin

In my last push:

Rebase on master
Fix (almost all the) review comments - I'm still missing a couple, will take a look soon
revealed_keychain_spks, unused_keychain_spks keychain_outpoints_in_range and keychain_outpoints return an iterator, and would previously return an empty iterator if the keychain passed in didn't exist. The code is updated to instead panic if the keychain passed in doesn't exist. I was hoping to keep the same functionality in this PR, and maybe change the behavior in a new PR if we think it's the right choice (more here: #1340), but I can't really find a way to do so. Any suggestions? Should we take a decision on #1340 first, and then implement it here, or is it ok to merge and fix in a follow-up PR (hopefully before the release)?
The new KeychainTxOutIndex::add_keychain always adds keychains, overwriting old descriptor/keychains when needed. This causes the (rather strange) behavior shown in test_insert_different_desc_same_keychain:
- The index contains (K1, D1) and (K2, D2)
- (K1, D2) is insterted
- The index now contains (K1, D2) and (K2, D2) I'm not sure if this is the behavior we want, or if instead, the index should only contain (K1, D2). From my understanding, since APIs never take in a DescriptorId, but always a Keychain, it shouldn't be a problem in any way.

Feb 13 '24 12:02 danielabrozzoni

Maybe KeychainTxOutIndex should have its own Error enum, to catch mistakes early instead of returning results that don't make sense

Feb 14 '24 16:02 ValuedMammal

@ValuedMammal I agree, but it seems to me that there aren't many methods in bdk_chain that return Result, and this seems a deliberate design choice? Also, "having an error-free API" was explicitly stated in the docs (which I had to remove in #1188, when I realized that some methods did return Results)

@LLFourn can you shed some light on this?

Feb 14 '24 18:02 danielabrozzoni

While returning Option is effectively the same as calling next on an empty iter, it would do a better job of informing the caller that the keychain doesn't exist - without introducing errors. You could use an if-let in places where descriptor_of_keychain is called and simply return None if the keychain isn't there.

Feb 14 '24 18:02 ValuedMammal

FYI I'm not including this PR in #1343, but looks like it will be ready for the next alpha release.

Feb 15 '24 16:02 notmandatory

I would assume for the majority of cases, K is small (as an identifier) so cloning/copying would be cheaper than working with references. Therefore I prefer not having references on `K.

I think the rust-centric policy is if we don't want to enforce K: Copy then it should be &K must of the time. Unfortunately, I don't think we do want K: Copy so you can use String etc as keychains.

Feb 16 '24 01:02 LLFourn

BTW I had a go at doing an emoji review with this one. Let me know if it helps.

Feb 16 '24 05:02 LLFourn

This is ready for another round :tada:

BTW I had a go at doing an emoji review with this one. Let me know if it helps.

That was actually really helpful!

Flesh out DescriptorId type. I think it should be DescriptorId([u8;32]). It should encode/decode to hex in Display/FromStr/Debug etc and human readable serde impls.

@LLFourn I went with using https://docs.rs/bitcoin_hashes/0.13.0/bitcoin_hashes/macro.hash_newtype.html, so DescriptorId still contains Hash instead of [u32;8], I guess that's also acceptable?

In my last push:

Rebase
Fix review comments
Removed panics (we return Option basically everywhere now)
Add map_keychain in Wallet::unbounded_spk_iter for consistency with other methods

Feb 26 '24 14:02 danielabrozzoni

Reviewers please specify for any outstanding suggestions if they are a nit ⛏️ , can be fixed later 📌 or must fix 🔧 concern. This PR is five months old and we need to wrap it up!

Mar 21 '24 20:03 notmandatory

Everything that I have not specified as a nit, or asked as a question needs to be fixed now.

Mar 23 '24 13:03 evanlinjin

@danielabrozzoni let me know if you need a hand with this

Apr 18 '24 05:04 evanlinjin

@evanlinjin thanks! Right now I'm going through comments and updating/replying, I'll let you know if I need anything :)

Apr 18 '24 13:04 danielabrozzoni

My last push includes a rebase and a fix for almost all the review comments.

Left to do:

https://github.com/bitcoindevkit/bdk/pull/1203#discussion_r1510281852 I need some input from @evanlinjin, I don't understand why this is needed/better
https://github.com/bitcoindevkit/bdk/pull/1203#discussion_r1510207238 @evanlinjin I ported your tests from https://github.com/danielabrozzoni/bdk/pull/6/commits/8229a082c9de879abc3ba5b1e3540d34f0641471, but I realized the only way for them to pass is to add a desc_id -> desc map, which I know you're not a fan of. The reason for this is that reveal_to_target_with_id needs to know the descriptor given the desc_id, which we don't know if the keychain isn't present (we clean self.descriptor_ids_to_keychain in insert_descriptor). Let me know if you find any other possible solution!
Address https://github.com/bitcoindevkit/bdk/pull/1203#discussion_r1566263753 - thanks @notmandatory for the fix! I want to cherry pick it + add some docs to the load method

Apr 23 '24 12:04 danielabrozzoni

bdk bdk copied to clipboard

Include the descriptor in `keychain::Changeset`

Changelog notice

Checklists

All Submissions:

New Features:

bdk
bdk copied to clipboard