Backwards compatibility of `bdk::ChangeSet` data structure
Describe the enhancement
The `bdk_wallet::ChangeSet` is what the persistence layer interfaces with. As suggested by @tnull, this structure should be backwards compatible (in case we make changes to it).
An easy solution is to make `bdk_wallet::ChangeSet` an enum:
```rust
#[non_exhaustive]
pub enum ChangeSet {
    V0(ChangeSet0),
    V1(ChangeSet1), // in the future this may exist
}

// We always convert to the latest version
impl From<ChangeSet0> for ChangeSet1 { /* TODO */ }
```
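For illustration, here is a minimal sketch of how a store could upgrade on load under this scheme; `load_latest` is a hypothetical helper, and the version payloads are assumed to have the `From` conversion above:

```rust
// Hypothetical upgrade-on-load helper: whatever version was persisted,
// the caller always receives the latest changeset shape.
fn load_latest(persisted: ChangeSet) -> ChangeSet1 {
    match persisted {
        ChangeSet::V0(old) => ChangeSet1::from(old), // convert forward
        ChangeSet::V1(latest) => latest,
        // (downstream crates would also need a wildcard arm, since the
        // enum is #[non_exhaustive])
    }
}
```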
Note that it would also be nice to consider whether even forwards compatibility could be guaranteed. It may make things a bit more complicated, but it's worth exploring, especially since these properties are transitive for any downstream users (e.g., LDK Node can only guarantee what BDK guarantees...).
> An easy solution is to make `bdk::ChangeSet` an enum:
Mh, but then you'd need to define explicit migration functions to allow users to migrate from `V0` to `V1`? Another way would be to a) not break compat for any existing fields and b) introduce any new fields as `Option`s, so that users upgrading could deal with 'old' objects in which the fields are still `None`.
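As a rough sketch of that `Option`-based approach (field names here are hypothetical), a self-describing format such as JSON can decode 'old' objects that lack the new field:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical changeset with one pre-existing and one newly added field.
#[derive(Serialize, Deserialize, Default)]
pub struct ChangeSet {
    // pre-existing field (placeholder type for this sketch)
    pub network: Option<String>,
    // field added in a later release; `#[serde(default)]` (which serde also
    // applies implicitly to `Option` fields) lets 'old' serialized objects
    // that lack the field decode to `None`
    #[serde(default)]
    pub wallet_name: Option<String>,
}
```

Note this only helps formats that name their fields; as discussed further down the thread, it does not by itself rescue positional formats like bincode.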
@tnull so you want an older version of BDK to understand changesets that a newer version of BDK outputs. Can you elaborate why this would be useful? Thank you
Forwards compatibility is really useful to allow downgrades for various reasons: say a user upgrades to a newer BDK version and then a bug is discovered. If the upgrade breaks serialization forwards compatibility they couldn't downgrade to get going again and would be stuck until BDK ships a fix. Similarly, it would be necessary to be a bit more forgiving when restoring from backups or when having multiple apps that operate on the same data set.
In LDK Node's case, for example, we plan to allow multi-device support synced via a VSS backend. Without forwards compatibility of the serialized data, our users would need to upgrade all apps and devices in lockstep, as otherwise an outdated device would refuse to read any data written by the already-upgraded devices.
As said above, I'm aware that it may complicate things, but it's at least worth considering, i.e., it should be a dedicated design choice not to provide these guarantees.
Thanks for bringing up this point. I think we should never change the changeset structures in `bdk_chain` once we have a proper release. Adding new functionality should mean adding a new type. I think there are two remaining proposed changeset changes there:
- #1005
- #1101
I'm not sure if we can apply the same policy to `bdk::Wallet`. If we do want to make changes there then we should indeed try to do it via optional fields at the end of the changeset.
I think forwards compatibility is possible, but this depends on the application serializing each changeset in a particular way. Note that in `bdk_file_store` we don't serialize changesets with a length, so the decoder is expected to understand everything in a changeset. We could improve this by first bincode-serializing the encoded entry as a `Vec<u8>`, which will prefix it with its length.
> Note that in `bdk_file_store` we don't serialize changesets with a length, so the decoder is expected to understand everything in a changeset. We could improve this by first bincode-serializing the encoded entry as a `Vec<u8>`, which will prefix it with its length.
Yeah, I think even independently of (forwards) compatibility guarantees, having some TLV-style encoding or at the very least adding the length sounds like a very good idea. Not having this really limits how you can make necessary changes to the data model in the future.
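A minimal sketch of that length-prefixing idea, assuming bincode 1.x and a serde-derived `ChangeSet` type:

```rust
// Encode the changeset to bytes first, then serialize those bytes as a
// `Vec<u8>`: bincode writes a u64 length prefix before the contents.
fn append_entry<W: std::io::Write>(db: &mut W, changeset: &ChangeSet) -> bincode::Result<()> {
    let inner: Vec<u8> = bincode::serialize(changeset)?;
    bincode::serialize_into(db, &inner)
}

fn read_entry<R: std::io::Read>(db: &mut R) -> bincode::Result<ChangeSet> {
    // The length prefix tells a decoder where this entry ends, so a reader
    // that can't interpret every field could still skip to the next entry.
    let inner: Vec<u8> = bincode::deserialize_from(&mut *db)?;
    bincode::deserialize(&inner)
}
```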
Do we really need forward and backwards compatibility baked into the data structures for persisted data? My understanding is that the data BDK persists should always be able to be regenerated from the wallet/app descriptors and blockchain client. So in the case of a roll forward/back, it should be possible to just do a fresh full scan.

Optionally, some persistence modules like SQL-based ones should be able to at least roll forward new schema changes to work with new `bdk::ChangeSet` versions.
I added this to the alpha.4 milestone to decide at least if we're going to do this or not since it's a biggish functional change.
> Do we really need forward and backwards compatibility baked into the data structures for persisted data? My understanding is that the data BDK persists should always be able to be regenerated from the wallet/app descriptors and blockchain client. So in the case of a roll forward/back, it should be possible to just do a fresh full scan.
I mean, generally forwards compat is a "nice-to-have", not a hard requirement. Still, ... nice to have it. ;) And I agree that it doesn't have to be explicit in the data structures (in fact, none of the compat guarantees have to be), but if we want it, it should be clearly defined and documented, and in the best case even (regularly) checked.
> Optionally, some persistence modules like SQL-based ones should be able to at least roll forward new schema changes to work with new `bdk::ChangeSet` versions.
Mh, but this is really about the data model itself, not the persistence layer. Of course the persistence layer could always provide migrations, but the idea of compat guarantees is exactly that the data model does not need an explicit migration step to go forwards/backwards.
> Thanks for bringing up this point. I think we should never change the changeset structures in `bdk_chain` once we have a proper release. Adding new functionality should mean adding a new type. I think there are two remaining proposed changeset changes there: 1. [LocalChain changes should be monotone #1005](https://github.com/bitcoindevkit/bdk/issues/1005) 2. [Reintroduce descriptor data to `bdk::Wallet` persistence #1101](https://github.com/bitcoindevkit/bdk/issues/1101)
>
> I'm not sure if we can apply the same policy to `bdk::Wallet`. If we do want to make changes there then we should indeed try to do it via optional fields at the end of the changeset.
I just realized this suggestion doesn't make sense since we're using bincode. You indeed have to do what @evanlinjin suggested in the OP for backwards compat.
Lib team call: it changes the API, so we should include it before the cutoff for the beta release.
I suggest we go ahead with this and do what is described in the ticket description.
We are now using the `CombinedChangeSet` type for persisting `Wallet` data, and that type implements `Default`, so all fields are optional or can be empty. I think this lets us maintain backward compatibility as long as we require that, going forward, any new fields added to `CombinedChangeSet` also implement `Default`.

So for example, if we add some field like `wallet_name: Option<String>` to a future version of `CombinedChangeSet`, then when we read an old data file with the old struct we'll get `None` and have to ensure the rest of the code will handle it in some reasonable way without a panic.

@tnull does this work for you? I believe it's what you suggested in our last call. If so, I suggest we only need to document this restriction in the docs for `CombinedChangeSet`.
Yes, it sounds like this could work, especially if newly added fields are `Option`s and this remains the only type that needs to be persisted. Note that it would still be unclear how you'd handle any types for which a sane default isn't easily available.

It might also be worth adding a test for the serialization upgrade whenever changes are made to `CombinedChangeSet` or any of the dependent types.
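One possible shape for such a test (a sketch, assuming a self-describing format like JSON and defaultable newer fields; the fixture string and type parameters are illustrative, not the real serialization):

```rust
#[test]
fn old_changeset_still_decodes() {
    // In a real test this would be a fixture captured from an older release.
    let old_json = r#"{"chain":{},"indexed_tx_graph":{}}"#; // predates `network`
    let decoded: CombinedChangeSet<String, u32> =
        serde_json::from_str(old_json).expect("old serialization must keep decoding");
    assert_eq!(decoded.network, None); // absent optional fields default to None
}
```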
@notmandatory some serialization formats would not work with what's suggested (i.e. bincode) when the entire changeset is persisted. I think we should create a new type (i.e. `VersionedChangeSet`) which is an enum of the different versions of `CombinedChangeSet`. The changeset will auto-upgrade itself to the latest version (if that makes sense).

We still need to ensure new fields added have sane defaults (as @tnull pointed out). However, I can't imagine a scenario where something added wouldn't have a sane default.

P.s. I'm working on this right now!
Can you elaborate on why serialization schemes like bincode wouldn't work with the idea of making all fields, and new ones in particular, optional (or potentially empty)? Doesn't any serialization we use have to have the property that it serdes changesets from/to regular Rust structs and that entries can be appended as we do now?
In theory bincode could work like this if you added a length prefix to each changeset, but in practice there don't seem to be APIs to "decode this thing and leave the remaining fields as `Default` if you run out of data", which seems really unfortunate. This stops you from doing backwards compatibility without version bytes. Version bytes break forwards compatibility.
I think it would be possible to get both if we were willing to write a custom serde deserializer that did versioning, but if the version is higher than the current impl, it would just try deserializing anyway and ignore trailing bytes (note that bincode can ignore trailing bytes). This would be a little better since it doesn't require making the enum.
I wonder if there's a way we could do this in `bdk_file_store`, where we just do the version prefixing there and you provide a callback to `bdk_file_store` telling it how to decode each older version. This seems like the cleanest approach because it puts the solution where the actual problem is (bincode) and doesn't preclude forwards compatibility for other formats (e.g. CBOR, JSON, etc.). This would turn it into a TLV kind of thing where each entry is in the form:
```rust
struct Entry<V> {
    version: V,
    data: Vec<u8>,
}
```
Then you'd provide a callback like:

```rust
Fn(Option<V>, Vec<u8>) -> Result<ChangeSet>
```
where you would actually do the bincode decoding, looking at the `V` to tell which version of the struct you should try to decode. To get forwards compatibility, if `V` couldn't be decoded it would pass in `None`, and you could just try to decode while ignoring trailing bytes.
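A sketch of what that caller-provided callback might look like, assuming hypothetical `ChangeSetV0`/`ChangeSetV1` types (with `V1` current and a `From<ChangeSetV0>` upgrade impl, both deriving serde traits) and bincode 1.x:

```rust
use bincode::Options;

fn decode_entry(version: Option<u32>, data: Vec<u8>) -> bincode::Result<ChangeSetV1> {
    match version {
        Some(0) => {
            let old: ChangeSetV0 = bincode::deserialize(&data)?;
            Ok(ChangeSetV1::from(old)) // upgrade to the latest version
        }
        Some(1) => bincode::deserialize(&data),
        // Version is newer than this impl understands: best-effort decode as
        // the current struct, ignoring trailing bytes from unknown fields.
        _ => bincode::DefaultOptions::new()
            .with_fixint_encoding()
            .allow_trailing_bytes()
            .deserialize(&data),
    }
}
```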
@LLFourn I agree that this only needs to be handled in the file store right now. I don't expect versions will change that often, so rather than making a whole general-purpose framework for handling this, can we just take the simple enum approach in that crate?

I'm hoping we don't add new fields that often; something like this:
```rust
pub enum VersionedCombinedChangeSet<K, A> {
    V1 {
        chain: crate::local_chain::ChangeSet,
        indexed_tx_graph: crate::indexed_tx_graph::ChangeSet<A, crate::keychain::ChangeSet<K>>,
        network: Option<bitcoin::Network>,
    },
    V2 {
        chain: crate::local_chain::ChangeSet,
        indexed_tx_graph: crate::indexed_tx_graph::ChangeSet<A, crate::keychain::ChangeSet<K>>,
        network: Option<bitcoin::Network>,
        new_info: Option<u32>,
    },
}

impl<K, A> From<VersionedCombinedChangeSet<K, A>> for CombinedChangeSet<K, A> {
    fn from(versioned: VersionedCombinedChangeSet<K, A>) -> Self {
        match versioned {
            VersionedCombinedChangeSet::V1 { chain, indexed_tx_graph, network } => Self {
                chain,
                indexed_tx_graph,
                network,
                // missing fields get their default
                ..Default::default()
            },
            VersionedCombinedChangeSet::V2 { chain, indexed_tx_graph, network, new_info } => Self {
                chain,
                indexed_tx_graph,
                network,
                new_info,
            },
        }
    }
}

impl<K, A> From<CombinedChangeSet<K, A>> for VersionedCombinedChangeSet<K, A> {
    fn from(changeset: CombinedChangeSet<K, A>) -> Self {
        // always serialize as the current latest version
        VersionedCombinedChangeSet::V2 {
            chain: changeset.chain,
            indexed_tx_graph: changeset.indexed_tx_graph,
            network: changeset.network,
            new_info: changeset.new_info,
        }
    }
}
```
To add some context to this conversation, I'll touch on all proposed future changes that will affect `ChangeSet` structures.
**Add another tx timestamp for when wallet created tx**

- Where? `tx_graph::ChangeSet`
- What? New field: `first_seen: BTreeMap<Txid, u64>`
- Default? `BTreeMap::new()`
- Why? Better tx ordering
- Context? #1333

**Wallet birthday**

- Where? `bdk_wallet::ChangeSet`
- What? New field: `birthday: Option<u64>`
- Default? `None`
- Why? For block-by-block sync, we can know where to start scanning from for recovery
- Context? #1318

**Script pubkey birthday**

- Where? `bdk_wallet::keychain::ChangeSet`
- What? New field: `spk_birthdays: BTreeMap<u32, u64>`, where the key is the derivation index and the value is the birthday timestamp. If a derivation index has no birthday, the birthday of an earlier derivation index is used.
- Default? `BTreeMap::new()`
- Why? Can be used as a policy for light sync, i.e. only scan for addresses which have birthdays earlier than 3 days. Good information for users.
- Context? No ticket yet

**Monotone `LocalChain` changesets**

- Where? `local_chain::ChangeSet`
- What? Change `BTreeMap<u32, Option<BlockHash>>` to `BTreeMap<BlockId, BTreeSet<BlockId>>`, where the key is the block-id in question and the values are prev-blocks which are in the same chain.
- Default? `BTreeMap::new()`
- Why? This creates an API that is harder to break. It also allows us to update the chain from multiple sources without creating conflicts.
- Context? #1005

**Add block metadata in `LocalChain`**

- Where?
  - Without monotone `LocalChain` changesets: `BTreeMap<u32, Option<(BlockHash, B)>>`
  - With monotone `LocalChain` changesets: `BTreeMap<BlockId, (BTreeSet<BlockId>, B)>`
- Default? `BTreeMap::new()`
- Why? Can store header data for SPV, filters, utreexo, etc.
- Context? No ticket yet
Changes which only add new fields are easily backwards and forwards compatible. Changes which alter existing fields (monotone `LocalChain` changesets and adding metadata to `LocalChain`) are more problematic.

For monotone `LocalChain` changesets, we can easily upgrade older changesets into the newer one. However, it will be problematic if a changeset of an older version is merged onto a changeset of a newer version; therefore this breaks forwards compatibility.

For block metadata, we can have forwards and backwards compatibility if `B` can be empty and has a default.
I think we should have the following:

- Allow breaking backwards compatibility for `bdk_chain` changesets. Otherwise, for all `bdk_chain` changesets, we either need to keep the old fields (which would be weird) or version all changesets individually (i.e. an enum). This is difficult to maintain.
- Only make `bdk_wallet::ChangeSet` backwards compatible. This can be achieved either via enum versioning or by flattening the `bdk_chain` changeset fields into `bdk_wallet::ChangeSet`.
- For persistence that uses serialization formats without delimiters, have special logic to handle it.
I'm a fan of flattening the `bdk_wallet::ChangeSet`; it makes it easier to maintain. For upgrades that require changing fields, we'd have an `upgrade` method that empties the legacy fields and tries to populate the newer fields. However, a non-upgraded changeset cannot be applied on top of an upgraded changeset. Wallets would never create non-upgraded changesets after an upgrade anyway, so a panic here is fine? Or we can just ignore the legacy fields (chain sources can figure out what is missing anyway).
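A rough sketch of that flattening idea with an `upgrade` method; the field names and types here are hypothetical stand-ins for a legacy representation and its replacement:

```rust
use std::collections::BTreeMap;

#[derive(Default)]
pub struct ChangeSet {
    /// Legacy field, kept only so old persisted data still decodes.
    pub chain_v0: BTreeMap<u32, Option<[u8; 32]>>,
    /// Replacement field, populated from the legacy data on upgrade.
    pub chain_v1: BTreeMap<u32, [u8; 32]>,
}

impl ChangeSet {
    /// Empty the legacy fields and populate the newer ones.
    pub fn upgrade(&mut self) {
        for (height, hash) in std::mem::take(&mut self.chain_v0) {
            if let Some(hash) = hash {
                self.chain_v1.insert(height, hash);
            }
        }
    }
}
```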
@evanlinjin per recent discord chat the changes that need to be done are:

1. Move `CombinedChangeSet` back to `bdk_wallet::ChangeSet`.
2. ~~Introduce `B` generic to `LocalChain`.~~
3. Remove `keychain::ChangeSet::keychains_added` and add this to `bdk_wallet::ChangeSet`.
4. Introduce TLV for `bdk_file_store`.

Do we need separate issues and/or PRs for these, or do we want to do them all together? Can I help by doing a PR for 1?
@notmandatory one more breaking change here: https://github.com/bitcoindevkit/bdk/issues/1333
I would like more feedback on this.
I'm also currently working on 1 & 2 (should have a PR ready tomorrow).
Fixed by #1514.