Backwards compatibility of `bdk::ChangeSet` data structure
Describe the enhancement
The `bdk_wallet::ChangeSet` is what the persistence layer interfaces with. As suggested by @tnull, this structure should be backwards compatible (in case we make changes to it).
An easy solution is to make `bdk_wallet::ChangeSet` an enum:
```rust
#[non_exhaustive]
pub enum ChangeSet {
    V0(ChangeSet0),
    V1(ChangeSet1), // in the future this may exist
}

// We always convert to the latest version
impl From<ChangeSet0> for ChangeSet1 { /* TODO */ }
```
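For illustration, here is a minimal sketch of how a store could upgrade on load under this scheme; `load_latest` is a hypothetical helper, and the version payloads are assumed to have the `From` conversion above:

```rust
// Hypothetical upgrade-on-load helper: whatever version was persisted,
// the caller always receives the latest changeset shape.
fn load_latest(persisted: ChangeSet) -> ChangeSet1 {
    match persisted {
        ChangeSet::V0(old) => ChangeSet1::from(old), // convert forward
        ChangeSet::V1(latest) => latest,
        // (downstream crates would also need a wildcard arm, since the
        // enum is #[non_exhaustive])
    }
}
```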
Note that it would also be nice to consider whether even forwards compatibility could be guaranteed. It may make things a bit more complicated, but it's worth exploring, especially since these properties are transitive for any downstream users (e.g., LDK Node can only guarantee what BDK guarantees...).
> An easy solution is to make `bdk::ChangeSet` an enum:
Mh, but then you'd need to define explicit migration functions to allow users to migrate from `V0` to `V1`? Another way would be to a) not break compat for any existing fields and b) introduce any new fields as `Option`s, so that users upgrading could deal with 'old' objects in which the fields are still `None`.
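As a rough sketch of that `Option`-based approach (field names here are hypothetical), a self-describing format such as JSON can decode 'old' objects that lack the new field:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical changeset with one pre-existing and one newly added field.
#[derive(Serialize, Deserialize, Default)]
pub struct ChangeSet {
    // pre-existing field (placeholder type for this sketch)
    pub network: Option<String>,
    // field added in a later release; `#[serde(default)]` (which serde also
    // applies implicitly to `Option` fields) lets 'old' serialized objects
    // that lack the field decode to `None`
    #[serde(default)]
    pub wallet_name: Option<String>,
}
```

Note this only helps formats that name their fields; as discussed further down the thread, it does not by itself rescue positional formats like bincode.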
@tnull so you want an older version of BDK to understand changesets that a newer version of BDK outputs. Can you elaborate why this would be useful? Thank you
Forwards compatibility is really useful to allow downgrades for various reasons: say a user upgrades to a newer BDK version and then a bug is discovered. If the upgrade breaks serialization forwards compatibility they couldn't downgrade to get going again and would be stuck until BDK ships a fix. Similarly, it would be necessary to be a bit more forgiving when restoring from backups or when having multiple apps that operate on the same data set.
In LDK Node's case, for example, we plan to allow multi-device support synced via a VSS backend. Without forwards compatibility of the serialized data, our users would need to upgrade all apps and devices in lockstep, as otherwise an outdated device would refuse to read any data written by the already-upgraded devices.
As said above, I'm aware that it may complicate things, but it's at least worth considering, i.e., it should be a dedicated design choice not to provide these guarantees.
Thanks for bringing up this point. I think we should never change the changeset structures in `bdk_chain` once we have a proper release. Adding new functionality should mean adding a new type. I think there are two remaining proposed changeset changes there:
- #1005
- #1101
I'm not sure if we can apply the same policy to `bdk::Wallet`. If we do want to make changes there then we should indeed try to do it via optional fields at the end of the changeset.
I think forwards compatibility is possible, but this depends on the application serializing each changeset in a particular way. Note that in `bdk_file_store` we don't serialize changesets with a length, so the decoder is expected to understand everything in a changeset. We could improve this by first bincode-serializing the encoded entry as a `Vec<u8>`, which will prefix it with its length.
> Note that in `bdk_file_store` we don't serialize changesets with a length, so the decoder is expected to understand everything in a changeset. We could improve this by first bincode-serializing the encoded entry as a `Vec<u8>`, which will prefix it with its length.
Yeah, I think even independently of (forwards) compatibility guarantees, having some TLV-style encoding or at the very least adding the length sounds like a very good idea. Not having this really limits how you can make necessary changes to the data model in the future.
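A minimal sketch of that length-prefixing idea, assuming bincode 1.x and a serde-derived `ChangeSet` type:

```rust
// Encode the changeset to bytes first, then serialize those bytes as a
// `Vec<u8>`: bincode writes a u64 length prefix before the contents.
fn append_entry<W: std::io::Write>(db: &mut W, changeset: &ChangeSet) -> bincode::Result<()> {
    let inner: Vec<u8> = bincode::serialize(changeset)?;
    bincode::serialize_into(db, &inner)
}

fn read_entry<R: std::io::Read>(db: &mut R) -> bincode::Result<ChangeSet> {
    // The length prefix tells a decoder where this entry ends, so a reader
    // that can't interpret every field could still skip to the next entry.
    let inner: Vec<u8> = bincode::deserialize_from(&mut *db)?;
    bincode::deserialize(&inner)
}
```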
Do we really need forward and backwards compatibility baked into the data structures for persisted data? My understanding is that the data BDK persists should always be able to be regenerated from the wallet/app descriptors and blockchain client. So in the case of a roll forward/back, it should be possible to just do a fresh full scan.

Optionally, some persistence modules like SQL-based ones should be able to at least roll forward new schema changes to work with new `bdk::ChangeSet` versions.
I added this to the alpha.4 milestone to decide at least if we're going to do this or not since it's a biggish functional change.
> Do we really need forward and backwards compatibility baked into the data structures for persisted data? My understanding is that the data BDK persists should always be able to be regenerated from the wallet/app descriptors and blockchain client. So in the case of a roll forward/back, it should be possible to just do a fresh full scan.
I mean, generally forwards compat is a "nice-to-have", not a hard requirement. Still, ... nice to have it. ;) And I agree that it doesn't have to be explicit in the data structures (in fact, none of the compat guarantees have to be), but if we want it, it should be clearly defined and documented, and in the best case even (regularly) checked.
> Optionally, some persistence modules like SQL-based ones should be able to at least roll forward new schema changes to work with new `bdk::ChangeSet` versions.
Mh, but this is really about the data model itself, not the persistence layer. Of course the persistence layer could always provide migrations, but the idea of compat guarantees is exactly that the data model does not need an explicit migration step to go forwards/backwards.
> Thanks for bringing up this point. I think we should never change the changeset structures in `bdk_chain` once we have a proper release. Adding new functionality should mean adding a new type. I think there are two remaining proposed changeset changes there: 1. [LocalChain changes should be monotone #1005](https://github.com/bitcoindevkit/bdk/issues/1005) 2. [Reintroduce descriptor data to `bdk::Wallet` persistence #1101](https://github.com/bitcoindevkit/bdk/issues/1101)
>
> I'm not sure if we can apply the same policy to `bdk::Wallet`. If we do want to make changes there then we should indeed try to do it via optional fields at the end of the changeset.
I just realized this suggestion doesn't make sense since we're using bincode. You indeed have to do what @evanlinjin suggested in the OP for backwards compat.
Lib team call: it changes the API, so we should include it before the cutoff for the beta release.
I suggest we go ahead with this and do what is described in the ticket description.
We are now using the `CombinedChangeSet` type for persisting `Wallet` data, and that type implements `Default`, so all fields are optional or can be empty. I think this lets us maintain backward compatibility as long as we require that, going forward, any new fields added to `CombinedChangeSet` also implement `Default`.

So for example, if we add some field like `wallet_name: Option<String>` to a future version of `CombinedChangeSet`, then when we read an old data file with the old struct we'll get `None` and have to ensure the rest of the code will handle it in some reasonable way without a panic.

@tnull does this work for you? I believe it's what you suggested in our last call. If so, I suggest we only need to document this restriction in the docs for `CombinedChangeSet`.
Yes, it sounds like this could work, especially if newly added fields are `Option`s and this remains the only type that needs to be persisted. Note that it would still be unclear how you'd handle any types for which a sane default isn't easily available.

It might also be worth adding a test for the serialization upgrade whenever changes are made to `CombinedChangeSet` or any of the dependent types.
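One possible shape for such a test (a sketch, assuming a self-describing format like JSON and defaultable newer fields; the fixture string and type parameters are illustrative, not the real serialization):

```rust
#[test]
fn old_changeset_still_decodes() {
    // In a real test this would be a fixture captured from an older release.
    let old_json = r#"{"chain":{},"indexed_tx_graph":{}}"#; // predates `network`
    let decoded: CombinedChangeSet<String, u32> =
        serde_json::from_str(old_json).expect("old serialization must keep decoding");
    assert_eq!(decoded.network, None); // absent optional fields default to None
}
```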
@notmandatory some serialization formats would not work with what's suggested (i.e. bincode) when the entire changeset is persisted. I think we should create a new type (i.e. `VersionedChangeSet`) which is an enum of the different versions of `CombinedChangeSet`. The changeset will auto-upgrade itself to the latest version (if that makes sense).

We still need to ensure new fields added have sane defaults (as @tnull pointed out). However, I can't imagine a scenario where something added wouldn't have a sane default.

P.s. I'm working on this right now!
Can you elaborate on why serialization schemes like bincode wouldn't work with the idea of making all fields, and new ones in particular, optional (or potentially empty)? Doesn't any serialization we use have to have the property that it serdes changesets from/to regular Rust structs and that entries can be appended as we do now?
In theory bincode could work like this if you added a length prefix to each changeset, but in practice there don't seem to be APIs to "decode this thing and leave the remaining fields as `Default` if you run out of data", which seems really unfortunate. This stops you from doing backwards compatibility without version bytes. Version bytes break forwards compatibility.
I think it would be possible to get both if we were willing to write a custom serde deserializer that did versioning, but if the version is higher than the current impl, it would just try deserializing anyway and ignore trailing bytes (note that bincode can ignore trailing bytes). This would be a little better since it doesn't require making the enum.
I wonder if there's a way we could do this in `bdk_file_store`, where we just do the version prefixing there and you provide a callback to `bdk_file_store` telling it how to decode each older version. This seems like the cleanest approach because it puts the solution where the actual problem is (bincode) and doesn't preclude forwards compatibility for other formats (e.g. CBOR, JSON, etc.). This would turn it into a TLV kind of thing where each entry is in the form:
```rust
struct Entry<V> {
    version: V,
    data: Vec<u8>,
}
```
Then you'd provide a callback like:

```rust
Fn(Option<V>, Vec<u8>) -> Result<ChangeSet>
```
where you would actually do the bincode decoding, looking at the `V` to tell which version of the struct you should try to decode. To get forwards compatibility, if `V` couldn't be decoded it would pass in `None`, and you could just try to decode while ignoring trailing bytes.
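A sketch of what that caller-provided callback might look like, assuming hypothetical `ChangeSetV0`/`ChangeSetV1` types (with `V1` current and a `From<ChangeSetV0>` upgrade impl, both deriving serde traits) and bincode 1.x:

```rust
use bincode::Options;

fn decode_entry(version: Option<u32>, data: Vec<u8>) -> bincode::Result<ChangeSetV1> {
    match version {
        Some(0) => {
            let old: ChangeSetV0 = bincode::deserialize(&data)?;
            Ok(ChangeSetV1::from(old)) // upgrade to the latest version
        }
        Some(1) => bincode::deserialize(&data),
        // Version is newer than this impl understands: best-effort decode as
        // the current struct, ignoring trailing bytes from unknown fields.
        _ => bincode::DefaultOptions::new()
            .with_fixint_encoding()
            .allow_trailing_bytes()
            .deserialize(&data),
    }
}
```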
@LLFourn I agree that this only needs to be handled in the file store right now. I don't expect versions will change that often, so rather than making a whole general-purpose framework for handling this, can we just take the simple enum approach in that crate?

I'm hoping we don't add new fields that often; something like this:
```rust
pub enum VersionedCombinedChangeSet<K, A> {
    V1 {
        chain: crate::local_chain::ChangeSet,
        indexed_tx_graph: crate::indexed_tx_graph::ChangeSet<A, crate::keychain::ChangeSet<K>>,
        network: Option<bitcoin::Network>,
    },
    V2 {
        chain: crate::local_chain::ChangeSet,
        indexed_tx_graph: crate::indexed_tx_graph::ChangeSet<A, crate::keychain::ChangeSet<K>>,
        network: Option<bitcoin::Network>,
        new_info: Option<u32>,
    },
}

impl<K, A> From<VersionedCombinedChangeSet<K, A>> for CombinedChangeSet<K, A> {
    fn from(versioned: VersionedCombinedChangeSet<K, A>) -> Self {
        match versioned {
            VersionedCombinedChangeSet::V1 { chain, indexed_tx_graph, network } => Self {
                chain,
                indexed_tx_graph,
                network,
                // missing fields get their default
                ..Default::default()
            },
            VersionedCombinedChangeSet::V2 { chain, indexed_tx_graph, network, new_info } => Self {
                chain,
                indexed_tx_graph,
                network,
                new_info,
            },
        }
    }
}

impl<K, A> From<CombinedChangeSet<K, A>> for VersionedCombinedChangeSet<K, A> {
    fn from(changeset: CombinedChangeSet<K, A>) -> Self {
        // always serialize as the current latest version
        VersionedCombinedChangeSet::V2 {
            chain: changeset.chain,
            indexed_tx_graph: changeset.indexed_tx_graph,
            network: changeset.network,
            new_info: changeset.new_info,
        }
    }
}
```
To add some context to this conversation, I'll touch on all proposed future changes that will affect `ChangeSet` structures.
**Add another tx timestamp for when wallet created tx**

- Where? `tx_graph::ChangeSet`
- What? New field: `first_seen: BTreeMap<Txid, u64>`
- Default? `BTreeMap::new()`
- Why? Better tx ordering
- Context? #1333

**Wallet birthday**

- Where? `bdk_wallet::ChangeSet`
- What? New field: `birthday: Option<u64>`
- Default? `None`
- Why? For block-by-block sync, we can know where to start scanning from for recovery
- Context? #1318

**Script pubkey birthday**

- Where? `bdk_wallet::keychain::ChangeSet`
- What? New field: `spk_birthdays: BTreeMap<u32, u64>`, where the key is the derivation index and the value is the birthday timestamp. If a derivation index has no birthday, the birthday of an earlier derivation index is used.
- Default? `BTreeMap::new()`
- Why? Can be used as a policy for light sync, i.e. only scan for addresses which have birthdays earlier than 3 days. Good information for users.
- Context? No ticket yet

**Monotone `LocalChain` changesets**

- Where? `local_chain::ChangeSet`
- What? Change `BTreeMap<u32, Option<BlockHash>>` to `BTreeMap<BlockId, BTreeSet<BlockId>>`, where the key is the block-id in question and the values are prev-blocks which are in the same chain.
- Default? `BTreeMap::new()`
- Why? This creates an API that is harder to break. It also allows us to update the chain from multiple sources without creating conflicts.
- Context? #1005

**Add block metadata in `LocalChain`**

- Where?
  - Without monotone `LocalChain` changesets: `BTreeMap<u32, Option<(BlockHash, B)>>`
  - With monotone `LocalChain` changesets: `BTreeMap<BlockId, (BTreeSet<BlockId>, B)>`
- Default? `BTreeMap::new()`
- Why? Can store header data for SPV, filters, utreexo, etc.
- Context? No ticket yet
Changes which only add new fields are easily backwards and forwards compatible. Changes which alter existing fields (monotone `LocalChain` changesets and adding metadata to `LocalChain`) are more problematic.

For monotone `LocalChain` changesets, we can easily upgrade older changesets into the newer one. However, it will be problematic if a changeset of an older version is merged onto a changeset of a newer version; therefore this breaks forwards compatibility.

For block metadata, we can have forwards and backwards compatibility if `B` can be empty and has a default.
I think we should have the following:

- Allow breaking backwards compatibility for `bdk_chain` changesets. Otherwise, for all `bdk_chain` changesets, we either need to keep the old fields (which would be weird) or version all changesets individually (i.e. an enum). This is difficult to maintain.
- Only make `bdk_wallet::ChangeSet` backwards compatible. This can be achieved either via enum versioning or by flattening the `bdk_chain` changeset fields into `bdk_wallet::ChangeSet`.
- For persistence that uses serialization formats without delimiters, have special logic to handle it.
I'm a fan of flattening the `bdk_wallet::ChangeSet`; it makes it easier to maintain. For upgrades that require changing fields, we'd have an `upgrade` method that empties the legacy fields and tries to populate the newer fields. However, a non-upgraded changeset cannot be applied on top of an upgraded changeset. Wallets would never create non-upgraded changesets after an upgrade anyway, so a panic here is fine? Or we can just ignore the legacy fields (chain sources can figure out what is missing anyway).
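A rough sketch of that flattening idea with an `upgrade` method; the field names and types here are hypothetical stand-ins for a legacy representation and its replacement:

```rust
use std::collections::BTreeMap;

#[derive(Default)]
pub struct ChangeSet {
    /// Legacy field, kept only so old persisted data still decodes.
    pub chain_v0: BTreeMap<u32, Option<[u8; 32]>>,
    /// Replacement field, populated from the legacy data on upgrade.
    pub chain_v1: BTreeMap<u32, [u8; 32]>,
}

impl ChangeSet {
    /// Empty the legacy fields and populate the newer ones.
    pub fn upgrade(&mut self) {
        for (height, hash) in std::mem::take(&mut self.chain_v0) {
            if let Some(hash) = hash {
                self.chain_v1.insert(height, hash);
            }
        }
    }
}
```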
@evanlinjin per recent discord chat the changes that need to be done are:

1. Move `CombinedChangeSet` back to `bdk_wallet::ChangeSet`.
2. ~~Introduce `B` generic to `LocalChain`.~~
3. Remove `keychain::ChangeSet::keychains_added` and add this to `bdk_wallet::ChangeSet`.
4. Introduce TLV for `bdk_file_store`.

Do we need separate issues and/or PRs for these, or do we want to do them all together? Can I help by doing a PR for 1?
@notmandatory one more breaking change here: https://github.com/bitcoindevkit/bdk/issues/1333
I would like more feedback on this.
I'm also currently working on 1 & 2 (should have a PR ready tomorrow).
Fixed by #1514.