bdk icon indicating copy to clipboard operation
bdk copied to clipboard

Introduce TLV for bdk_file_store

Open notmandatory opened this issue 1 year ago • 11 comments

Sub-task for #1103

notmandatory avatar Jul 02 '24 00:07 notmandatory

Okay technically we don't need the T since we always append the same type. I propose the following:

Encoding

Encode each element as:

[Version: VarInt] [Length: VarInt] [Data: Bytes]

Where VarInt is defined here: https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer.

VersionedElement trait

pub trait VersionedElement: Sized {
    /// Decode the `data` of given `version`.
    fn decode_with_version(version: u64, data: &[u8]) -> Result<Self, VersionedDecodeError>;

    /// The version to encode the data with.
    fn encode_version() -> u64;
}

pub enum VersionedDecodeError {
    UnknownVersion(u64),
    Bincode(bincode::Error),
}

evanlinjin avatar Jul 02 '24 06:07 evanlinjin

I did a lot of thinking about this.

Backwards compat

/// Persists an append-only list of changesets (`C`) to a single file.
#[derive(Debug)]
pub struct Store<C, V = C> {
    magic_len: usize,
    db_file: File,
    marker: PhantomData<(V, C)>,
}

impl<V, C> Store<V, C>
where
    C: Into<V> + Append,
    V: Into<C> + serde::Serialize + serde::de::DeserializeOwned,

As you can see there is a new type parameter V -- this is the type that is serialized to and from the actual disk.

So in practice you'd have:

pub struct MyChangeSetV0_17_0 {... }
pub struct MyChangeSet { ... }

// serde serde etc
pub enum VersionedMyChangeSet {
    V0_17_0(MyChangeSetV0_17_0)
    Current(MyChangeSet)
}

impl From<MyChangeSet> for VersionedMyChangeSet {
     fn from(value: VersionedMyChangeSet) -> MyChangeSet {
         // do the obvious
     }
}

Then you'd set V as VersionedMyChangeSet and C as MyChangeSet.

If we do have forwards compatibility we need a length prefix. This can be done by just encoding each entry into a Vec<u8> and bincode encoding the result to the file.

This makes backwards compat optional and straight forward to achieve. You just have to nudge users to create this enum. For wallet we can create VersionedChangeSet and do all this without depending on bdk_file_persist I'm sure it's likely to be useful in the future anyway.

When making new changesets for existing types (which for the record I actually think we can avoid) we keep the old one around of course and put ChangeSetV0_WHAT_EVER at the end. bdk_wallet will probably use this in turn to develop a new changeset when it goes to v2.

Forwards Compat

To do forwards compat we just write the length of each entry before we serialize it. This can be done in a couple of ways. e.g. https://docs.rs/bincode/latest/bincode/config/trait.Options.html#method.serialized_size or by just serialzing to a Vec<u8> and serializing that.

Now that we have the length, when we decode an entry to a Vec<u8>, we try and decode as V. If it fails we strip the enum varint (tell bincode to just read an int), and then just try to decode as C, if that fails we finally throw an error.

So this gives you forwards compat as long as whatever the future V is that has been written encodes first a varint and then something that C can partially decode. This would be the case if the new version has just added an extra optional field.

For what it's worth, I discussed with Evan about things we had as breaking changes coming up or speculated and none of them actually were compatible with forwards compat (none of them would be just adding an optional field).

LLFourn avatar Jul 03 '24 08:07 LLFourn

I agree with @LLFourn's suggestion. I think someone should take this ticket on.

evanlinjin avatar Jul 03 '24 12:07 evanlinjin

I later realized the "Forwards Compat" bit is not totally sound. Before trying to do a decode on a higher versioned entry you need to actually check it is higher versioned not just assume so because it failed. It could be a corrupted lower version entry or something else has gone wrong. This means we'd need to know the current version.

It's actually kinda possible to do this automatically with bincode if you require Default for V and just encode the Default and see what int it encodes as to get the implicit version number. Since we already enforce Default for Append, then V should do it too. A little hacky but it does give us the feature. I'd be tempted to not actually implement this just yet but leave forwards compat later. I fully realize that "let's do forwards compat later" sounds like the famous last words.

LLFourn avatar Jul 04 '24 00:07 LLFourn

I agree with @LLFourn's suggestion. I think someone should take this ticket on.

I would like to give it a try.

nymius avatar Jul 04 '24 14:07 nymius

@nymius I've assigned this to you, thanks for jumping in to take this on.

notmandatory avatar Jul 04 '24 21:07 notmandatory

Can we push this TLV work for the bdk_file_store out of the 1.0.0-alpha milestone?

This would let the team focus on #1496 and #1498 changes and we'd only deliver backward compatibility for the beta with bdk_sqlite. The examples would also have to be changed to use bdk_sqlite but that should be faster/easier to review than redesigning bdk_file_store.

notmandatory avatar Jul 05 '24 14:07 notmandatory

I discussed this with @evanlinjin and decided this issue can be moved to the 1.0.0-beta milestone. We just need to warn beta users that if they want backward data file compatibility they'll need to use the bdk_sql until this issue is done.

notmandatory avatar Jul 08 '24 15:07 notmandatory

I'd like to know what @nymius and @evanlinjin actually think about whether it's a good idea to actually do this. Before I speculated about how to this but didn't really consider too deeply whether or not it's actually a good idea. We could just leave bdk_file_store as a low dependency ultra-minimal example of how to store and retrieve changesets. It's helpful in development before you commit to maintaining a specific backend for your users. Once we start making it forwards/backwards compatible we are implying that it can be used in production. I have no real fear about the design being used in production. I am concerned that bringing this to the level of quality needed to be confidently used in production might be wasted since I haven't heard of people wanting to use this? All the extra code also detracts from the its high-level pedagogical value since it's mostly about dealing with the particulars of bincode.

LLFourn avatar Sep 10 '24 01:09 LLFourn

I didn't considered the legitimacy of this as a real production case either. It seemed a good case just for the comments on #1103. Besides its utility I think I'm closer in the implementation to what you and evan underpinned in the comments, I only lacked the time to finish it last week. From a pedagogical point of view, the code certainly gets more complex and probably hides what you are trying to "teach". Looking at its production application, what are the alternatives? sqlite? sqlx? what benefits they bring in relation to bincode? Thinking about redundancy for example sqlite and file-store would have the same method: duplicate the file and copy it in another location. One drawback I can think of is that Bincode is harder to recover if the file brakes, because we cannot be sure were the error occurred just from the file. Something bincode doesn't have that SQLite does, is the ability to record snapshots, without having to implement any extra logic, for example. Something PostgreSQL is capable too. Considering the number of operations database engines can perform that file-store would have to implement itself, I would consider it production unsuitable, as these operations usually improve the reliability of the application, and for file-store would have to be implemented from scratch with all the costs this involves.

nymius avatar Sep 11 '24 01:09 nymius

My 2 sats on this is that file store is 1. a good as a simple example, 2. only really needs to show backward compatibility (not forward), and 3. isn't needed for the 1.0 final release so should be moved out of the 1.0.0-beta milestone.

I do think most production apps will use sqlite for single user apps (ie mobile), or sqlx for larger multi-user apps (on servers). In my experience SQL stores are reliable and most devs know how to work with them. Unless someone has a very limited resources device that needs to tightly control dependencies I don't think a flat file based persister will be a very popular wallet data store option. But still as an example and inspiration for anyone who does need a custom data store it's good to have.

notmandatory avatar Sep 18 '24 21:09 notmandatory

Let's close this for now then. Please see https://github.com/bitcoindevkit/bdk/pull/1661 which documents the state of bdk_file_store.

LLFourn avatar Oct 27 '24 23:10 LLFourn