Appending changesets to `Store` after opening causes overwriting
Describe the bug
- Let's say a `Store` instance initially contains `changeset1` stored in its file.
  https://github.com/bitcoindevkit/bdk/blob/d99b3ef4b449a5c7b8facf04db3cb9be829a0ac1/crates/file_store/src/store.rs#L14
- Then, after opening the store instance using `Store::open` and appending a new changeset (`changeset2`) using `append_changeset`, the store will only contain `changeset2`, not `changeset1 + changeset2`.
  https://github.com/bitcoindevkit/bdk/blob/d99b3ef4b449a5c7b8facf04db3cb9be829a0ac1/crates/file_store/src/store.rs#L73
  https://github.com/bitcoindevkit/bdk/blob/d99b3ef4b449a5c7b8facf04db3cb9be829a0ac1/crates/file_store/src/store.rs#L162
To Reproduce
```rust
use std::collections::BTreeSet;
use std::io::{Seek, SeekFrom};
// `Store`, `TestChangeSet` (= BTreeSet<String>) and `TEST_MAGIC_BYTES` are the
// definitions from the crate's test module.

#[test]
fn test() {
    let temp_dir = tempfile::tempdir().unwrap();
    let file_path = temp_dir.path().join("db_file");
    let changeset1 = BTreeSet::from(["hello".to_string(), "world".to_string()]);
    let changeset2 = BTreeSet::from(["got the error".to_string()]);

    {
        // create new db
        let mut db = Store::<TestChangeSet>::create_new(&TEST_MAGIC_BYTES, &file_path)
            .expect("must create");
        // append first changeset to db
        db.append_changeset(&changeset1).expect("must succeed");
    }

    {
        // open db
        let mut db = Store::<TestChangeSet>::open_or_create_new(&TEST_MAGIC_BYTES, &file_path)
            .expect("failed to load db");
        // now append the second changeset
        db.append_changeset(&changeset2).expect("must succeed");
        // retrieve stored changesets from the database
        let stored_changesets = db
            .aggregate_changesets()
            .expect("must succeed")
            .expect("must succeed");
        // the expected changeset is changeset1 + changeset2
        let mut expected_changeset = changeset2.clone();
        expected_changeset.extend(changeset1);
        // but the store only contains changeset2, not the aggregate
        assert_eq!(stored_changesets, changeset2);
        assert_ne!(stored_changesets, expected_changeset);
    }

    // Reason:
    //
    // On loading a pre-existing store, it contains all the previously appended
    // changesets, but the file pointer is positioned back at offset 12, i.e.
    // just past the TEST_MAGIC_BYTES array. Appending any new changeset
    // therefore overwrites the past changesets, so the store ends up
    // containing only the latest changeset.

    // Open the store again to verify the file pointer position after loading.
    let mut db =
        Store::<TestChangeSet>::open(&TEST_MAGIC_BYTES, &file_path).expect("failed to load db");
    // position of the file pointer just after loading the store
    let current_pointer = db.db_file.stream_position().expect("must succeed");
    assert_eq!(current_pointer, 12);
    // position of the end of the loaded db file
    let expected_pointer = db.db_file.seek(SeekFrom::End(0)).expect("must succeed");
    // the two don't match
    assert_ne!(current_pointer, expected_pointer);
}
```
Expected behavior
- On opening the `Store`, its file pointer must be set to the end of the file.
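A minimal sketch of that expectation, reusing the test-module names from the reproduction above (the direct `db_file` access works inside the crate's own test module, as in the test); the explicit `seek` is the step `open` arguably should perform itself:

```rust
use std::io::{Seek, SeekFrom};

// After validating the magic bytes, `open` could move the cursor to EOF so
// that a following `append_changeset` cannot overwrite existing entries.
let mut db =
    Store::<TestChangeSet>::open(&TEST_MAGIC_BYTES, &file_path).expect("failed to load db");
db.db_file.seek(SeekFrom::End(0)).expect("must seek"); // the missing step
db.append_changeset(&changeset2).expect("must succeed"); // now lands at EOF
```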
Build environment
- BDK tag/commit: d99b3ef4b449a5c7b8facf04db3cb9be829a0ac1
- OS+version: Fedora 37
- Rust/Cargo version: rustc 1.81.0-nightly (f21554f7f 2024-06-08)
- Rust/Cargo target: nightly-2024-06-09-x86_64-unknown-linux-gnu
@evanlinjin could this happen with the changes in #1514?
Thanks for writing up this ticket.
The intended use case of `bdk_file_store` is with BDK structures. With BDK, we never read after writing.
Maybe we can structure things to be more fool-proof. Need to think more on this.
Thanks @KnowWhoami. Maybe `Store::open` should always seek the file cursor to EOF? If this is truly a bug then I'm surprised all the tests and examples work how we expect using the file store.
> If this is truly a bug then I'm surprised all the tests and examples work how we expect using the file store
- That's because tests that open a pre-existing `Store` usually run the `aggregate_changesets` API, which internally calls `iter_changesets` to collect all the stored changesets. Iterating over the stored entries also advances the store's file pointer, moving it to the end of the file.
  https://github.com/bitcoindevkit/bdk/blob/d99b3ef4b449a5c7b8facf04db3cb9be829a0ac1/crates/file_store/src/store.rs#L135
  https://github.com/bitcoindevkit/bdk/blob/d99b3ef4b449a5c7b8facf04db3cb9be829a0ac1/crates/file_store/src/store.rs#L238
- Thus, it works fine if we append any new changesets after that (see the sketch below).
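In other words, with the current behavior a caller can sidestep the overwrite by aggregating immediately after opening; a minimal sketch reusing the names from the reproduction above:

```rust
// Aggregating right after `open` iterates all entries, which moves the file
// pointer to EOF as a side effect; the append below is then safe.
let mut db =
    Store::<TestChangeSet>::open(&TEST_MAGIC_BYTES, &file_path).expect("failed to load db");
let _ = db.aggregate_changesets().expect("stored changesets must decode");
db.append_changeset(&changeset2).expect("must succeed"); // appends after changeset1
```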
> The intended use case of `bdk_file_store` is with BDK structures. With BDK, we never read after writing. Maybe we can structure things to be more fool-proof. Need to think more on this.
- Yeah, that's true. BDK wallet APIs don't care much about the internal workings of the database; they just need to read the db to get the required changesets and append new ones if any are found. That's it.
- But `bdk_file_store` is mainly developed for downstream users so that they can use it directly in their BDK-based wallets, so this bug should be taken into consideration.
I think it would help if we combine opening and loading into one `read` method that opens the file and returns the aggregate changeset, thereby ensuring the next write goes at the end of the file.
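A sketch of what such a combined method could look like (the name `read` and the simplified error handling are illustrative, not an agreed design); because aggregation walks the entry iterator to EOF, the returned store is safe to append to:

```rust
// (sketch: inside the existing `impl` block for `Store<C>`)
pub fn read<P>(magic: &[u8], file_path: P) -> Result<(Self, Option<C>), FileError>
where
    P: AsRef<Path>,
{
    let mut store = Self::open(magic, file_path)?;
    // Aggregating consumes the entry iterator, leaving the cursor at EOF, so
    // the next `append_changeset` goes at the end of the file. Error handling
    // is simplified here; a real version would surface the aggregate error.
    let aggregate = store.aggregate_changesets().map_err(|_| {
        FileError::Io(io::Error::new(
            io::ErrorKind::InvalidData,
            "failed to decode a stored changeset",
        ))
    })?;
    Ok((store, aggregate))
}
```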
I would like to work on this, but I've got some questions:
- Can't there be a situation where a user doesn't want to aggregate changesets?
- Is there any reason it's not desirable to store multiple changesets without aggregating them, as in the following succession?
1st changeset: open-append-close
2nd changeset: open-append-close
3rd changeset: open-append-close
I'm thinking of implementing something like the read method @ValuedMammal mentions above, but just using `iter_changesets` to verify the correctness of the stored changesets, without modifying them.
That would place the file pointer at the real end of the file and check the existing changesets for write errors, leaving to the user the decision of aggregating changesets, or keeping on appending them until further decision.
Perhaps an easy fix is to change the file mode to `append` here instead of only `write`:
https://github.com/bitcoindevkit/bdk/blob/2cf46a2a9e513d3f2612b7c363446fb1acf958ae/crates/file_store/src/store.rs#L73-L77
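For reference, the experiment being described amounts to roughly this change to the `OpenOptions` used by `Store::open` (paraphrased, not the exact source):

```rust
use std::fs::OpenOptions;

// In append mode the OS forces every write to the end of the file, so the
// cursor position no longer matters for `append_changeset`.
let mut f = OpenOptions::new()
    .read(true)
    .append(true) // was: .write(true)
    .open(file_path)?;
```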
After quick testing this appears to fix the issue in the original post; however, it caused other test failures, in particular `last_write_is_short` and `write_after_short_read`.
In particular, `write_after_short_read` clearly proves what @evanlinjin stated above: never read after write. It asserts explicitly that the remaining changesets after the "short read" are overwritten.
This is the idea. I have tested it against the current file store tests and a modification of the test above to check the expected behavior, and it works.
Pros:
- Doesn't modify current API
Cons:
- Extra method
- Extra enum variant to wrap `IterError` inside `FileError`
```diff
diff --git a/crates/file_store/src/lib.rs b/crates/file_store/src/lib.rs
index 7c943ca2..236995e9 100644
--- a/crates/file_store/src/lib.rs
+++ b/crates/file_store/src/lib.rs
@@ -18,6 +18,8 @@ pub enum FileError {
     Io(io::Error),
     /// Magic bytes do not match what is expected.
     InvalidMagicBytes { got: Vec<u8>, expected: Vec<u8> },
+    /// Something went wrong while decoding file content
+    WrongEncoding(entry_iter::IterError)
 }
 
 impl core::fmt::Display for FileError {
@@ -29,6 +31,7 @@ impl core::fmt::Display for FileError {
                 "file has invalid magic bytes: expected={:?} got={:?}",
                 expected, got,
             ),
+            Self::WrongEncoding(e) => e.fmt(f)
         }
     }
 }
diff --git a/crates/file_store/src/store.rs b/crates/file_store/src/store.rs
index 62c3d91b..fdd7523c 100644
--- a/crates/file_store/src/store.rs
+++ b/crates/file_store/src/store.rs
@@ -92,6 +92,31 @@ where
         })
     }
 
+    /// Open [`Store`] and recover state after last write.
+    ///
+    /// Use [`open`] to open `Store` and manually handle the previous state if any.
+    /// Use [`create_new`] to create a new `Store`.
+    ///
+    /// # Errors
+    ///
+    /// If the prefixed bytes of the opened file does not match the provided `magic`, the
+    /// [`FileError::InvalidMagicBytes`] error variant will be returned.
+    ///
+    /// [`create_new`]: Store::create_new
+    pub fn rewind<P>(magic: &[u8], file_path: P) -> Result<Self, FileError>
+    where
+        P: AsRef<Path>,
+    {
+        let mut store = Self::open(magic, file_path)?;
+        for next_changeset in store.iter_changesets() {
+            if let Err(iter_error) = next_changeset {
+                return Err(FileError::WrongEncoding(iter_error));
+            }
+        }
+
+        Ok(store)
+    }
+
     /// Attempt to open existing [`Store`] file; create it if the file is non-existent.
     ///
     /// Internally, this calls either [`open`] or [`create_new`].
```
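With this patch, the reproduction from the issue would call the new constructor instead of `open`; a sketch of the call site:

```rust
// `rewind` validates every stored entry and, by iterating them, leaves the
// file cursor at EOF, so this append cannot clobber changeset1.
let mut db =
    Store::<TestChangeSet>::rewind(&TEST_MAGIC_BYTES, &file_path).expect("failed to load db");
db.append_changeset(&changeset2).expect("must succeed");
```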
@nymius I do think it's reasonable to expect that the "open-append-close" flow should work without overwriting as you mentioned in https://github.com/bitcoindevkit/bdk/issues/1517#issuecomment-2366980177.
> With BDK, we never read after writing.
^ To be honest I'm a little confused what is meant by this. Edit: I may have been lacking context in which this is actually true; it makes sense over the lifetime of one `Store` instance.
If we assume the tests are correct, i.e. that `append_changeset` should write directly after the current file position (even if that means overwriting existing data), then I think the read approach makes sense (similar to your `rewind` idea). The issue I have with just adding this method (`rewind`), though, is that it doesn't fix the problem with `Store::open` as pointed out by @KnowWhoami, and in the event the caller does want to load the contents of the database, the result is that we would effectively be doing `iter_changesets` twice?
On the other hand, if we change the file mode to append, it would ensure that `append_changeset` always writes at the EOF regardless of the file position (if I understand correctly), but we'd then have to redefine the test expectations with regard to truncating, etc.
@nymius Revisiting the open-append-close idea:
Isn't it the case that we will always want to load the backend upon opening the file? For example in the context of a bdk wallet we have to load everything into memory in order to initialize the wallet. Without this you would have nothing to "append".
> Isn't it the case that we will always want to load the backend upon opening the file? For example in the context of a bdk wallet we have to load everything into memory in order to initialize the wallet. Without this you would have nothing to "append".
I agree. Unless there was a precondition forcing a rescan on each open (which wasn't documented), I think it's the most logical thing to do.
I did a summary of the possible solutions I thought about:

1. Another method like `rewind` or `read` implementing the safe behavior of reading content before proceeding with any writes. I think it works "on paper" but doesn't solve the underlying problem with `open`. It's just a workaround for those expecting the functionality explained in the issue.
2. Iterating changesets on `Store` opening. This also works partially, as any encoding error causes the failure of the method and doesn't recover the already parsed changesets the way `AggregateChangesetsError` allows.
3. Changing the opening mode of the file to `append`. This causes `store::test::append_changeset_truncates_invalid_bytes` to fail because it won't check the sanity of the previous data before allowing `append_changeset` to write after it, causing any failing file store to keep breaking the store unnoticed.
4. Iterating changesets on the `Store::open` call but silently truncating the file to the last successfully read changeset if any error occurs. Works; the only failing test is `store::test::write_after_short_read`, but the problem is in the test itself, as it expects the file pointer to be reset each time the file is opened. I don't like the UX of this: the user will be losing information without noticing.
5. Iterating changesets on the `Store::open` call, silently truncating the file to the last successfully read changeset if any error occurs, but returning the index of the changeset causing the failure. Same as above, but improving the UX: now the user knows there is a problem with their changeset store and can decide to do a rescan. Also, we could implement a complementary method to merge changesets up to some changeset index, allowing users to work in the meantime with the non-corrupted data. Not sure how useful this is.
6. Aggregating changesets on each `Store` open by default. Should work the same as above but returning a partially parsed changeset. It would also aggregate any changeset in the file, which may not be desired by some users, or at least we should check it's not a current use case for someone. The difficulty with this change is that it requires the addition of a new variant to the `FileError` enum, and as this variant wraps the `AggregateChangesetsError`, it would require a generic for the partially aggregated changeset, creating a domino effect of complexity, making it a difficult change.
7. Aggregating changesets on each `Store` open by default but just returning the error of the `AggregateChangesetsError` and ignoring the partially aggregated changeset. This is an evolution of the solution above, avoiding the complexity of the generic implied by the partial aggregation.
I've tried all of the above. The majority of them make the following tests fail, except for 1 and 4:

- `store::test::append_changeset_truncates_invalid_bytes`: a premature `iter_changesets` in the open method detects that the data is wrongly formatted but doesn't return the partially read changesets, as the test expects.
- `store::test::last_write_is_short`: a premature `iter_changesets` will fail if the file has been truncated in the middle of a changeset. The expected behavior is to open it normally but fail to aggregate and return the partially aggregated changeset inside the error.
- `store::test::write_after_short_read`: fails because the expected behavior is to not update the file pointer on open and just overwrite whatever was there before. Instead, the open reads the stored changesets, updating the file pointer, and then proceeds to append any new changeset, effectively avoiding the loss of the previous changesets.
My preferred solution is 5 (sketched below), and my second choice is 7.
Why 5 over 7? Because 5 gives additional information about the failing changeset, allows a partial recovery, and also reports the error, while 7 just reports the error without offering the previous alternatives.
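For concreteness, a sketch of what option 5 could look like (the name `open_and_truncate` is hypothetical; it assumes the entry iterator rewinds the cursor to the start of the failing entry before yielding the error, as the current `EntryIter` does):

```rust
// (sketch: inside the existing `impl` block for `Store<C>`)
pub fn open_and_truncate<P>(magic: &[u8], file_path: P) -> Result<(Self, Option<usize>), FileError>
where
    P: AsRef<Path>,
{
    let mut store = Self::open(magic, file_path)?;
    let mut failed_index = None;
    for (index, changeset) in store.iter_changesets().enumerate() {
        if changeset.is_err() {
            failed_index = Some(index);
            break;
        }
    }
    if failed_index.is_some() {
        // The iterator rewound to the start of the corrupt entry, so the
        // current position marks the end of the last valid changeset.
        let last_good = store.db_file.stream_position().map_err(FileError::Io)?;
        store.db_file.set_len(last_good).map_err(FileError::Io)?;
    }
    // The cursor now sits at the end of the valid data, so appends are safe.
    Ok((store, failed_index))
}
```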
It should be relatively painless if we keep all the same semantics of `Store` and build a safer API around it, so I want to go back to your first suggestion and add a method (call it `reopen`) that internally does `iter_changesets` to check the integrity of the db before returning a new `Store` instance. We can deprecate the current `open`, and in the future we'll still use it internally; it just won't be marked `pub`. These are suggestions you can feel free to adapt.
```rust
/// Reopen an existing [`Store`]. Errors if...
pub fn reopen<P>(magic: &[u8], file_path: P) -> Result<Self, OpenError>
where
    P: AsRef<Path>,
{
    ...
}
```
We can have a new error `OpenError` that wraps the `IterError`. If we want to add a new and improved `open_or_new` for convenience, then I guess `OpenError` will have to be an enum with a variant for `FileError` as well.
```rust
/// An error while opening or creating the file store
#[derive(Debug)]
pub enum OpenError {
    /// entry iter error
    EntryIter {
        /// index that caused the error
        index: usize,
        /// iter error
        iter: IterError,
    },
    /// file error
    File(FileError),
}
```
We should be able to keep the existing tests and add coverage for the new functionality fixing the issue. If there's no interest in going the `#[deprecated]` route, then the suggestion is to make the current `open` a private method now and replace it with what I'm calling `reopen` above.
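A sketch of how `reopen` could be implemented on top of the now-private `open`, using the `OpenError` enum above (again a suggestion to adapt, not a final design):

```rust
pub fn reopen<P>(magic: &[u8], file_path: P) -> Result<Self, OpenError>
where
    P: AsRef<Path>,
{
    let mut store = Self::open(magic, file_path).map_err(OpenError::File)?;
    // Walk every entry to verify integrity; on success the cursor ends up at
    // EOF, so subsequent appends go at the end of the file.
    for (index, changeset) in store.iter_changesets().enumerate() {
        if let Err(iter) = changeset {
            return Err(OpenError::EntryIter { index, iter });
        }
    }
    Ok(store)
}
```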