
Checksum assert

Open casey opened this issue 2 years ago • 1 comments

I'm currently using the master branch of redb, to avoid #318, and I was hitting this assert whenever I restarted ord:

assert!(Self::verify_primary_checksums(&mem));

Switching to WriteStrategy::Throughput seems to avoid the issue.

casey avatar Aug 16 '22 01:08 casey

Well that's not good! How can I reproduce the crash? I have a test for that code path, but it's not super thorough: https://github.com/cberner/redb/blob/475c2139a98d6f90b919e4f0b99c91d06c64454e/src/tree_store/page_store/page_manager.rs#L1648

cberner avatar Aug 25 '22 04:08 cberner

Sorry for the slow response! I was out of town for the weekend, and am just now catching up on my notifications. We switched the main instance to WriteStrategy::Throughput, so I'm trying to reproduce locally. It looks like the integration tests are passing after switching back to WriteStrategy::CommitLatency, so it might only occur when the database gets big or weird somehow.

casey avatar Sep 01 '22 01:09 casey

I was able to reproduce this locally, and opened a draft PR which switches back to WriteStrategy::CommitLatency for debugging.

You can reproduce it by running bitcoind -signet, along with either ord --chain signet server or cargo run -- --chain signet server. Your bitcoind node will start syncing blocks, and the ord instance should hit the assert pretty quickly.

casey avatar Sep 01 '22 01:09 casey

Oh, of course. I found the problem. I even thought of this while implementing 1PC+C, but then forgot to address it =P. insert_reserve() corrupts the database when used with checksums, because it returns a mutable ref and lets the user modify the value after the checksum is calculated. I use insert_reserve() internally when filling the table of freed pages.
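To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use redb's API: the Page type, the toy checksum function, and the finalize() method are all hypothetical names invented for illustration, and recomputing the checksum afterwards is only one possible remedy, not necessarily how redb fixed it.

```rust
/// Toy stand-in for a checksummed page (hypothetical; not redb's page layout).
struct Page {
    data: Vec<u8>,
    checksum: u32,
}

/// Toy polynomial checksum, chosen only for illustration.
/// redb's real checksum function is different.
fn checksum(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

impl Page {
    fn new() -> Self {
        Page { data: Vec::new(), checksum: checksum(&[]) }
    }

    /// Problematic pattern: the checksum is recorded over the reserved
    /// (still zeroed) bytes, and only then is a mutable slice handed out.
    fn insert_reserve(&mut self, len: usize) -> &mut [u8] {
        self.data = vec![0; len];
        self.checksum = checksum(&self.data);
        &mut self.data
    }

    /// One possible remedy (an assumption): recompute the checksum once
    /// the caller has finished writing into the reserved space.
    fn finalize(&mut self) {
        self.checksum = checksum(&self.data);
    }

    /// Returns true if the stored checksum matches the current contents.
    fn verify(&self) -> bool {
        checksum(&self.data) == self.checksum
    }
}

fn main() {
    let mut page = Page::new();

    // The caller fills in the value after the checksum was already stored.
    page.insert_reserve(4).copy_from_slice(&[1, 2, 3, 4]);
    println!("verify after late write: {}", page.verify());

    // After recomputing, the stored checksum describes the real contents.
    page.finalize();
    println!("verify after finalize: {}", page.verify());
}
```

In this sketch the first verify() fails because the stored checksum still describes four zero bytes, which is the same shape of failure as the verify_primary_checksums assert firing on restart.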

cberner avatar Sep 02 '22 04:09 cberner

Ahhh, nice. Glad it's simple. I kind of assumed that because the checksum was complicated, it would be something weird.

casey avatar Sep 03 '22 02:09 casey

I am still seeing this issue on 0.8.0: checksums fail when opening the db after my program is suddenly interrupted (but not always; I guess it depends on the commit stage).

bruwozniak avatar Oct 19 '22 18:10 bruwozniak

Hmm, that's not good. What backtrace do you get?

cberner avatar Oct 22 '22 14:10 cberner