
feat: implement table range checksums for reth db checksum

Open AbnerZheng opened this issue 1 year ago • 6 comments

close https://github.com/paradigmxyz/reth/issues/7561

Please check whether this is heading in the right direction.

cargo run --bin reth db checksum HashedAccounts --datadir ~/.local/share/reth/holesky  --start 0x005e54f1867fd030f90673b8b625ac8f0656e44a88cfc0b3af3e3f3c3d486960 --end 0x03089e01be9eb2af5ff5fa1c5983c6c6fb78dd734658d1f8f11d4f8d27a23fd5

And here is the result:

2024-04-13T17:16:42.693809Z  WARN This command should be run without the node running!
2024-04-13T17:16:42.695030Z  INFO <range>:: 0x005e54f1867fd030f90673b8b625ac8f0656e44a88cfc0b3af3e3f3c3d486960..=0x03089e01be9eb2af5ff5fa1c5983c6c6fb78dd734658d1f8f11d4f8d27a23fd5
2024-04-13T17:16:42.695765Z  INFO Hashed 0 entries.
2024-04-13T17:16:42.695850Z  INFO Hashed 4 entries.
2024-04-13T17:16:42.695895Z  INFO Checksum for table `HashedAccounts`: 0xf0b3ac90be2afa66 (elapsed: 301.167µs)
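
For readers following along, here is a minimal sketch of what a range checksum over a sorted table amounts to. It is not reth's implementation: the function name `range_checksum` is hypothetical, the in-memory list stands in for a database cursor, and BLAKE2b truncated to 8 bytes is only a stand-in hasher chosen to produce a 64-bit value. The inclusive bounds follow the `..=` shown in the log above.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def range_checksum(
    entries: Iterable[Tuple[bytes, bytes]],
    start: Optional[bytes] = None,
    end: Optional[bytes] = None,
) -> Tuple[int, str]:
    """Hash every (key, value) pair whose key lies in start..=end.

    `entries` must be sorted by key, like a cursor walk over a table.
    Returns (number of entries hashed, checksum as a hex string).
    """
    # Assumption: stand-in hasher, used only to get a 64-bit digest.
    hasher = hashlib.blake2b(digest_size=8)
    count = 0
    for key, value in entries:
        if start is not None and key < start:
            continue  # before the requested range
        if end is not None and key > end:
            break  # keys are sorted, so nothing further can match
        hasher.update(key)
        hasher.update(value)
        count += 1
    return count, "0x" + hasher.hexdigest()


# Toy table with ten one-byte keys; hash the keys 0x02..=0x05.
table = [(bytes([i]), bytes([i * 2])) for i in range(10)]
count, checksum = range_checksum(table, start=bytes([2]), end=bytes([5]))
print(count, checksum)
```

A change to any entry inside the range changes the checksum, while changes outside the range leave it untouched, which is what makes narrowing down a corrupt range possible.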

AbnerZheng avatar Apr 13 '24 17:04 AbnerZheng

hey @AbnerZheng did my comment make sense / are you stuck on anything?

Rjected avatar Apr 24 '24 20:04 Rjected

It makes sense. But I am not familiar with these tables, so I am trying to sync a node on my server to inspect them.

AbnerZheng avatar Apr 25 '24 08:04 AbnerZheng

btw it should be possible to test this with a testnet node, for example holesky, which is much smaller! lmk if you're still blocked or don't understand something - the table definitions are here https://github.com/paradigmxyz/reth/blob/12873d515a9cea30d553fe938dc42a12c072562b/crates/storage/db/src/tables/mod.rs#L246-L380

Rjected avatar Apr 29 '24 17:04 Rjected

@Rjected I have added another argument, `limit`, and now print the `start-key` and `end-key` when running checksum.

I imagine the tool would be used like this:

  1. Get the table name by running `reth db stats` if you don't know the exact name.
  2. Run checksum with `limit` specified, but without setting `start-key` or `end-key`. For example:
reth db checksum AccountsHistory --datadir ~/.local/share/reth/holesky --limit 100

The result looks like:

2024-04-30T15:27:36.360619Z  WARN This command should be run without the node running!
2024-04-30T15:27:36.360754Z  INFO Hashed 0 entries.
2024-04-30T15:27:36.360870Z  INFO Hashed 100 entries.
2024-04-30T15:27:36.360908Z  INFO start-key: {"key":"0x0000000000000000000000000000000000000000","highest_block_number":30161}
2024-04-30T15:27:36.360919Z  INFO end-key: {"key":"0x000000000000000000000000000000000000005e","highest_block_number":18446744073709551615}
2024-04-30T15:27:36.360934Z  INFO Checksum for table `AccountsHistory`: 0x98f8199844a34072 (elapsed: 178.629µs)
  3. Compare the `start-key`, `end-key`, and checksum with the values from another node. If they match, rerun the command with the `end-key` we got before as the new `start-key`, for example:
reth db checksum AccountsHistory --datadir ~/.local/share/reth/holesky --limit 100 --start-key '{"key":"0x000000000000000000000000000000000000005e","highest_block_number":18446744073709551615}'

This gives:

2024-04-30T15:32:01.614445Z  WARN This command should be run without the node running!
2024-04-30T15:32:01.614565Z  INFO start={"key":"0x000000000000000000000000000000000000005e","highest_block_number":18446744073709551615} 
 end= 
2024-04-30T15:32:01.614625Z  INFO Hashed 0 entries.
2024-04-30T15:32:01.614779Z  INFO Hashed 100 entries.
2024-04-30T15:32:01.614813Z  INFO start-key: {"key":"0x000000000000000000000000000000000000005e","highest_block_number":18446744073709551615}
2024-04-30T15:32:01.614820Z  INFO end-key: {"key":"0x00000000000000000000000000000000000000c1","highest_block_number":18446744073709551615}
2024-04-30T15:32:01.614833Z  INFO Checksum for table `AccountsHistory`: 0xadb85f752caba2fe (elapsed: 207.885µs)
  4. Repeat steps 2 and 3 to find the corrupt range. A binary search strategy would also work.

So instead of documenting the unit, users can see what a key looks like by running the command directly with `limit` set and without setting `start-key` or `end-key`.
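
The walk in steps 2 to 4 can be sketched roughly as follows. This is only an illustration under stated assumptions: two in-memory sorted lists stand in for the same table on two nodes, `checksum_chunk` and `find_first_mismatch` are hypothetical names, and BLAKE2b truncated to 8 bytes is a stand-in hasher rather than the one reth uses. As in the logs above, each run starts at the previous run's `end-key`, so the boundary entry is hashed in both chunks.

```python
import bisect
import hashlib
from typing import List, Optional, Tuple

Entry = Tuple[bytes, bytes]


def checksum_chunk(table: List[Entry], start_key: Optional[bytes], limit: int):
    """Mimic one `--limit` run: hash up to `limit` entries from `start_key` (inclusive).

    Returns (first key, last key, checksum, number of entries hashed).
    """
    keys = [k for k, _ in table]
    lo = 0 if start_key is None else bisect.bisect_left(keys, start_key)
    chunk = table[lo:lo + limit]
    hasher = hashlib.blake2b(digest_size=8)  # assumption: stand-in hasher
    for key, value in chunk:
        hasher.update(key)
        hasher.update(value)
    first = chunk[0][0] if chunk else None
    last = chunk[-1][0] if chunk else None
    return first, last, hasher.hexdigest(), len(chunk)


def find_first_mismatch(ours: List[Entry], theirs: List[Entry], limit: int):
    """Walk both tables chunk by chunk and return our (start-key, end-key)
    for the first chunk whose keys or checksum differ, or None if all match."""
    if limit < 2:
        raise ValueError("limit must be at least 2 so each step advances")
    start_key = None
    while True:
        a = checksum_chunk(ours, start_key, limit)
        b = checksum_chunk(theirs, start_key, limit)
        if a != b:
            return a[0], a[1]
        if a[3] < limit:
            return None  # reached the end of the table without a mismatch
        start_key = a[1]  # reuse the end-key as the next start-key


# Two toy tables that differ only in the entry with key 13.
ours = [(i.to_bytes(2, "big"), b"value") for i in range(25)]
theirs = list(ours)
theirs[13] = (theirs[13][0], b"corrupt")
print(find_first_mismatch(ours, theirs, limit=5))
```

Once a mismatching chunk is found, rerunning the same walk over that range with a smaller limit narrows it further.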

AbnerZheng avatar Apr 30 '24 15:04 AbnerZheng

@Rjected bump

mattsse avatar May 17 '24 07:05 mattsse

this sounds great! taking a look at the changes

Rjected avatar May 20 '24 18:05 Rjected

bump @Rjected

mattsse avatar May 22 '24 09:05 mattsse