go-ethereum icon indicating copy to clipboard operation
go-ethereum copied to clipboard

all: integrate snapshot into pathdb

Open fynnss opened this issue 9 months ago • 8 comments

This PR implements a way to fetch account/storage directly from pathdb as an alternative to snapshot for better robustness and simplicity, then reducing data storage space.

Major changes:

  • add latestaccount/lateststorage/destructset cache to pathdb
  • use db.seekLT to query the leaf node of account/storage in pathdb
  • use deleteRange to destruct the storage trie

Status: Testing

fynnss avatar May 11 '24 06:05 fynnss

Related issue. https://github.com/ethereum/go-ethereum/issues/28617

fynnss avatar May 11 '24 07:05 fynnss

I'm not entirely sure how this will help us. If I understand it correctly, you used range queries on the database to look up path-based leaves "directly". I'm unsude how fast that is (it might actually be reasonable for 1 value lookups).

That said, the primary purpose of snapshots is to serve snap sync, which needs a way to very quickly and cheaply iterate a range of accounts (e.g. 10K accounts), without having to go through the trie. This work doesn't seem to address that at all, so at teh end of the day, it seems we're still needing the snapshots just as before?

karalabe avatar May 13 '24 09:05 karalabe

which needs a way to very quickly and cheaply iterate a range of accounts (e.g. 10K accounts), without having to go through the trie.

One thing that would be interesting to see, is

  • On a machine which is synced, and has a path-database and also a generated snapshot,
  • Pick a random key-segment,
    • Read 10K accounts from the snapshot
    • Read 10K accounts from the path-database
    • Compare times
  • Repeat a few more times.

holiman avatar May 13 '24 10:05 holiman

I'm not entirely sure how this will help us. If I understand it correctly, you used range queries on the database to look up path-based leaves "directly". I'm unsude how fast that is (it might actually be reasonable for 1 value lookups).

That said, the primary purpose of snapshots is to serve snap sync, which needs a way to very quickly and cheaply iterate a range of accounts (e.g. 10K accounts), without having to go through the trie. This work doesn't seem to address that at all, so at teh end of the day, it seems we're still needing the snapshots just as before?

Thanks for your reply.

  1. db.seek will be about 10% slower than get because get can use bloom filters to accelerate
  2. For snap sync, we can also iterate out all leaf nodes from pathdb. But It may slower than from snapshot. We can test it as @holiman comment above.

The reasons for the PR:

  1. Better robustness and simplicity(as @rjl493456442 mentioned in #28617), no need to maintain two sets of similar structures, snapshot & pathdb (difflayer/disklayer/journal/recovery)
  2. Reduce disk storage and db overhead(compaction). snapshop's data is a subset of pathdb.
  3. Improve caching efficiency. The trie leaf node will be cached when execution.

In the end, this PR may not solve all the problems and hope your suggestions.

fynnss avatar May 14 '24 02:05 fynnss

This seems to be based on seek to a key. The key used is the full 32-byte key, e.g. 0xaabbccdd....xx. But, the actual leaf stored on disk is shorter than the full path, since the last extension node contains the last part of the path. So let's say it's 0xaabbcc. Now, if you seek to 0xaabbccddeeff, you wil step past 0xaabbcc. And wind up on the next leaf. So I don't get it, how can this work, as is?

As I understand it, this can work if you implement a special new seek-method which positions itself on step back, or somehow make the iterator step back on step.

Please explain if I've misunderstood something

holiman avatar May 14 '24 12:05 holiman

Now, if you seek to 0xaabbccddeeff, you wil step past 0xaabbcc. And wind up on the next leaf. So I don't get it, how can this work, as is?

The db.seekLT will return the last key that less than the given key. So the 0xaabbcc will be returned in your case.

Please explain if I've misunderstood something

And there is a special case which the embedded leaf node in fullnode. The PR handles this special case by redundantly storing a copy of the embedded short node in the database.

https://github.com/ethereum/go-ethereum/pull/29754/files#diff-9fcc04b5701b122917f702ceb9f56bb5c190bcfd1f6728a319e171e819952147R116-R123

fynnss avatar May 15 '24 01:05 fynnss

According to the leveldb description,

	// Seek moves the iterator to the first key/value pair whose key is greater
	// than or equal to the given key.
	// It returns whether such pair exist.
	//
	// It is safe to modify the contents of the argument after Seek returns.
	Seek(key []byte) bool

Iterator.Seek will return the first key/value pair whose key is greater than or equal to the given key.

rjl493456442 avatar May 15 '24 05:05 rjl493456442

According to the leveldb description,

	// Seek moves the iterator to the first key/value pair whose key is greater
	// than or equal to the given key.
	// It returns whether such pair exist.
	//
	// It is safe to modify the contents of the argument after Seek returns.
	Seek(key []byte) bool

Iterator.Seek will return the first key/value pair whose key is greater than or equal to the given key.

Oh, I made a mistake and you're right. The default behavior of db.seek is seek greater than or equal to in leveldb. In the PR, we use the seekLT in pebble which impl seeking the last key lower than the given key.

// SeekLT moves the iterator to the last key/value pair whose key is less than
// the given key. Returns true if the iterator is pointing at a valid entry and
// false otherwise.
func (i *Iterator) SeekLT(key []byte) bool {
	return i.SeekLTWithLimit(key, nil) == IterValid
}

And in pathdb, it is impossible to have a pathkey that is the same as a 32-bit hash, so the last key lower than it must be what we are looking for.

fynnss avatar May 15 '24 05:05 fynnss