go-ethereum Stateless witness prefetcher changes

Supersedes https://github.com/ethereum/go-ethereum/pull/29035 because the OP didn't permit modifications from maintainers...

Apr 12 '24 12:04 karalabe

I'm kind of wondering whether close is needed; perhaps we should instead have a wait method which just ensures everything is loaded.

As I see it, the prefetcher needs a couple of phases.

Phase 1: Open for scheduling. At this point, it accepts tasks to be fetched. Callers must not (cannot?) retrieve data from it at this point. When an external caller tells it to, it goes into
Phase 2: No longer open for scheduling tasks. At this point, it finishes all outstanding tasks, and once all tasks are done, it goes into
Phase 3: (Again, not open for scheduling tasks.) At this point, callers can retrieve data from it.
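
A minimal Go sketch of the three phases described above. All names here (`phasedPrefetcher`, `Schedule`, `Wait`, `Get`, the error values) are hypothetical and are not taken from the go-ethereum code; the "load" is a placeholder string rather than a real trie read.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// phase is a hypothetical name for the lifecycle stages in the comment above.
type phase int

const (
	phaseScheduling phase = iota // Phase 1: accepts tasks, reads rejected
	phaseDraining                // Phase 2: no new tasks, outstanding ones finish
	phaseReadable                // Phase 3: no new tasks, reads allowed
)

var (
	errClosed   = errors.New("prefetcher no longer accepts tasks")
	errNotReady = errors.New("prefetcher data not yet readable")
)

// phasedPrefetcher is an illustrative stand-in, not the go-ethereum type.
type phasedPrefetcher struct {
	mu    sync.Mutex
	phase phase
	wg    sync.WaitGroup
	data  map[string]string
}

func newPhasedPrefetcher() *phasedPrefetcher {
	return &phasedPrefetcher{data: make(map[string]string)}
}

// Schedule starts loading key in the background; only allowed in phase 1.
func (p *phasedPrefetcher) Schedule(key string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.phase != phaseScheduling {
		return errClosed
	}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		value := "loaded:" + key // placeholder for a trie/disk read
		p.mu.Lock()
		p.data[key] = value
		p.mu.Unlock()
	}()
	return nil
}

// Wait moves phase 1 to 2, blocks until all tasks finish, then enters phase 3.
func (p *phasedPrefetcher) Wait() {
	p.mu.Lock()
	if p.phase == phaseScheduling {
		p.phase = phaseDraining
	}
	p.mu.Unlock()

	p.wg.Wait()

	p.mu.Lock()
	p.phase = phaseReadable
	p.mu.Unlock()
}

// Get returns loaded data; only allowed in phase 3.
func (p *phasedPrefetcher) Get(key string) (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.phase != phaseReadable {
		return "", errNotReady
	}
	return p.data[key], nil
}

func main() {
	p := newPhasedPrefetcher()
	_ = p.Schedule("slot-1")
	_, err := p.Get("slot-1")
	fmt.Println("read before Wait:", err)
	p.Wait()
	v, _ := p.Get("slot-1")
	fmt.Println("read after Wait:", v)
	fmt.Println("schedule after Wait:", p.Schedule("slot-2"))
}
```

In this sketch the phase check and `wg.Add` happen under the same mutex, so no task can be registered once `Wait` has started draining.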

Perhaps we need something more elaborate than this, but whatever we need, we would be well served by first jotting down the description in human language, before doing some lock/mutex/channel-based implementation of "something".

Apr 15 '24 10:04 holiman

As I understand it, the difference between the old and new prefetcher is (should be) as follows:

Old pre-fetcher:
- Purpose is to warm up the trie during execution, so that while we're crunching some EVM code, our disk is kept busy pulling in data that the hasher will need at the end.
- All operations are async, running in the background. The most important thing is to never ever block. If we get more useful data, great; if less, that's life, but we should never hold up execution.
- When execution reaches a boundary (IntermediateRoot pre-Byzantium; or Finalize after Byzantium), insta-terminate all pre-fetchers to keep the main committer thread from racing them for disk access. Whatever we managed to load will be used, the rest pulled on demand.
New pre-fetcher:
- Purpose is to act as a witness constructor (write only for now) during execution, so that while we're crunching some EVM code, our disk is kept busy pulling in data that both the hasher and a cross-validator will need at the end.
- Almost all operations are async, running in the background. The most important thing is to never ever block during EVM execution. However, on commit boundaries we have to switch to blocking mode, since the witness needs all data, not just whatever we loaded until that point in time.
- When execution reaches a boundary (IntermediateRoot pre-Byzantium; or Finalize after Byzantium), wait for all pre-fetchers to finish. This will block the main committer thread, but ideally, if we're not loading junk, it should make no difference, since the data needs to be loaded anyway to commit. For the witness, the data must be loaded before tries are mutated.
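
The difference at a commit boundary can be sketched as follows. This is an illustration only: `subfetcher`, `terminate` and `wait` are hypothetical names chosen for this example (not the go-ethereum API), and appending a string stands in for loading a trie node from disk.

```go
package main

import "fmt"

// subfetcher is a hypothetical worker that loads its tasks in the background.
type subfetcher struct {
	abort  chan struct{} // closed to stop early (old, best-effort behaviour)
	done   chan struct{} // closed when the loader goroutine exits
	loaded []string      // only safe to read after done is closed
}

func startSubfetcher(tasks []string) *subfetcher {
	sf := &subfetcher{abort: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(sf.done)
		for _, key := range tasks {
			select {
			case <-sf.abort:
				return // old behaviour: give up on the remaining tasks
			default:
			}
			sf.loaded = append(sf.loaded, "node:"+key) // placeholder for a disk read
		}
	}()
	return sf
}

// terminate models the old pre-fetcher: abort immediately and keep whatever
// happened to be loaded. It must be called at most once per subfetcher.
func (sf *subfetcher) terminate() []string {
	close(sf.abort)
	<-sf.done
	return sf.loaded
}

// wait models the new pre-fetcher: block until every task has been loaded,
// so the witness sees the complete set.
func (sf *subfetcher) wait() []string {
	<-sf.done
	return sf.loaded
}

func main() {
	tasks := []string{"a", "b", "c"}

	old := startSubfetcher(tasks)
	fmt.Println("terminate: loaded", len(old.terminate()), "of", len(tasks))

	cur := startSubfetcher(tasks)
	fmt.Println("wait: loaded", len(cur.wait()), "of", len(tasks))
}
```

With `terminate` the number of loaded entries depends on scheduling and may be anything from zero to all of them; with `wait` it is always the full task list.
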
(Threading) Quirks:
- Pre-Byzantium does an IntermediateRoot call between each transaction. A witness pre-fetcher for that block range must support stopping after a transaction, collecting the witness, then continuing against the next transaction, collecting witnesses from updated tries. Having data span multiple versions of the tries is significantly more complex from both a witness and a threading perspective. Given that pre-Byzantium is ancient, it doesn't make sense to support it, but we need to very explicitly handle / reject that case, otherwise it's going to be "weird" trying to understand the code.
- The old pre-fetcher was best-effort, with no guarantees on correctness (as to how much and what data it loaded). The new pre-fetcher needs to be correct to construct a proper witness, so sometimes blocking is necessary. That, however, means that code paths need to be re-thought, as we still want to maximise the main EVM execution pathways even whilst waiting for data. In particular, when terminating a pre-fetcher (i.e. waiting), we should start integrating results from finished subfetchers before waiting for all storage tries to finish loading.
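
One possible shape for "integrate finished subfetchers before all storage tries are loaded" is a completion channel that is consumed in finish order. The sketch below assumes one goroutine per storage trie; `trieResult`, `fetchStorageTries` and `integrate` are hypothetical names, and counting slots stands in for real trie loading.

```go
package main

import (
	"fmt"
	"sync"
)

// trieResult is a hypothetical summary of one finished storage-trie subfetcher.
type trieResult struct {
	owner string // account the storage trie belongs to
	nodes int    // placeholder for the loaded trie nodes
}

// fetchStorageTries starts one goroutine per storage trie and returns a channel
// that yields each result as soon as that particular subfetcher finishes.
func fetchStorageTries(work map[string][]string) <-chan trieResult {
	results := make(chan trieResult)
	var wg sync.WaitGroup
	for owner, slots := range work {
		wg.Add(1)
		go func(owner string, slots []string) {
			defer wg.Done()
			// Placeholder for loading the slots' trie nodes from disk.
			results <- trieResult{owner: owner, nodes: len(slots)}
		}(owner, slots)
	}
	// Close the channel once every subfetcher has delivered its result.
	go func() {
		wg.Wait()
		close(results)
	}()
	return results
}

// integrate folds results into the witness in completion order, so the caller
// does useful work while slower subfetchers are still running.
func integrate(results <-chan trieResult) map[string]int {
	witness := make(map[string]int)
	for res := range results {
		witness[res.owner] = res.nodes
	}
	return witness
}

func main() {
	work := map[string][]string{
		"0xaa": {"slot1", "slot2"},
		"0xbb": {"slot1"},
	}
	witness := integrate(fetchStorageTries(work))
	fmt.Println("tries integrated:", len(witness))
}
```

The consumer still blocks until the last subfetcher is done, but it never sits idle while a finished result is waiting to be integrated.
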
Questions:
- Does slot mutation order make the witness different? I.e. if I change 3 slots in a contract (including delete/create), does the order of applying them change what trie nodes we need? Because if so, there might be a hidden step still needed during commit to add prefetch tasks (?)

Apr 15 '24 13:04 karalabe

> Purpose is to act as a witness constructor (write only for now) during execution, so that while we're crunching some EVM code, our disk is kept busy pulling in data that both the hasher and a cross-validator will need at the end.

It's actually meant to gather witnesses for read values. In the stateless witness builder PR, I gather write witnesses from committing the tries.

But iirc, earlier on the call today you mentioned not tying the retrieval of write witnesses to the commit operation, which would change the assumptions from my original code.

Apr 15 '24 20:04 jwasinger

[Image: screenshot, 2024-05-07 at 09:22:54]

May 07 '24 06:05 karalabe
