tracing: investigate persistence
Live tracing as the chain progresses is one thing, but what do we do with the results? We could store them in our database or in a freezer instance, but that is going to be a non-trivial amount of data. We can argue that anyone running tracers should be aware of this, but persisting the output still has some implications:
- We don't want DApps to end up forcing users to run with some tracers enabled because they rely on it.
- Keeping the traces seems a lot less optimal than having them post-processed by some external indexer and then discarded.
There are of course advantages to persisting the traces, but it opens a can of worms around indexing, deleting, querying, etc., and Geth needs to walk a fine line between what it adds and what it outsources or delegates.
Here is a proposal for storing tracer outputs:
For each (named) tracer implementation, we keep two freezer instances:
- The 'events' freezer contains output events generated by the tracer.
- The 'block-id' freezer contains the offset into the 'events' table at each block height.
In the API provided to the tracer, we provide a function to emit an event: Emit(data any) uint64. Whenever Emit is called, we append a new record to the tracer's 'events' freezer and return the index of the new item.
At every block boundary, we also record the latest event index in the 'block-id' freezer. This is done to facilitate reorgs: when a reorg happens, we first look up the event index for the common parent of the two chains. The 'events' freezer is then truncated to that index before processing of the new chain branch continues.
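To make the Emit and reorg handling above concrete, here is a minimal in-memory sketch. It is not Geth code: plain slices stand in for the two freezer tables, and the names tracerStore, EndBlock and Reorg are hypothetical. It also assumes that block numbers start at 0 and are contiguous, and that the 'block-id' entry for a block holds the number of events emitted up to and including that block; the proposal does not pin down that exact convention.

```go
package main

import "fmt"

// tracerStore is a toy stand-in for the two per-tracer freezer tables.
// Slices replace the real freezer; serialization of events is left out.
type tracerStore struct {
	events   []any    // 'events' table: one record per Emit call
	blockIDs []uint64 // 'block-id' table: blockIDs[n] = event count after block n (assumed convention)
}

// Emit appends a record to the 'events' table and returns its index.
func (s *tracerStore) Emit(data any) uint64 {
	s.events = append(s.events, data)
	return uint64(len(s.events) - 1)
}

// EndBlock is called at every block boundary and records the current
// event offset in the 'block-id' table.
func (s *tracerStore) EndBlock() {
	s.blockIDs = append(s.blockIDs, uint64(len(s.events)))
}

// Reorg rolls both tables back to the common ancestor block, so that
// events emitted by the abandoned branch are dropped.
func (s *tracerStore) Reorg(ancestor uint64) error {
	if ancestor >= uint64(len(s.blockIDs)) {
		return fmt.Errorf("unknown block %d", ancestor)
	}
	s.events = s.events[:s.blockIDs[ancestor]]
	s.blockIDs = s.blockIDs[:ancestor+1]
	return nil
}

func main() {
	s := &tracerStore{}
	s.Emit("a0")
	s.EndBlock() // block 0: 1 event
	s.Emit("b0")
	s.Emit("b1")
	s.EndBlock() // block 1: 2 events
	s.Emit("c0")
	s.EndBlock() // block 2: 1 event

	// Blocks 1 and 2 share ancestor block 1 with the new branch.
	if err := s.Reorg(1); err != nil {
		panic(err)
	}
	fmt.Println(len(s.events), len(s.blockIDs)) // 3 2
	fmt.Println(s.Emit("c0'"))                  // 3
}
```

One consequence of this scheme is that event indices are only stable up to the latest reorg: an index handed out on an abandoned branch is reused by the new branch, so consumers that cache indices would need some way to notice the truncation.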
For users, we provide APIs to read the events in two ways:
- tracer_getEvents(name, index, n) will return n items from the freezer table. We can limit n to a reasonable number. The return value should also contain the last index to allow paging through the results.
- tracer_getEventsFromBlock(name, block, n) also returns n events, but starts at a block number. This method should utilize the 'block-id' freezer to find the index, then internally invoke getEvents.