RFC: Rindexer Structure Overhaul
This ticket is more of a temporary placeholder / request-for-comments whilst the scope is fleshed out. The goal is to be able to include the higher level of detail required by some more sophisticated use-cases whilst keeping the simplicity of the black-box that is rindexer.
Super high-level early-stage thoughts here.
Proposal of additional useful info
Examples of the kind of additional transaction details that can be useful for different applications:
- To accurately derive native-balance state, we need 'gas price' and 'gas used'
- Applications tracking gas statistics across chains, for things like gas price recommendations, need this information
- Applications tracking detailed wallet information such as nonces for invalidation / more "pro user" metadata.
- Indexers being able to sort by transactions intra-block, if this is required for some use-case, may require tx index, not just log index.
Note: even more information is available, and we can transparently make it all available to rust-code users perhaps, but for the no-code postgres enabled use-case we should be more reserved and opt-in.
struct AdditionalTxDetails {
    nonce: String,
    gas: String,
    max_fee_per_gas: String,
    max_priority_fee_per_gas: String,
    value: String,
    gas_price: String,
    transaction_index: String,
    block_timestamp: String,
    transaction_hash: String, // To correlate with the logs themselves
}
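To make the native-balance use case concrete, here is a minimal sketch of how a consumer might turn these string fields into an amount leaving the sender's balance. It assumes the strings are decimal-encoded, and it assumes a `gas_used` value that is not part of the struct above (gas used is normally reported on the transaction receipt, while `gas` in the block body is the gas limit). `parse_wei` and `sender_native_outflow` are hypothetical helper names, and `u128` is used only to keep the sketch stdlib-only; real values are 256-bit.

```rust
/// Parse a decimal string (assumed encoding) into a wei amount.
/// Returns None if the string is not a valid non-negative integer that fits in u128.
fn parse_wei(s: &str) -> Option<u128> {
    s.parse::<u128>().ok()
}

/// Wei leaving the sender's balance for one transaction:
/// value + gas_used * gas_price.
/// `gas_price` is assumed to be the effective price actually paid.
/// Returns None on a parse failure or arithmetic overflow.
fn sender_native_outflow(value: &str, gas_price: &str, gas_used: &str) -> Option<u128> {
    let value = parse_wei(value)?;
    let gas_price = parse_wei(gas_price)?;
    let gas_used = parse_wei(gas_used)?;
    let fee = gas_used.checked_mul(gas_price)?;
    value.checked_add(fee)
}
```

For a plain 1 ETH transfer using 21000 gas at 20 gwei, this yields the transferred value plus a fee of 420000000000000 wei.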
How we could get this cheaply for live-indexing
Right now we poll cached_provider.get_latest_block which calls self.provider.get_block(BlockId::Number(BlockNumberOrTag::Latest)) under the hood. If we call .get_block().full() instead we will receive all of this information essentially for free: it is the same number of CU per call, and effectively the same network time.
This means near-zero latency impact on live-indexing for tx timestamp, gas, nonce and more. Assuming a polling period shorter than the block time, we would feasibly never even have to do a "fill in request" for a missed block. For correctness, however, it would be important to query any blocks between Last Seen and Latest to get the metadata for them as well.
+ fn map_block_data(last_seen_block_number: u64, latest_block: Block) -> HashMap<TxHash, AdditionalTxDetails> {
+     if last_seen_block_number + 1 == latest_block.block_number {
+         // Next block, no need to fill in request
+         // ... map over txs in blocks and convert to HashMap
+     } else {
+         // ... fetch any missed blocks (shouldn't happen much) and then map into the HashMap
+     }
+ }
  let latest_block = cached_provider.get_latest_block().await;
+ let additional_details: HashMap<TxHash, AdditionalTxDetails> = map_block_data(last_seen_block_number, latest_block);
  if let Err(e) = tx.send(Ok(FetchLogsResult {
      logs,
      from_block,
      to_block,
+     additional_details
  })) {
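The two branches of the gap check could be fleshed out roughly as follows. This is only a sketch with simplified stand-in types: `Block`, `Tx` and `TxDetails` here are hypothetical and much smaller than the real provider types, and `missed_blocks` / `index_block` are illustrative names rather than existing rindexer functions.

```rust
use std::collections::HashMap;
use std::ops::RangeInclusive;

/// Trimmed-down stand-in for AdditionalTxDetails.
#[derive(Debug, Clone, PartialEq)]
struct TxDetails {
    nonce: u64,
    transaction_index: u64,
    block_timestamp: u64,
}

/// Hypothetical simplified transaction and block shapes.
struct Tx {
    hash: String,
    nonce: u64,
}

struct Block {
    number: u64,
    timestamp: u64,
    transactions: Vec<Tx>,
}

/// Blocks strictly between the last seen block and the latest block.
/// None means the latest block directly follows the last seen one
/// (or is not ahead of it), so no fill-in request is needed.
fn missed_blocks(last_seen: u64, latest: u64) -> Option<RangeInclusive<u64>> {
    if latest > last_seen.saturating_add(1) {
        Some(last_seen + 1..=latest - 1)
    } else {
        None
    }
}

/// Add every transaction of one full block to the lookup map, keyed by tx hash.
/// The position in the block's transaction list is used as the tx index.
fn index_block(block: &Block, out: &mut HashMap<String, TxDetails>) {
    for (i, tx) in block.transactions.iter().enumerate() {
        out.insert(
            tx.hash.clone(),
            TxDetails {
                nonce: tx.nonce,
                transaction_index: i as u64,
                block_timestamp: block.timestamp,
            },
        );
    }
}
```

The caller would fetch each block in the `missed_blocks` range (if any), run `index_block` over those and the latest block, and attach the resulting map to the logs result.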
How we can get this for backfill?
Backfills are typically rarer events and a once-off cost. However, there is some room here to get this for free. For example:
In the case where native-transfers are enabled we must call every single block for all enabled chains anyway, so we already would have this information at no extra cost anywhere native-transfers are enabled (in large part because native-transfers must incur a much larger backfill speed-cost compared with log indexing).
But that means we could also take advantage of that and attach all these extra details.
Imagining an overly-simplified config like:
native_transfers: true # index native token transfers. Already much-much slower backfills for native-transfers specifically.
# and/or
with_additional_details: true # index block timestamps, transaction gas prices, gas used, nonces, and more. At the cost of much slower backfills for every event.
So in these cases we have a process like:
- Change debug_traceBlockByNumber to eth_getBlockByNumber [full] since we get everything we need there, and cheaper (20 CU vs 40 CU). There is no benefit for it to be debug from what I can tell, and eth_getBlockByNumber is more widely and consistently supported, and has all the extra timestamp information.
- Scan the logsBloom manually in the eth_getBlockByNumber response and call getLogs for this block if
- Abstract the fetch_logs_stream so that if this "native-transfer" indexing (name can be made more generic) is enabled, we stream logs from each block handled in the native-transfer block-by-block indexing rather than from the eth_getLogs process. This means a slightly more abstracted distinction/modularization needs to be made between:
  - The source of event logs
  - The processor/consumer of event logs
This distinction does exist, but might need to be neatened up for this proposal to work.
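For the logsBloom step, a rough sketch of the membership check is below. It follows the standard Ethereum header bloom layout (2048 bits, three bits per item taken from the low 11 bits of the first three byte pairs of the item's keccak256 hash). To stay stdlib-only, it assumes the keccak256 hash of the address or topic is computed elsewhere and passed in; the function names are hypothetical. A positive result only means the block may contain a matching log, since bloom filters produce false positives.

```rust
/// Size of the header logsBloom in bytes (2048 bits).
const BLOOM_BYTES: usize = 256;

/// Compute the three (byte index, bit mask) pairs for one item, given the
/// keccak256 hash of that item (an address or a topic).
fn bloom_positions(hash: &[u8; 32]) -> [(usize, u8); 3] {
    let mut out = [(0usize, 0u8); 3];
    for i in 0..3 {
        // Low 11 bits of each of the first three big-endian byte pairs.
        let pair = u16::from_be_bytes([hash[2 * i], hash[2 * i + 1]]) & 0x07ff;
        // Bit 0 lives in the last byte of the 256-byte bloom.
        let byte_index = BLOOM_BYTES - 1 - (pair >> 3) as usize;
        let mask = 1u8 << (pair & 0x7);
        out[i] = (byte_index, mask);
    }
    out
}

/// Set the bits for one item (only needed here to build test blooms).
fn bloom_insert(bloom: &mut [u8; BLOOM_BYTES], hash: &[u8; 32]) {
    for &(idx, mask) in bloom_positions(hash).iter() {
        bloom[idx] |= mask;
    }
}

/// True if all three bits for the item are set, i.e. the block may contain it.
/// False means the block definitely has no log with this address or topic.
fn bloom_may_contain(bloom: &[u8; BLOOM_BYTES], hash: &[u8; 32]) -> bool {
    bloom_positions(hash)
        .iter()
        .all(|&(idx, mask)| bloom[idx] & mask != 0)
}
```

In the block-by-block path, a block whose bloom returns false for every indexed contract address and event topic could be skipped without any log request.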
The other case is where "native-transfers" are not enabled. There it would potentially be feasible to keep the existing eth_getLogs-style fast-indexing, with its much more efficient bloom filter log skipping, and then opt in to the additional_details, which would do a multicall of eth_getBlockByNumber for every matching block in the set of logs we get.
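As a sketch of that second path, the set of blocks to request can be derived from the returned logs by de-duplicating block numbers and chunking them into batches. `LogRef` and `block_batches` are hypothetical names, and the batch size would depend on what the provider allows per request.

```rust
use std::collections::BTreeSet;

/// Minimal stand-in for a log returned by eth_getLogs.
struct LogRef {
    block_number: u64,
}

/// Collect the distinct block numbers referenced by the logs, in ascending
/// order, and split them into batches of at most `batch_size` blocks.
/// A batch size of 0 is treated as 1.
fn block_batches(logs: &[LogRef], batch_size: usize) -> Vec<Vec<u64>> {
    let unique: BTreeSet<u64> = logs.iter().map(|l| l.block_number).collect();
    let ordered: Vec<u64> = unique.into_iter().collect();
    ordered
        .chunks(batch_size.max(1))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

Each batch would then map to one batched eth_getBlockByNumber request, so the extra cost scales with the number of blocks that actually contain matching logs rather than with the full backfill range.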
More thought required here. I just feel like we're starting to get into more advanced use-cases with debug/trace indexing and block-by-block indexing with full tx details. I think something in the rindexer indexing process specifically needs to be adjusted so these can be switched out optimally.
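One possible shape for making the log source swappable is a small trait that the consumer depends on, with one implementation per indexing mode. This is a synchronous, stdlib-only sketch, not rindexer's current API: `LogSource`, `RangeScanSource`, `BlockByBlockSource` and `consume_all` are all hypothetical names, and the sources return canned data where a real one would call the provider.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Log {
    block_number: u64,
    log_index: u64,
}

/// Anything that can produce batches of logs, regardless of how it gets them.
trait LogSource {
    /// Return the next batch of logs, or None when the source is exhausted.
    fn next_batch(&mut self) -> Option<Vec<Log>>;
}

/// Stand-in for the eth_getLogs range scan: yields pre-fetched batches in order.
struct RangeScanSource {
    pending: Vec<Vec<Log>>,
}

/// Stand-in for block-by-block indexing over an inclusive block range.
struct BlockByBlockSource {
    next_block: u64,
    end_block: u64,
}

impl LogSource for RangeScanSource {
    fn next_batch(&mut self) -> Option<Vec<Log>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(self.pending.remove(0))
        }
    }
}

impl LogSource for BlockByBlockSource {
    fn next_batch(&mut self) -> Option<Vec<Log>> {
        if self.next_block > self.end_block {
            return None;
        }
        let number = self.next_block;
        self.next_block += 1;
        // A real source would fetch the full block here; this sketch
        // emits one synthetic log per block instead.
        Some(vec![Log { block_number: number, log_index: 0 }])
    }
}

/// The consumer only sees the trait, so the source can be switched per config.
fn consume_all(source: &mut dyn LogSource) -> Vec<Log> {
    let mut all = Vec::new();
    while let Some(batch) = source.next_batch() {
        all.extend(batch);
    }
    all
}
```

With this split, the processing side stays unchanged whether logs arrive from a range scan or from the block-by-block path that also carries full tx details.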
Down for this. For sure it's lower in the pile of focus for us right now; that said, we should definitely spec this out, as there are some really good thoughts above.