[Draft] Onchain Contract Usage Models (App Layer prereqs)
Problem
To accurately evaluate contract usage, we want to look into all function calls that happen within a transaction. This is the "traces" dataset. Function calls (traces) give us insight into the building-block contracts (e.g. a DEX that calls a token and an oracle) and the underlying contracts called by custom contracts (e.g. trading bots). If we only looked at transaction-level data, we would only see the first contract called, which would cause us to lose this granularity and make incorrect judgements about onchain activity.
It's not controversial to look at traces; however, doing so introduces infra problems and other nuances:
- Trace tables are much larger than transaction tables (~20-100x larger), so queries often run long or take up a ton of memory. We need to build a better/smaller data model to support these queries.
- There are many possible ways to attribute usage to each internal contract (e.g. do you split gas? do you count one transaction for each contract?)
- Gas: You could (1) attribute all of a transaction's gas to each contract, (2) get the gas used by each function call by subtracting sub-traces, or (3) equally amortize all gas used across each contract or function call. There are likely cases where each makes sense, so we could store each metric in a model.
- App-level attribution requires uniqueness - multiple contracts from the same app could be called in a single transaction, so we need to be careful not to over-count.
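The three gas-attribution options can be sketched on a toy transaction (`router -> token -> oracle`). This is a minimal sketch, assuming illustrative field names (`tx_hash`, `trace_address`, `to`, `gas_used`), not the actual trace schema:

```python
from collections import defaultdict

# One toy transaction: router calls token, token calls oracle.
# Field names and values here are illustrative, not the real schema.
traces = [
    {"tx_hash": "0xabc", "trace_address": (),     "to": "router", "gas_used": 100_000},
    {"tx_hash": "0xabc", "trace_address": (0,),   "to": "token",  "gas_used": 60_000},
    {"tx_hash": "0xabc", "trace_address": (0, 0), "to": "oracle", "gas_used": 20_000},
]

# Option 1: attribute the whole transaction's gas to every contract it touches.
tx_gas = {t["tx_hash"]: t["gas_used"] for t in traces if t["trace_address"] == ()}
full = {t["to"]: tx_gas[t["tx_hash"]] for t in traces}

# Option 2: a trace's own gas = its gas minus the gas of its immediate sub-traces
# (a child's parent is its trace_address minus the last element).
child_gas = defaultdict(int)
for t in traces:
    if t["trace_address"]:
        child_gas[(t["tx_hash"], t["trace_address"][:-1])] += t["gas_used"]
own = {t["to"]: t["gas_used"] - child_gas[(t["tx_hash"], t["trace_address"])]
       for t in traces}

# Option 3: amortize the transaction's gas equally across its traces.
amortized = {t["to"]: tx_gas[t["tx_hash"]] / len(traces) for t in traces}
```

A useful sanity check on option 2 is that the per-trace "own" gas sums back to the transaction's total, while option 1 deliberately multiple-counts (so it should only be summed per contract, never across a transaction).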
TODO: What we need to build
- One model - aggregate at the tx & trace from/to/method level - see how much this reduces the row count
- Second - aggregate at the day level
- Third - aggregate trace-to at the day level (potential dependency on lower-latency processing first)
Ideas:
- Enriched transaction table?
- Hourly/Daily aggregate by contract?
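The daily-aggregate-by-contract idea, combined with the uniqueness concern above, can be sketched like this (field names are illustrative; the key point is counting distinct transaction hashes so a contract called twice in one transaction counts once):

```python
from collections import defaultdict

# Toy enriched trace rows; note tx 0x1 hits the same contract/method twice.
rows = [
    {"dt": "2024-05-01", "tx": "0x1", "trace_to": "dex",    "method": "swap"},
    {"dt": "2024-05-01", "tx": "0x1", "trace_to": "dex",    "method": "swap"},
    {"dt": "2024-05-01", "tx": "0x2", "trace_to": "dex",    "method": "swap"},
    {"dt": "2024-05-01", "tx": "0x2", "trace_to": "oracle", "method": "latestAnswer"},
]

# Collect distinct tx hashes per (day, contract, method), then count them.
txs = defaultdict(set)
for r in rows:
    txs[(r["dt"], r["trace_to"], r["method"])].add(r["tx"])
daily = {k: len(v) for k, v in txs.items()}
```

In a warehouse this is just `count(distinct tx_hash) group by day, trace_to, method`; the sketch shows why the distinct matters for avoiding over-counting.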
Related Items
- Migrate High-Frequency Bot Logic (TODO: Make Issue)
- Migrate Likely Duplicate Address Logic (Sybil Input) (TODO: Make Issue)
- Any other address segmentation/classification concepts
Prior Work - We'd want to be able to trivially recreate all of these with proper data models
- Function Calls
- Popular Combinations
- Group by transaction method
- Project-Level Deep-Dive (Note: Address-level charts require the address summary model)
- Superchain contract level (maps by transactions)
Future Work Ideas/Concepts
- Common From/To calls
- Most "composable" contracts - what gets called most often? by the largest variety of contracts?
- Dependency mapping (network map?) - find the core building block contracts that we need (i.e. how much activity relies on oracles? on a stablecoin?)
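The "most composable contracts" question reduces to counting distinct callers per callee. A minimal sketch on made-up (from, to) call pairs:

```python
from collections import defaultdict

# Made-up (trace_from, trace_to) pairs for illustration only.
calls = [
    ("router", "token"), ("bot", "token"), ("vault", "token"),
    ("router", "oracle"), ("router", "oracle"),
]

# Count the distinct contracts that call each callee.
callers = defaultdict(set)
for frm, to in calls:
    callers[to].add(frm)

# Rank callees by caller variety: the most "composable" contracts come first.
ranked = sorted(callers.items(), key=lambda kv: len(kv[1]), reverse=True)
```

The same caller-set structure is the starting point for the dependency map: an edge list weighted by distinct callers (or by gas/transactions) feeds directly into a network graph.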
Testing on 10 min of Base data: https://app.hex.tech/61bffa12-d60b-484c-80b9-14265e268538/hex/978fdac8-5b5f-4f74-8ad0-bc43ddd5ea8d/draft/logic
- Hash, Trace From, Trace To, Trace Method: Aggregating at the most granular level doesn't save us much on rows, but it does generate all of the enriched fields we want, so it could still be worth doing. We'd primarily use this for deep-diving specific paths. (Edit: rather than aggregate, just keep the raw enriched data in storage.)
- Hash, Trace From, Trace To: For measuring interactions between contracts; cuts down the data by ~50%. We'd use this in overall dashboards & for seeing which contracts call another project's contracts (given #1154).
- Combination of Trace To contracts: The most aggregated level, for getting the most common call combinations in a transaction. Used for dashboards & cuts our data down to 1% of the size.
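The row-count reduction at each grain can be illustrated on toy data (fields are illustrative; the last level keys each transaction by the sorted set of contracts it called):

```python
# Toy enriched traces as (tx_hash, trace_from, trace_to, trace_method).
traces = [
    ("0x1", "router", "token",  "transfer"),
    ("0x1", "router", "token",  "transfer"),   # duplicate path within the tx
    ("0x1", "router", "oracle", "latestAnswer"),
    ("0x2", "router", "token",  "transfer"),
]

# Level 1: hash / from / to / method - barely smaller than raw traces.
lvl1 = {(h, f, t, m) for h, f, t, m in traces}

# Level 2: hash / from / to - drops method, merging paths.
lvl2 = {(h, f, t) for h, f, t, _ in traces}

# Level 3: per-transaction combination of called contracts - most aggregated.
lvl3 = {tuple(sorted({t for h2, _, t, _ in traces if h2 == h}))
        for h in {h for h, *_ in traces}}
```

On real data the reductions are much starker (roughly the ~50% and ~1% figures above); the toy data only shows the direction of the squeeze at each grain.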
Open todos / questions:
- Build the base level enriched transactions view.
- What's the most efficient method for getting the sub-trace gas? Self-joining? Unioning? TBD - may test out in Hex / ClickHouse, but other systems may behave differently.
Edit: logic to join to Enriched Transactions (#1160) rather than rebuilding the formulas.
Thanks for pulling these together!
Went over the logic in Hex and I can already see a lot of possibilities for using the enriched dataset for deeper analysis and insights. Hash, Trace From, Trace To, Trace Method is probably the best route given the granularity needed, but I guess most people will query a daily aggregation layer that's built on top of this as an intermediate table?
Guess @lithium323 can come in to help with improving query efficiency (batch processing)? And the gas calculation + attribution can be done as a separate template at the trace level and then applied in the main query.
Do you know if there's anywhere we can find a good trace method mapping? So far I've only seen transaction method mapping, and maybe they're the same thing.