
[Draft] Onchain Contract Usage Models (App Layer prereqs)

Open MSilb7 opened this issue 1 year ago • 4 comments

Problem

To accurately evaluate contract usage, we want to look into all function calls that happen within a transaction. This is the "traces" dataset. Function calls (traces) give us insight into the building-block contracts (e.g. a DEX that calls a token and an oracle) and the underlying contracts called by custom contracts (e.g. trading bots). If we only looked at "transaction"-level data, we would only see the first contract called, which would lose this granularity and lead to incorrect judgments of onchain activity.

It's not controversial to look at traces; however, doing so introduces infra problems and other nuances:

  1. Traces data tables are much larger than transaction data tables (~20-100x larger), so queries often run long or take up a ton of memory. We need to build a better/smaller data model to support these queries.
  2. There are many possible ways you could attribute usage to each internal contract (e.g. do you split gas? do you count 1 transaction for each contract?)
  • Gas: You could: 1. Attribute all of a transaction's gas to each contract, 2. Get the gas used by each function call by subtracting sub-traces, 3. Equally amortize all gas used across each contract or function call. There are likely cases where each makes sense, so we could store each metric in a model.
  3. App-level attribution requires uniqueness - There could be multiple contracts from the same app called in a single transaction, so we need to be careful not to over-count.
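The three gas-attribution options can be sketched in plain Python. This is an illustrative sketch only: the trace shape (`to`, `gas_used`, `trace_address`) mirrors the common EVM traces schema, and all addresses and numbers are toy assumptions, not the production model.

```python
# Toy transaction: a DEX router calls a token contract and an oracle.
# gas_used on each trace includes the gas of its sub-traces, as in the
# standard EVM traces schema (field names here are assumptions).
traces = [
    {"to": "0xrouter", "gas_used": 100_000, "trace_address": []},
    {"to": "0xtoken",  "gas_used": 30_000,  "trace_address": [0]},
    {"to": "0xoracle", "gas_used": 10_000,  "trace_address": [1]},
]

# Option 1: attribute the whole transaction's gas to every contract touched.
tx_gas = traces[0]["gas_used"]  # the top-level trace carries the tx total
full_attribution = {t["to"]: tx_gas for t in traces}

# Option 2: gas used by each call itself - subtract the gas of its
# direct sub-traces (grandchildren are already inside the children).
def own_gas(trace, all_traces):
    depth = len(trace["trace_address"])
    children = [
        t for t in all_traces
        if len(t["trace_address"]) == depth + 1
        and t["trace_address"][:depth] == trace["trace_address"]
    ]
    return trace["gas_used"] - sum(c["gas_used"] for c in children)

own_attribution = {t["to"]: own_gas(t, traces) for t in traces}

# Option 3: amortize the transaction's gas equally across all calls.
amortized = {t["to"]: tx_gas / len(traces) for t in traces}
```

Storing all three as separate columns in one model would let downstream queries pick the attribution that fits each question.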

TODO: What we need to build

  • One model - aggregate at the tx & trace from/to/method level - see how much this reduces row counts
  • Second - aggregate at the day level
  • Third - aggregate trace-to at the day level (potential dependency on lower-latency processing first)

Ideas:
  • Enriched transaction table?
  • Hourly/Daily aggregate by contract?

Related Items

  • Migrate High-Frequency Bot Logic (TODO: Make Issue)
  • Migrate Likely Duplicate Address Logic (Sybil Input) (TODO: Make Issue)
  • Any other address segmentation/classification concepts

Prior Work - We'd want to be able to trivially recreate all of these with proper data models

Future Work Ideas/Concepts

  • Common From/To calls
  • Most "composable" contracts - what gets called most often? by the largest variety of contracts?
  • Dependency mapping (network map?) - find the core building block contracts that we need (i.e. how much activity relies on oracles? on a stablecoin?)
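The "most composable contracts" idea above boils down to ranking each trace-to contract by how many distinct contracts call it. A hypothetical sketch with toy `(trace_from, trace_to)` pairs (names assumed, not real addresses):

```python
from collections import defaultdict

# Toy (trace_from, trace_to) call pairs; in practice these would come
# from the aggregated traces model discussed above.
calls = [
    ("0xdexA", "0xoracle"),
    ("0xdexB", "0xoracle"),
    ("0xbot",  "0xoracle"),
    ("0xdexA", "0xstable"),
    ("0xbot",  "0xstable"),
]

# Collect the set of distinct callers per callee.
callers = defaultdict(set)
for frm, to in calls:
    callers[to].add(frm)

# Rank callees by the variety of contracts that call them.
composability = sorted(
    ((to, len(frm_set)) for to, frm_set in callers.items()),
    key=lambda pair: -pair[1],
)
```

The same caller-set structure would also feed a dependency map: an edge list weighted by distinct callers answers "how much activity relies on this oracle/stablecoin?"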

MSilb7 avatar Nov 22 '24 12:11 MSilb7

Testing on 10 min of Base data: https://app.hex.tech/61bffa12-d60b-484c-80b9-14265e268538/hex/978fdac8-5b5f-4f74-8ad0-bc43ddd5ea8d/draft/logic

  • Hash, Trace From, Trace To, Trace Method: Aggregating at the most granular level doesn't save us many rows, but it does generate all of the enriched fields we want, so it could still be worth doing. We'd primarily use this for deep-diving specific paths.
  • Edit: rather than aggregate, just keep the raw enriched traces in storage.
  • Hash, Trace From, Trace To: For measuring interactions between contracts; cuts the data down by ~50%. We'd use this in overall dashboards & for seeing which contracts call another project's contracts (given #1154).
  • Combination of Trace To's: The most aggregated level, capturing the most common call combinations in a transaction. Used for dashboards & cuts our data down to ~1% of the size.
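The three grains above can be sketched with plain Python grouping. Column names (`tx_hash`, `trace_from`, `trace_to`, `method`) and the interpretation of the last grain as the per-transaction set of distinct trace-to contracts are assumptions for illustration:

```python
from collections import Counter, defaultdict

# Toy enriched trace rows: (tx_hash, trace_from, trace_to, method).
rows = [
    ("0xa", "0xuser",   "0xrouter", "swap"),
    ("0xa", "0xrouter", "0xtoken",  "transfer"),
    ("0xa", "0xrouter", "0xtoken",  "transfer"),  # repeated call, same path
    ("0xa", "0xrouter", "0xoracle", "latestAnswer"),
    ("0xb", "0xbot",    "0xtoken",  "transfer"),
]

# Grain 1: hash + from + to + method (most granular, keeps enriched fields).
grain1 = Counter((tx, frm, to, m) for tx, frm, to, m in rows)

# Grain 2: hash + from + to (contract-to-contract interactions).
grain2 = Counter((tx, frm, to) for tx, frm, to, _m in rows)

# Grain 3: one row per distinct combination of trace-to contracts
# in a transaction (most aggregated).
to_sets = defaultdict(set)
for tx, _frm, to, _m in rows:
    to_sets[tx].add(to)
grain3 = Counter(tuple(sorted(s)) for s in to_sets.values())
```

On real traces data the row reductions would of course differ; the point is only that each grain is a progressively coarser group-by over the same enriched rows.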

MSilb7 avatar Dec 11 '24 20:12 MSilb7

Open todos / questions:

  • Build the base level enriched transactions view.
  • What's the most efficient method for getting the sub-trace gas? Self-joining? Unioning? TBD - may test out in Hex / ClickHouse, but other systems may behave differently.
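As a reference point for whatever the self-join or union approach produces, one pass over the traces can subtract each call's gas from its direct parent via `trace_address`. This is a hypothetical sketch (field names assumed, not the production schema), not a claim about what will be fastest in ClickHouse:

```python
def subtrace_gas(traces):
    """Return gas used by each call itself, keyed by trace_address tuple.

    Each trace's gas_used includes its sub-traces, so subtracting only
    the direct children (whose gas already contains the grandchildren)
    yields the call's own gas.
    """
    own = {tuple(t["trace_address"]): t["gas_used"] for t in traces}
    for t in traces:
        addr = t["trace_address"]
        if addr:  # non-root: subtract this call's gas from its parent
            own[tuple(addr[:-1])] -= t["gas_used"]
    return own

# Toy transaction with a nested sub-call under the first child.
traces = [
    {"trace_address": [],     "gas_used": 100_000},
    {"trace_address": [0],    "gas_used": 30_000},
    {"trace_address": [1],    "gas_used": 10_000},
    {"trace_address": [0, 0], "gas_used": 5_000},
]
```

In SQL terms this corresponds to a single aggregation over parent `trace_address` rather than a row-wise self-join, which may matter at traces-table scale.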

MSilb7 avatar Dec 11 '24 20:12 MSilb7

Edit: logic now joins to Enriched Transactions (#1160) rather than rebuilding the formulas.

MSilb7 avatar Dec 12 '24 18:12 MSilb7

Thanks for pulling these together!

Went over the logic in Hex, and I can already see a lot of possibilities for using the enriched dataset for deeper analysis and insights. Hash, Trace From, Trace To, Trace Method is probably the best route given the granularity needed, but I guess most people will query a daily aggregation layer built on top of this as an intermediate table?

I guess @lithium323 can come in to help with improving query efficiency (batch processing)? And the gas calculation + attribution could be done as a separate template at the trace level and then applied in the main query.

Do you know if there's anywhere we can find a good trace-method mapping? So far I've only seen transaction-method mappings; maybe they're the same thing?

chuxinh avatar Dec 13 '24 02:12 chuxinh