granular archive sync mode
Describe the feature
The problem we're facing is that Ethereum and other EVM chains have too much data, making it difficult and expensive to host archive nodes. This discourages people from running their own nodes, so they rely on centralized hosting services instead. Additionally, most applications only need certain contract data and don't care about irrelevant tokens or scams contributing to the data bloat.
To solve this, we could apply filters to the historical data and create a granular archive sync mode. This mode would only keep the data for the contracts that users actually need, reducing the storage requirements from 8TB to a more manageable size. The storage can scale based on the project's needs and success, allowing people to run their own nodes on smaller, cheaper machines. In the future, we might even be able to run nodes within web browsers, although that's a distant possibility.
By implementing this filtering and granular archive sync mode, we can optimize storage efficiency and cost while still providing access to the necessary contract data for applications. This gives projects more control over their nodes and infrastructure, promoting a decentralized ecosystem for Ethereum and EVM chains.
dApps typically interact with a specific subset of contracts to support their user interfaces and handle various calls. As a result, there is often no need for these dApps to rely on archive nodes that store the entire blockchain state. Instead, by using a granular archive sync mode or filtering mechanism, dApps can optimize resource usage and streamline operations by syncing and accessing only the contracts essential to their functionality, avoiding the need for a node full of irrelevant data. This could lead to more people running their own nodes in-house.
In the context of reth, I propose introducing a configurable granular archive sync mode. This would allow users to customize their node setup according to their specific needs. I welcome any other ideas or feedback on this proposal.
[reth.granular_archive]
contracts = ["0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984"]
eth_transactions = true
from_block = 192
This configuration lets you choose which contracts you want to include in the node's state. If you're interested in tracking plain Ethereum transactions and monitoring balances and related activity, you can turn on the eth_transactions option. Additionally, you can specify the block number from which you want to start caring about the state of the contracts.
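For illustration, here is a minimal sketch of how such a section could be deserialized inside reth, assuming serde-based TOML parsing; the struct and field names are hypothetical and simply mirror the example config above, not an existing reth API.

```rust
// Hypothetical sketch only: struct and field names are assumptions,
// not part of reth's actual configuration surface.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct GranularArchiveConfig {
    /// Contract addresses whose historical state should be retained.
    contracts: Vec<String>,
    /// Whether to also keep plain ETH transfer/balance history.
    #[serde(default)]
    eth_transactions: bool,
    /// Block number from which contract state starts being retained.
    #[serde(default)]
    from_block: u64,
}

fn main() -> Result<(), toml::de::Error> {
    let raw = r#"
        contracts = ["0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984"]
        eth_transactions = true
        from_block = 192
    "#;
    let cfg: GranularArchiveConfig = toml::from_str(raw)?;
    println!("{cfg:?}");
    Ok(())
}
```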
It is very normal for protocols such as AAVE to add new contracts based on a DAO vote, and the same is true for many other protocols (i.e. the relevant contract set changes over time). When this happens, the node would need to dynamically resync that contract's state. A new custom reth RPC call, reth_granularArchiveAddContract, accepting a contract address as input, would let you trigger the process of resyncing that contract on your granular archive node.
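As a sketch, the call would look like any other JSON-RPC request; the method name and parameter shape below are part of this proposal rather than an existing reth API.

```rust
// Builds a hypothetical JSON-RPC request body for the proposed method.
// The method name and parameters are assumptions from this proposal.
use serde_json::json;

fn main() {
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "reth_granularArchiveAddContract",
        "params": ["0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984"],
    });
    println!("{}", serde_json::to_string_pretty(&request).unwrap());
}
```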
This configuration empowers you to tailor the node's behaviour to your needs and interests. Whether you want to focus on certain contracts, monitor transactions, or set a specific starting point, you can customize the granular archive sync mode according to your preferences.
Of course, this is just an idea, but I wanted to dump it here because reth is in active development, and they are super open to new thoughts and ideas. I do not expect this to take priority over the main functionality, but I wanted to start the discussion and get other people's views.
@joshstevens19 thanks for your thoughts on the topic, it's very helpful. We have an issue where full node discussion is happening right now, I think your ideas are related to this issue: https://github.com/paradigmxyz/reth/issues/2629.
What you've proposed looks like some sort of pruning. We plan to start with a pretty simple configuration of choosing what you want to prune (history, receipts except deposits, tx lookup index, etc.) and the pruning distance (measured as a minimum block height or the last N blocks).
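Roughly, that configuration could be modeled as a per-segment prune mode, something like the sketch below; the segment names and shapes are illustrative assumptions, not a finalized API.

```rust
// Illustrative sketch only: segment names and types are assumptions about
// the planned pruning configuration, not a finalized reth API.

/// How far a given data segment is pruned.
enum PruneMode {
    /// Prune the entire segment.
    Full,
    /// Keep only the last `n` blocks (pruning distance).
    Distance(u64),
    /// Prune everything below a minimum block height.
    Before(u64),
}

/// Which segments to prune; `None` means the segment is kept in full.
struct PruneConfig {
    account_history: Option<PruneMode>,
    storage_history: Option<PruneMode>,
    receipts: Option<PruneMode>,
    transaction_lookup: Option<PruneMode>,
}

fn main() {
    // e.g. keep only recent history, prune old receipts, drop the tx lookup index
    let _config = PruneConfig {
        account_history: Some(PruneMode::Distance(10_000)),
        storage_history: Some(PruneMode::Distance(10_000)),
        receipts: Some(PruneMode::Before(11_052_984)),
        transaction_lookup: Some(PruneMode::Full),
    };
}
```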
In the first iteration of pruning, this configuration will have to be defined at the initial sync, as we won't have the ability to backfill the missing data. Later, we plan to add the ability to change the pruning configuration without a resync, so the node would backfill the missing parts as it does on the initial sync.
Then, we can expand it to something like a "granular archive node", as you've described: full node + parts of the archive node specified by the configuration. Doing it straight away as a set of RPC methods like reth_granularArchiveAddContract seems like overkill to me, though I agree it's good DX to have, as you wouldn't even need to restart the node or edit any configs.
Awesome, super glad this is a topic you're working on. Let me know if I can help in any way. Thanks a lot @shekhirin
Closing this in favor of #2629