
feat: implement search_transactions_before and search_transactions_after

Open caglaryucekaya opened this issue 11 months ago • 19 comments

Will close #13499

caglaryucekaya avatar Jan 02 '25 23:01 caglaryucekaya

@mattsse I have some questions. In the issue you wrote that we should enforce a page size limit of e.g. 100 blocks. However, according to the documentation the page size argument is not in blocks but in number of transactions, so we can put a limit on that instead.

One problem with my current implementation is that it processes blocks one by one until all the traces are fetched, because we don't know in advance in which block we will reach page_size matching transactions. Since the blocks are not processed in parallel, if the user e.g. requests a search starting from the genesis block and the searched addresses only appear much later, the search can take hours.

I will also look into your suggestion to use AccountReader for optimization.

caglaryucekaya avatar Jan 03 '25 00:01 caglaryucekaya

Note the gotcha from the Otterscan RPC API docs on this one:

There is a small gotcha regarding pageSize. If there are fewer results than pageSize, they are just returned as is.

But if there are more than pageSize results, they are capped by the last found block. For example, let's say you are searching for the Uniswap Router address with a pageSize of 25, and it has already found 24 matches. It then looks at the next block containing occurrences of this address and finds 5 matches inside that block. They are all returned, so the call returns 29 transaction results instead of 25. The caller code should be aware of this.
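For concreteness, here is a minimal, self-contained sketch of that paging rule; every name in it (including the matches_in_block helper) is a hypothetical stand-in rather than reth or Anvil code:

```rust
// Pretend every 7th block contains two matches for the searched address.
fn matches_in_block(block: u64) -> Vec<u64> {
    if block % 7 == 0 { vec![block, block + 1] } else { vec![] }
}

// Walk backwards from `from_block`, stopping only after finishing the block
// in which the cumulative match count reaches `page_size`. The result may
// therefore contain more than `page_size` entries.
fn search_before(from_block: u64, page_size: usize) -> Vec<u64> {
    let mut results = Vec::new();
    let mut block = from_block;
    while block > 0 {
        block -= 1;
        results.extend(matches_in_block(block));
        if results.len() >= page_size {
            break;
        }
    }
    results
}

fn main() {
    let page = search_before(100, 25);
    // Prints 26 here: the last scanned block pushed the total past the page size.
    println!("returned {} matches", page.len());
}
```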

And feel free to reference the Anvil implementation, which was correct (at least the last time I went through the code): https://github.com/foundry-rs/foundry/blob/master/crates/anvil/src/eth/otterscan/api.rs

The default Otterscan page size is 25, so make sure not to set a limit below that.

sealer3 avatar Feb 11 '25 14:02 sealer3

> Note the gotcha from the Otterscan RPC API docs on this one:

I was aware of the gotcha and implemented accordingly.

> And feel free to reference the Anvil implementation, which was correct (at least the last time I went through the code): https://github.com/foundry-rs/foundry/blob/master/crates/anvil/src/eth/otterscan/api.rs

> The default Otterscan page size is 25, so make sure not to set a limit below that.

I wasn't aware of the Foundry implementation; I will check it and make changes if necessary. I will also use a limit above 25 transactions. Thank you very much!
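As a tiny illustration of that constraint (the constant, its value, and the function name below are hypothetical, not taken from this PR):

```rust
// The enforced cap must not be smaller than Otterscan's default page size of 25.
const MAX_PAGE_SIZE: usize = 1_000; // assumed value for illustration

fn effective_page_size(requested: usize) -> usize {
    requested.min(MAX_PAGE_SIZE)
}

fn main() {
    assert_eq!(effective_page_size(25), 25);       // the Otterscan default passes through
    assert_eq!(effective_page_size(5_000), 1_000); // oversized requests get capped
}
```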

caglaryucekaya avatar Feb 12 '25 19:02 caglaryucekaya

@sealer3 Do you know why the Anvil implementation only considers post-fork blocks?

caglaryucekaya avatar Feb 13 '25 11:02 caglaryucekaya

According to the original pull request https://github.com/foundry-rs/foundry/pull/5414, the stated reason is to be able to iterate through all blocks in memory for ots_searchTransactionsBefore, ots_searchTransactionsAfter, ots_getContractCreator, and other Otterscan RPC methods that would otherwise require special indexing from the node.

Since Anvil is designed to work on any backend node (for forking), it would make sense for them not to have to make potentially thousands of RPC requests to a backend node that might be far away on a remote machine to gather the required information for a response.

sealer3 avatar Feb 13 '25 13:02 sealer3

@mattsse I also added the implementation for search_transactions_before. This is waiting for your review now.

caglaryucekaya avatar Feb 14 '25 17:02 caglaryucekaya

Hey @caglaryucekaya! What's the status of this PR? Do you need any support, or are you stuck?

jenpaff avatar Mar 03 '25 11:03 jenpaff

Hey @jenpaff! It's waiting for a review; @mattsse said it needs some checks.

caglaryucekaya avatar Mar 03 '25 12:03 caglaryucekaya

First, awesome work @caglaryucekaya! I tried using this PR with ots_searchTransactionsAfter but stumbled upon the fact that logsBloom is optional. This prevents me from parsing the result into a normal receipt using Python web3, whereas the same call works for me with Erigon. It looks like there is no "special OtsReceipt" type in Erigon anymore? Maybe there was such a type in the past. See the return type of Erigon's searchTransactionsAfterV3 for the current implementation. The Otterscan readme sadly does not go into detail here.

I would leave the PR as it is; maybe someone will eventually sync the structs with the RPC response Erigon provides for the ots_searchTransactionsBefore/After methods so both return the same result.

Nevertheless, I thought I'd leave a comment for others to find. For now, setting it to a default value works for me: logs_bloom: Some(Bloom::default()).
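For anyone hitting the same issue, a stripped-down illustration of the workaround; the receipt struct and the Bloom stand-in below are made up for the example, but alloy's real Bloom type likewise defaults to an all-zero bloom:

```rust
// Stand-in for alloy_primitives::Bloom (2048 bits, defaults to all zeros).
struct Bloom([u8; 256]);

impl Default for Bloom {
    fn default() -> Self {
        Bloom([0u8; 256])
    }
}

// Made-up receipt shape with an optional bloom, mirroring the situation above.
struct OtsLikeReceipt {
    logs_bloom: Option<Bloom>,
}

fn main() {
    // Instead of `logs_bloom: None`, emit an all-zero bloom so strict clients
    // (e.g. Python web3's receipt parsing) accept the response.
    let receipt = OtsLikeReceipt { logs_bloom: Some(Bloom::default()) };
    assert!(receipt.logs_bloom.is_some());
}
```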

cakevm avatar Mar 05 '25 13:03 cakevm

@cakevm Thanks for letting me know! If setting logs_bloom to the default helps, I can make that change in this PR.

caglaryucekaya avatar Mar 05 '25 19:03 caglaryucekaya

If it is not too much effort, I would appreciate that!

cakevm avatar Mar 05 '25 19:03 cakevm

@cakevm Done!

caglaryucekaya avatar Mar 05 '25 21:03 caglaryucekaya

Does anyone know the status of this PR? I would love to run otterscan on top of our reth nodes instead of having to run an extra erigon node just for otterscan.

binarynightowl avatar Apr 08 '25 18:04 binarynightowl

Yesterday I messed with the work @caglaryucekaya has done on this in v1.3.8 locally as an RPC extension. The implementation works for the most part; there are some changes to the erigon_getHeaderByNumber API that need to be made to get it to work, as well as some methods that have been removed or renamed that need to be updated.

@caglaryucekaya Do you still want to work on this?

If not, @mattsse, can I work on a revamp of this to get it merged in? I think Otterscan compatibility would be great for reth.

Also, the ots_searchTransactionsBefore method seems quite slow, so we might see if we can optimize it further. When running it, CPU usage sits at 100% across all 32 cores of my node for a good 20 seconds.

binarynightowl avatar Apr 10 '25 14:04 binarynightowl

@binarynightowl The PR is waiting for a review from @mattsse. Yes, I still want to work on it. Did you encounter any problems while using search_transactions_before or search_transactions_after? If so, please write them here so I can push a fix. For the methods that need to be renamed or updated, we should open new issues and PRs.

Yes, the search_transactions_before and search_transactions_after methods are unfortunately CPU-intensive. The chain is scanned until the requested number of occurrences of the addresses is found, which might mean processing hundreds of thousands of blocks or even the whole chain. As an optimization, the implementation processes batches of 1000 blocks concurrently to speed this up. It can of course be optimized further, and I am open to ideas.
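Conceptually, the batching looks something like the sketch below. This is a simplified illustration rather than the PR's actual code; it assumes a tokio runtime plus the futures crate, with trace_block standing in for the real per-block tracing work:

```rust
use futures::future::join_all;

// Placeholder for the real per-block work (tracing txs and matching addresses).
async fn trace_block(block: u64) -> Vec<u64> {
    if block % 100 == 0 { vec![block] } else { vec![] }
}

// Scan [start, end) in concurrent batches until roughly `page_size` matches are found.
async fn scan_range(start: u64, end: u64, batch_size: u64, page_size: usize) -> Vec<u64> {
    let mut results = Vec::new();
    let mut next = start;
    while next < end && results.len() < page_size {
        let batch_end = (next + batch_size).min(end);
        // Run the whole batch concurrently, then collect results in block order.
        let per_block = join_all((next..batch_end).map(trace_block)).await;
        results.extend(per_block.into_iter().flatten());
        next = batch_end;
    }
    // Note: the "finish the last block" page-size semantics are omitted here.
    results
}

#[tokio::main]
async fn main() {
    let matches = scan_range(0, 10_000, 1_000, 25).await;
    println!("found {} matches", matches.len());
}
```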

caglaryucekaya avatar Apr 10 '25 18:04 caglaryucekaya

Yeah, that makes sense. What about something like a simple binary search to reduce the search space needed to find the right number of transactions?

Hopefully we can get this merged soon; it would be great to support Otterscan in reth!

binarynightowl avatar Sep 11 '25 16:09 binarynightowl

sorry, really dropped the ball on this one -.-

marking this for inclusion in the next release. working on the broken changes next and some touchups

mattsse avatar Sep 18 '25 13:09 mattsse

Sweet, looking forward to that! Let me know if you need any help with that; I have a custom node with overrides implementing these two RPC methods on 1.7.0 that I can move up to 1.8.1 if needed.

binarynightowl avatar Sep 23 '25 17:09 binarynightowl

Picking this back up again to fix up this PR.

yongkangc avatar Sep 25 '25 08:09 yongkangc