chifra export creates a huge number of ABI files when scraping Uniswap contracts

Open tjayrush opened this issue 2 years ago • 4 comments

No need to respond. Just to inform you. I've been running Uniswap extraction for about the last five days. It's stunningly slow. But I figured out why.

I run it with chifra export --articulate, which looks at every "to" address in the transaction history (of which there are 4.5 million). Many, many of those "to" addresses are smart contracts, and many of those have ABIs on Etherscan.

Because I have --articulate on, the export routine checks Etherscan for the ABI (or finds it locally) and uses it to articulate.

There were 250,000 ABI files in the folder.

When I moved the folder elsewhere (so there were none), the previously frozen-solid chifra export routine immediately sped up to normal.

I'll write an issue, but I just wanted to let you know. Nothing to worry about. But it may be the next candidate for the gRPC server.

tjayrush, Apr 06 '23 22:04

We ran an export function against Uniswap v2 router. In the first 3,364 extracted transactions it created around 2,400 separate ABI files with duplicates on the following functions:

[image: list of the duplicated functions]

This is a result of the idiocy of decentralized data. There is no place to centralize ABI four-bytes. Unchained Index can solve this.
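
One way to measure the duplication described above is to count how many ABI files declare the same function signature. The following is a minimal sketch, not TrueBlocks code. It assumes the ABI files are JSON arrays stored as `*.json` in a single directory, and the helper names (`canonical_signature`, `count_signatures`, `duplicated`) are hypothetical. It compares signature strings rather than four-byte selectors, and it does not expand tuple types.

```python
import json
from collections import Counter
from pathlib import Path


def canonical_signature(entry: dict) -> str:
    """Build 'name(type1,type2)' from one ABI function entry.

    Simplification: tuple inputs are left as 'tuple' and not expanded.
    """
    types = ",".join(inp["type"] for inp in entry.get("inputs", []))
    return f"{entry['name']}({types})"


def count_signatures(abi_dir: Path) -> Counter:
    """Count how many ABI files in abi_dir declare each function signature."""
    counts: Counter = Counter()
    for path in sorted(abi_dir.glob("*.json")):
        abi = json.loads(path.read_text())
        # Use a set so a signature is counted at most once per file.
        sigs = {
            canonical_signature(entry)
            for entry in abi
            if entry.get("type") == "function"
        }
        counts.update(sigs)
    return counts


def duplicated(counts: Counter) -> dict:
    """Keep only the signatures that appear in more than one file."""
    return {sig: n for sig, n in counts.items() if n > 1}
```

Calling `duplicated(count_signatures(Path("path/to/abis")))` on a folder like the one described would list the functions that are stored over and over, along with the number of files that repeat each one.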

tjayrush, Apr 06 '23 22:04

Check to see if we are using the binary known.bin file which is intended to be a short circuit.
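
A rough sketch of what "short circuit" means here: consult a table of known selectors first, and only fall back to a per-address fetch when the selector is not found. This is illustrative only. The `KNOWN` dict is a stand-in for whatever known.bin actually contains (its format is not described in this issue), and `AbiResolver` and the `fetch` callback are hypothetical names.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the contents of known.bin: selector -> signature.
KNOWN: Dict[str, str] = {
    "0xa9059cbb": "transfer(address,uint256)",
    "0x095ea7b3": "approve(address,uint256)",
}


class AbiResolver:
    """Resolve a selector, touching the slow path only on a known-table miss."""

    def __init__(
        self,
        known: Dict[str, str],
        fetch: Callable[[str], Dict[str, str]],
    ) -> None:
        self.known = known
        self.fetch = fetch  # slow path, e.g. a disk read or a remote lookup
        self.per_address: Dict[str, Dict[str, str]] = {}
        self.fetches = 0  # how many times the slow path was taken

    def resolve(self, address: str, selector: str) -> Optional[str]:
        # Short circuit: a known selector never triggers a per-address fetch.
        hit = self.known.get(selector)
        if hit is not None:
            return hit
        # Miss: fetch this address's ABI once and remember it.
        if address not in self.per_address:
            self.fetches += 1
            self.per_address[address] = self.fetch(address)
        return self.per_address[address].get(selector)
```

With this ordering, an export dominated by common calls such as `transfer` would leave `fetches` near zero. If `fetches` instead grows with the number of addresses, the short circuit is probably not being used.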

tjayrush, Nov 17 '23 15:11

I did a lot of work on this and bumped into two significant problems:

  1. The ABI "cache" was the first one we wrote. It does not use any of the regular caching code in the types package. As a result, there is a very large amount of code that duplicates that functionality, is just as complicated, and is not automated. That code needs to be removed, and
  2. The age-old issue of conflicting four-bytes. I think we do 'first-in-wins', but we may do 'last-in-wins'. In either case, it doesn't work. The original idea was to load the known ABIs and, if a later address conflicts with one of them, overlay the new definition. But now that overlay applies to all future addresses (if we check four-byte first). If we check address first, we get hundreds of thousands of entries, as per this issue.
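
The trade-off in the second point can be shown with a small sketch. This is not how TrueBlocks stores ABIs. The class names are hypothetical, and the selector `0x12345678` with its two signatures is a made-up collision used only for illustration.

```python
from typing import Dict, Optional


class FourByteFirst:
    """One global table keyed by selector; a later ABI overlays earlier entries."""

    def __init__(self, known: Dict[str, str]) -> None:
        self.table = dict(known)

    def load(self, address: str, abi: Dict[str, str]) -> None:
        # Last-in-wins overlay. The address is ignored, so the overlay
        # affects every address looked up afterwards.
        self.table.update(abi)

    def lookup(self, address: str, selector: str) -> Optional[str]:
        return self.table.get(selector)


class AddressFirst:
    """One table per address; nothing leaks, but every contract adds entries."""

    def __init__(self, known: Dict[str, str]) -> None:
        self.known = dict(known)
        self.by_address: Dict[str, Dict[str, str]] = {}

    def load(self, address: str, abi: Dict[str, str]) -> None:
        self.by_address[address] = dict(abi)

    def lookup(self, address: str, selector: str) -> Optional[str]:
        local = self.by_address.get(address, {})
        return local.get(selector, self.known.get(selector))


# Made-up collision: two different functions sharing one selector.
known = {"0x12345678": "foo(uint256)"}
conflicting_abi = {"0x12345678": "bar(address)"}

global_table = FourByteFirst(known)
global_table.load("0xB", conflicting_abi)

scoped_table = AddressFirst(known)
scoped_table.load("0xB", conflicting_abi)
```

After loading the conflicting ABI for `0xB`, `global_table.lookup("0xA", "0x12345678")` returns the overlaid `bar(address)` even though `0xA` never declared it. `scoped_table` answers correctly for both addresses, but it keeps a separate table for each loaded contract, which is where the hundreds of thousands of entries come from.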

tjayrush, Nov 20 '23 13:11

I'm going to merge the PR representing this preliminary work, but leave this issue open.

https://github.com/TrueBlocks/trueblocks-core/pull/3403

tjayrush, Nov 20 '23 13:11