trueblocks-core
chifra export creates huge number of ABI files when scraping Uniswap contracts
No need to respond. Just to inform you. I've been running Uniswap extraction for about the last five days. It's stunningly slow. But I figured out why.
I run it with chifra export --articulate, which looks at every "to" address in the transaction history (of which there are 4.5 million). Many of those "to" addresses are smart contracts, and many of those have ABIs on Etherscan.
Because I have --articulate on, the export routine checks Etherscan for the ABI (or finds it locally) and uses it to articulate.
There were 250,000 ABI files in the folder.
When I moved the folder elsewhere (so there were none), the previously frozen-solid chifra export routine immediately sped up to normal.
I'll write an issue, but I just wanted to let you know. Nothing to worry about. But it may be the next candidate for the gRPC server.
We ran an export function against Uniswap v2 router. In the first 3,364 extracted transactions it created around 2,400 separate ABI files with duplicates on the following functions:

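One way to quantify this kind of duplication is to count how many ABI files each function signature appears in. The following is a minimal Go sketch, not code from chifra. It assumes each file holds a standard Ethereum JSON ABI (an array of items with "type", "name", and "inputs"), and the names abiEntry, signatures, and countDuplicates are hypothetical. Tuple inputs are left as "tuple" rather than expanded, so the signatures are approximate for struct arguments.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// abiEntry holds only the JSON ABI fields this sketch needs.
type abiEntry struct {
	Type   string `json:"type"`
	Name   string `json:"name"`
	Inputs []struct {
		Type string `json:"type"`
	} `json:"inputs"`
}

// signatures returns "name(type1,type2)" for every function in one ABI document.
// Simplification: tuple inputs are not expanded into their components.
func signatures(abiJSON []byte) ([]string, error) {
	var entries []abiEntry
	if err := json.Unmarshal(abiJSON, &entries); err != nil {
		return nil, err
	}
	var sigs []string
	for _, e := range entries {
		if e.Type != "function" {
			continue // skip events, constructors, etc.
		}
		types := make([]string, len(e.Inputs))
		for i, in := range e.Inputs {
			types[i] = in.Type
		}
		sigs = append(sigs, e.Name+"("+strings.Join(types, ",")+")")
	}
	return sigs, nil
}

// countDuplicates reports, per signature, how many ABI documents contain it.
func countDuplicates(abis [][]byte) (map[string]int, error) {
	counts := map[string]int{}
	for _, a := range abis {
		sigs, err := signatures(a)
		if err != nil {
			return nil, err
		}
		seen := map[string]bool{} // count each signature once per document
		for _, s := range sigs {
			if !seen[s] {
				seen[s] = true
				counts[s]++
			}
		}
	}
	return counts, nil
}

func main() {
	// Two made-up ABI documents standing in for files on disk.
	a := []byte(`[{"type":"function","name":"transfer","inputs":[{"type":"address"},{"type":"uint256"}]}]`)
	b := []byte(`[{"type":"function","name":"transfer","inputs":[{"type":"address"},{"type":"uint256"}]},{"type":"event","name":"Transfer","inputs":[]}]`)
	counts, err := countDuplicates([][]byte{a, b})
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, counts[k])
	}
}
```

In practice the byte slices would come from reading the files in the ABI folder. Any signature with a high count is a candidate for being stored once rather than once per address.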
This is a result of the idiocy of decentralized data. There is no place to centralize ABI four-bytes. Unchained Index can solve this.
Check to see if we are using the binary known.bin file which is intended to be a short circuit.
I did a lot of work on this and bumped into two significant problems:
- The ABI "cache" was the first one we wrote. It does not use all the regular caching stuff in the types package. This means there is a very large amount of "duplicated functionality but just as complicated and not automated" code that needs to be removed, and
- The age-old issue of conflicting four-bytes. I think we do 'first-in-wins' but we may do 'last-in-wins'. In either case, it doesn't work. The original idea was to load known ABIs and if a conflict happens (for example, if there's a conflict on a later address) we overlay. But now, that overlay applies to all future addresses (if we check four-byte first). If we check address first, we get hundreds of thousands of entries as per this issue.
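The trade-off between checking four-byte first and checking address first can be illustrated with a two-level lookup. This is a minimal Go sketch under stated assumptions, not the existing ABI cache and not necessarily what the PR does. The type AbiIndex and its methods are hypothetical, addresses are assumed to be normalized already, and the selector and signatures in main are made up for illustration. The idea is to keep one shared four-byte table and store a per-address entry only when that address disagrees with (or is missing from) the shared table, so an overlay never leaks to other addresses and identical ABIs add no new entries.

```go
package main

import "fmt"

// Selector is a hex-encoded four-byte function selector.
type Selector string

// AbiIndex is a hypothetical two-level index: a shared table plus
// address-scoped overrides for conflicting selectors.
type AbiIndex struct {
	known     map[Selector]string            // shared signatures, loaded once
	overrides map[string]map[Selector]string // address -> entries that differ from known
}

func NewAbiIndex(known map[Selector]string) *AbiIndex {
	return &AbiIndex{known: known, overrides: map[string]map[Selector]string{}}
}

// Add records a signature for an address. Nothing is stored when the
// shared table already maps the selector to the same signature.
func (x *AbiIndex) Add(addr string, sel Selector, sig string) {
	if k, ok := x.known[sel]; ok && k == sig {
		return
	}
	if x.overrides[addr] == nil {
		x.overrides[addr] = map[Selector]string{}
	}
	x.overrides[addr][sel] = sig
}

// Lookup checks the address-scoped override first, then the shared table.
func (x *AbiIndex) Lookup(addr string, sel Selector) (string, bool) {
	if m, ok := x.overrides[addr]; ok {
		if sig, ok := m[sel]; ok {
			return sig, true
		}
	}
	sig, ok := x.known[sel]
	return sig, ok
}

// OverrideCount returns the number of address-scoped entries stored.
func (x *AbiIndex) OverrideCount() int {
	n := 0
	for _, m := range x.overrides {
		n += len(m)
	}
	return n
}

func main() {
	// Made-up selector and signatures, for illustration only.
	idx := NewAbiIndex(map[Selector]string{"0x12345678": "alpha(uint256)"})
	idx.Add("0xaaa", "0x12345678", "alpha(uint256)") // matches shared table: not stored
	idx.Add("0xbbb", "0x12345678", "beta(address)")  // conflict: stored for 0xbbb only

	sigA, _ := idx.Lookup("0xaaa", "0x12345678")
	sigB, _ := idx.Lookup("0xbbb", "0x12345678")
	fmt.Println(sigA, sigB, idx.OverrideCount())
}
```

With this layout the number of stored entries grows with the number of genuine conflicts rather than with the number of addresses seen, at the cost of one extra map lookup per call.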
I'm going to merge the PR representing this preliminary work, but leave this issue open.
https://github.com/TrueBlocks/trueblocks-core/pull/3403