trueblocks-core chifra export: poor performance extracting statements

time chifra export --fmt json --accounting --articulate 0x6e4c6D9B0930073e958ABd2ABa516b885260b8Ff

First time export is very slow:

Executed in  107,05 mins   fish           external
   usr time   32,83 mins  678,00 micros   32,83 mins
   sys time   47,20 mins   77,00 micros   47,20 mins

At the time of running it, the index reported 35571 transactions, which is not that many (TM).

This is first time export (no cache) against local erigon node

Most of the time is spent in reconciliation judging by the output while this is running.

Can anyone run this for me and time it?

Jun 10 '22 10:06 artur-jablonski

3205698287 ( 7437)- <PROG> : Completed 35492 of 35492 txs for address 0x6e4c6d9b0930073e958abd2aba516b885260b8ff

Executed in   67.06 mins   fish           external
   usr time   16.46 mins  1094.00 micros   16.46 mins
   sys time   37.42 mins    0.00 micros   37.42 mins

Jun 10 '22 13:06 nazt

3205698287 ( 7437)- : Completed 35492 of 35492 txs for address 0x6e4c6d9b0930073e958abd2aba516b885260b8ff
Executed in   67.06 mins   fish           external
   usr time   16.46 mins  1094.00 micros   16.46 mins
   sys time   37.42 mins    0.00 micros   37.42 mins

Thx!. So you're a bit faster, but still far from fast.... My SSD is connected via USB-C so it could explain the difference I suppose if your drive is plugged in directly to your mainboard. I mean this is speculation but I am assuming such setup would give faster IO.... is this how you have it setup out of curiousity?

EDIT: or it could be that you didn't blow away your cache. I removed everything from cache folder just to see what's the performance when there's no cache involved.

EDIT2: also it could be that you have more solid connection for pulling ABIs from etherscan since this run involves articulation

Jun 10 '22 13:06 artur-jablonski

One inefficiency I found in the code is this:

https://github.com/TrueBlocks/trueblocks-core/blob/c0fbc744d51e6a6935f133f89ec50f800af27f5c/src/apps/acctExport/handle_accounting.cpp#L201-L211

This method is generating a map of ERC20 tokens from logs of single transaction. On line 204 the findToken method will go against a static list of tokens to see if there's match. The static token list is defined in src/other/install/names/names.tab file in the repo. On line 208 the address will be added if it was found OR if transaction.hasToken flag is set to true.

The transaction.hasToken flag is set here:

https://github.com/TrueBlocks/trueblocks-core/blob/c0fbc744d51e6a6935f133f89ec50f800af27f5c/src/apps/acctExport/process_articulate.cpp#L63-L108

Regardless of whether --articulate flag is passed it will set the hasToken flag to true if either the call of transaction matches one or ERC20 functions or if ANY event/log in transaction matches any of the ERC20 events.

So going back to the logic of adding ERC20 token to the map from previous snippet this means that if there's any match on ERC20 events/call anywhere in the transaction then ALL addresses from logs will be added as tokens.

What will happen then is that for every such address there willl be an attempt to call balanceOf ERC20 function on the contract but since many of those added addresses are not actually ERC20s, the calls will result in error. The response handling code seems to be ignoring errors and just defaulting to 0 if call failed so it will not blow up anything but it will result in many unneccesary calls. Erigon's rpc-deamon actually prints those out:

WARN[06-09|00:42:22.819] Served                                   conn=127.0.0.1:57848 method=eth_call reqid=169 t=2.305556ms err="execution reverted"

So I went and changed the condition of adding token to the map like this:

if (isToken || (log.topics.size() != 0 && log.topics[0] == "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"))
            tokenList[log.address] = tokenName;

This will only add token to map if it was found in the static file OR if the log contains Transfer(address from, address to, uint256 value) event. This should be really the only event that we care about since we're hunting for value transfers. Other events are of no interest.

So cleared the cache and ran the same export again. I verified I got the same output as before and the times reported are:

Executed in   68,39 mins   fish           external
   usr time   25,63 mins    0,00 millis   25,63 mins
   sys time   35,46 mins   30,83 millis   35,46 mins

So it looks slightly better (around 36%).

I still see some execution-reverted logs from Erigon... though it seems there's way less of them (eye ball test). It could be explained if there's a contract that is not ERC20 but for some reason emits Transfer(address from, address to, uint256 value) event. OR if it's defined in the static file but is not a token.....

I found one such occurence in names.tab there's this entry:

50-Tokens:ERC20 0x7be8076f4ea4a4ad08075c2508e481d6c946d12b Project Wyvern Exchange On chain 18 false false false true true false

And in the data this address appears in logs for transaction with hash 0xdac94d78ba9c805ef60f46e1f7ed690b012726afe9c6348c36943949dd48acd3:

{
        "address": "0x7be8076f4ea4a4ad08075c2508e481d6c946d12b",
        "logIndex": 409,
        "topics": [
          "0xc4109843e0b7d514e4c093114b863f8e7d8d9a458c372cd51bfe526b588006c9",
          "0x00000000000000000000000016a7aa64872ff37a456de74b3783fbd4df3bd6d0",
          "0x000000000000000000000000758aa3dbcb6b8d198137e5723a421503a6a390d6",
          "0x0000000000000000000000000000000000000000000000000000000000000000"
        ],
        "data": "0x000000000000000000000000000000000000000000000000000000000000000065925831528f7aafc92d68adb9d126c5a5ceb5e345d3553241c79c79c8dbd6eb0000000000000000000000000000000000000000000000000071afd498d00000",
        "articulatedLog": {
          "name": "OrdersMatched",
          "inputs": {
            "buyHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
            "maker": "0x16a7aa64872ff37a456de74b3783fbd4df3bd6d0",
            "metadata": "0x0000000000000000000000000000000000000000000000000000000000000000",
            "price": "32000000000000000",
            "sellHash": "0x65925831528f7aafc92d68adb9d126c5a5ceb5e345d3553241c79c79c8dbd6eb",
            "taker": "0x758aa3dbcb6b8d198137e5723a421503a6a390d6"
          }
        },
        "compressedLog": "{name:OrdersMatched|inputs:{buyHash:0x0000000000000000000000000000000000000000000000000000000000000000|maker:0x16a7aa64872ff37a456de74b3783fbd4df3bd6d0|metadata:0x0000000000000000000000000000000000000000000000000000000000000000|price:32000000000000000|sellHash:0x65925831528f7aafc92d68adb9d126c5a5ceb5e345d3553241c79c79c8dbd6eb|taker:0x758aa3dbcb6b8d198137e5723a421503a6a390d6}}"
      }

But this contract doesn't have balanceOf function so ¯_(ツ)_/¯

So now I thought maybe all this "articulation" is taking a lot of time since it needs to go to etherscan for abis and do the calculation so as last ran I blew up the cache and then ran this thing again without articulation:

time chifra export --fmt json --accounting 0x6e4c6D9B0930073e958ABd2ABa516b885260b8Ff

the result is:

Executed in   56,37 mins   fish           external 
   usr time   19,25 mins    0,00 millis   19,25 mins 
   sys time   28,57 mins   19,63 millis   28,57 mins

so that's around further 17% shaved off.... still it's an HOUR :thinking:

DISCLAIMER: Of course the timings reported have to be taken with caution as they could be affected by other factors. I haven't ran those processes multiple times to factor out other influences on execution time.

Jun 10 '22 14:06 artur-jablonski

This PR has the change if anyone wants to pick it up and compare performance. This will only have an effect if youre results are not cached already https://github.com/TrueBlocks/trueblocks-core/pull/2167

Jun 10 '22 14:06 artur-jablonski

I have a question:

Does the speed of the system once cached improve? I would think it's very significantly faster.

This is a direct result of us wanting to build something that works on small (desktop/laptop) machines.

Without TrueBlocks, in order to get the list of transactions to query, it quite literally takes months. This is because there is no index on the node. TrueBlocks provides that index.

Once we have a list (which should be super fast -- less than a second for any but the very largest smart contracts), then we query the node for each transaction. This is slow for two reasons: (1) we have to hit the node, and (2) we do this sequentially.

The first thing (hitting the node) is unavoidable because we want to stay small enough to run on small machines.

The second thing (hitting sequentially) is due to our code base being C++. We never made that code concurrent, and we don't plan on doing that. Instead, we are porting to GoLang.

One of the largest part of the port that's still to be done is this exact query. We're getting close.

If it's any consolation, I can say that when we went from C++ to GoLang for the scan of the index, the timing went down from almost 15 seconds for any address to .15 seconds (so 100 times faster) with the Go code.

Hopefully that sort of speed up will be seen when we go to Go for the data extraction.

Jun 17 '22 02:06 tjayrush

Does the speed of the system once cached improve? I would think it's very significantly faster.

Yes, it does. Can't remember exactly and I can't check right now but I think it was still in range of minutes. Way too much to participate in say http request-response cycle, but I will need to double check. I remember I was surprised that even with cache it was still a bit too long considering it's around 30k entries.

This is a direct result of us wanting to build something that works on small (desktop/laptop) machines.

Without TrueBlocks, in order to get the list of transactions to query, it quite literally takes months. This is because there is no index on the node. TrueBlocks provides that index.

Once we have a list (which should be super fast -- less than a second for any but the very largest smart contracts), then we query the node for each transaction. This is slow for two reasons: (1) we have to hit the node, and (2) we do this sequentially.

Sure, and that is brilliant and retreiving the appearances for given address is pretty fast. I understand that you need to hit the node to get the data, but hitting local node should be pretty fast, so I thought that over an hour sounds a bit too long for this amount of data. So I thought either the node is hit unnecessarily which led me to discover the problem that I described here (https://github.com/TrueBlocks/trueblocks-core/issues/2164#issuecomment-1152442993), or/and there's something inefficient in the algorithm somewhere which led me to discover that 40% of time (after my optimisation applied) is spent in qblocks::biguint_t::multiply(qblocks::biguint_t const&, qblocks::biguint_t const&). You can see full gprof output here (https://gist.github.com/artur-jablonski/1f9f50f3f984f9abfc49fadd2b66e2a2).

As for sequential processing then surely parallelizing it will speed things to the point of saturation of the blockchain node (the more parallel queries to the node, the slower it will respond, so there will be some sweet spot for parallelization level here beyond which the performance can actually decrease).

The first thing (hitting the node) is unavoidable because we want to stay small enough to run on small machines.

The second thing (hitting sequentially) is due to our code base being C++. We never made that code concurrent, and we don't plan on doing that. Instead, we are porting to GoLang.

One of the largest part of the port that's still to be done is this exact query. We're getting close.

If it's any consolation, I can say that when we went from C++ to GoLang for the scan of the index, the timing went down from almost 15 seconds for any address to .15 seconds (so 100 times faster) with the Go code.

Hopefully that sort of speed up will be seen when we go to Go for the data extraction.

Sure. I hope my input here can help in understanding deficiencies of the export algorithm so that the implementation in go can take that into consideration.

Jun 17 '22 12:06 artur-jablonski

At the time of running it, the index reported 35571 transactions, which is not that many (TM).

It's actually a lot for a locally running application whose first principal is that it runs on very small machines. As a result of that decision, we keep the data stored inside the node software's database until the user tells us they are interested in the address. We do heavily cache the first query, so second and subsequent queries are as fast (if not faster due to no rate-limiting) than web-speed servers. That's the tradeoff we made -- on purpose. Speed on the first query is less. Runs on small machines is possible. Speed on second and subsequent queries is faster. If you use the monitoring function this "slow first query" is made even that much less noticeable.

Way too much to participate in say HTTP request-response cycle

If you need your app to participate in a request-response cycle (local-first aren't required to do this -- web apps are -- TrueBlocks supports local-first-first), then you can use the pagination features built into the APIs (--first_record, --max_records) as you would any other API. So not sure this comment applies.

but hitting the local node should be pretty fast

It should be, but it's not except for querying logs where it is. Unfortunately, TrueBlocks mixes and matches queries for logs and transactional data under a single query, which could be, I supposed teased out to provide a fast path (of log-only data) and a slower path (of full transactional data including logs). Version 2.0, I think.

I'm going to close this. Parts of this will be handled once we port everything to GoLang.

Nov 07 '22 01:11 tjayrush

trueblocks-core trueblocks-core copied to clipboard

chifra export: poor performance extracting statements

trueblocks-core
trueblocks-core copied to clipboard