trueblocks-core

chifra scrape is extremely slow scraping BSC

Open atdefinative opened this issue 3 years ago • 1 comments

I've been trying for the past week to scrape BSC (bscscan.com). My setup is in AWS (i3en.3xlarge: 12 vCPUs, 96 GB RAM, locally attached 7,500 GB NVMe), saving TrueBlocks data to the NVMe drive (with ZFS and compression). I'm using a local archive+tracing node. It seems most of the time is being spent on staging/finalization. It has been taking 6 hours to scrape 20,000 blocks.

Is there any setting or hardware I can change to make this process faster?

Also, I see blockScrape is using only one CPU extensively. Is there any way to spawn multiple blockScrape processes in parallel?

chifra scrape command line: chifra scrape run --sleep 1 --block_cnt 20000 --block_chan_cnt 100 --addr_chan_cnt 200 --chain bsc

chifra status:

{
  "clientVersion": "erigon/2022.09.1/linux-amd64/go1.18.1",
  "clientIds": "chainId: 56 networkId: 56",
  "trueblocksVersion": "GHC-TrueBlocks//0.37.1-beta-db0d50a6b-20220901",
  "rpcProvider": "http://localhost:8545",
  "configPath": "***/trueblocks/",
  "cachePath": "***/trueblocks/cache/bsc/",
  "indexPath": "***/trueblocks/unchained/bsc/",
  "host": "***",
  "isScraping": true,
  "isArchive": true,
  "isTracing": true,
  "hasEskey": true,
  "caches": [
    {
      "type": "CIndexCache",
      "path": "***/trueblocks/unchained/bsc/finalized/",
      "nFiles": 23833,
      "nFolders": 8,
      "sizeInBytes": 58426799912,
      "isValid": true
    },
    {
      "type": "CMonitorCache",
      "path": "***/trueblocks/cache/bsc/monitors/",
      "nFiles": 0,
      "nFolders": 1,
      "sizeInBytes": 0,
      "isValid": true
    },
    {
      "type": "CNameCache",
      "path": "***/trueblocks/cache/bsc/names/",
      "nFiles": 2,
      "nFolders": 1,
      "sizeInBytes": 676,
      "isValid": true
    },
    {
      "type": "CAbiCache",
      "path": "***/trueblocks/cache/bsc/abis/",
      "nFiles": 2442,
      "nFolders": 1,
      "sizeInBytes": 18056124,
      "isValid": true
    },
    {
      "type": "CChainCache",
      "path": "***/trueblocks/cache/bsc/blocks/",
      "nFiles": 0,
      "nFolders": 0,
      "sizeInBytes": 0
    },
    {
      "type": "CChainCache",
      "path": "***/trueblocks/cache/bsc/txs/",
      "nFiles": 7790,
      "nFolders": 704,
      "sizeInBytes": 127472890,
      "isValid": true
    },
    {
      "type": "CChainCache",
      "path": "***/trueblocks/cache/bsc/traces/",
      "nFiles": 0,
      "nFolders": 0,
      "sizeInBytes": 0
    },
    {
      "type": "CSlurpCache",
      "path": "***/trueblocks/cache/bsc/slurps/",
      "nFiles": 0,
      "nFolders": 1,
      "sizeInBytes": 0,
      "isValid": true
    },
    {
      "type": "CPriceCache",
      "path": "***/trueblocks/cache/bsc/prices/",
      "nFiles": 0,
      "nFolders": 1,
      "sizeInBytes": 0
    }
  ],
  "chains": [
    ...,
    {
      "chain": "bsc",
      "chainId": 56,
      "symbol": "BNB",
      "rpcProvider": "http://localhost:8545",
      "apiProvider": "http://localhost:8080",
      "remoteExplorer": "https://bscscan.com",
      "localExplorer": "http://localhost:1234",
      "pinGateway": "http://gateway.ipfs.io/ipfs"
    }
  ],
  "date": "2022-09-12 08:19:31 UTC"
}

chifra scrape output for one cycle is attached chifra_scrape_output.log

atdefinative avatar Sep 12 '22 08:09 atdefinative

This is very, very interesting. This is the first time I've seen a scrape against chains other than Mainnet, Sepolia or Gnosis. Let me try to digest a few things first. Please help me by confirming what I think I'm seeing. The log is SUPER helpful.

I see this in the first few lines:

INFO[11-09|21:37:53.668] Sleeping for 1 seconds - 11710015 away from head.
33383838 (    267)- <INFO>  : Block 9560001: have 77476756 addrs of 200000 (38738.4%). Need 0 more. Found 77293731 records (3864.69 txs/blk).

The scraper is 11,710,015 blocks from the latest block - and it's processing block 9,560,001 -- so latest block is around 21,270,016. Is that right?
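As a rough sanity check, the head-block estimate and the implied catch-up time can be worked out from the numbers already in this thread. This is only a sketch: it assumes the reported rate (20,000 blocks in 6 hours) stays constant and ignores new blocks arriving at the head while the scraper runs. The variable names are illustrative and not part of chifra.

```python
# Figures copied from the log line and the original post in this thread.
current_block = 9_560_001   # block the scraper reports processing
blocks_behind = 11_710_015  # "away from head" in the log
blocks_per_cycle = 20_000   # --block_cnt from the command line
hours_per_cycle = 6         # duration reported by the original poster

estimated_head = current_block + blocks_behind
blocks_per_hour = blocks_per_cycle / hours_per_cycle

# Assumption: constant scrape rate, and the chain head does not advance.
hours_to_catch_up = blocks_behind / blocks_per_hour
days_to_catch_up = hours_to_catch_up / 24

print(f"estimated head block: {estimated_head:,}")
print(f"scrape rate: {blocks_per_hour:,.0f} blocks/hour")
print(f"time to reach head: {hours_to_catch_up:,.0f} hours (~{days_to_catch_up:.0f} days)")
```

At that rate the scraper would need several months to reach the head, which is why the settings matter so much here.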

It also says that at block 9,560,001 it has 77,476,756 of 200,000 addrs (this is misreporting -- it should say appearances wanted), which is about 387 times more than it needs -- that's very, very strange.

It also says it's getting around 3,864.69 txs per block -- this is misreporting -- it should say appearances per block.
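The two figures in that log line can be reproduced with simple division. The sketch below assumes the per-block figure is the record count divided by the 20,000 blocks requested with --block_cnt; that interpretation is inferred from the numbers, not confirmed from the scraper's source.

```python
# Values copied from the log line quoted above.
appearances_have = 77_476_756
appearances_wanted = 200_000
records_found = 77_293_731
blocks_in_cycle = 20_000  # assumption: the --block_cnt value from the command line

ratio = appearances_have / appearances_wanted
records_per_block = records_found / blocks_in_cycle

print(f"have {ratio:.2f}x the target ({ratio * 100:.1f}%)")
print(f"{records_per_block:.2f} records per block")
```

Both results line up with what the log prints (38738.4% and 3864.69), so the log's arithmetic is consistent even though its labels are misleading.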

Then the next few lines are:

110993435 (77609597)- <INFO>  : Writing...                                                                            111401714 ( 408279)- <INFO>  : Wrote 204453 records to $INDEX/finalized/009539954-009540005.bin
209641757 (98240043)- <INFO>  : Writing...                                                                            210131596 ( 489839)- <INFO>  : Wrote 200055 records to $INDEX/finalized/009540006-009540056.bin
310087561 (99955965)- <INFO>  : Writing...                                                                            310514679 ( 427118)- <INFO>  : Wrote 201053 records to $INDEX/finalized/009540057-009540110.bin

which is picking off around 200,000 records at a time and writing them to "chunks" that each cover roughly 50 blocks (52, then 51, then 54 blocks, counting both ends of each file name's range).
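To make the chunk sizes easy to re-check, here is a small sketch that pulls the record count and block range out of the "Wrote ..." lines. It assumes the file name encodes an inclusive first-last block range (each file starts one block after the previous one ends, which suggests this), so the block count is one more than the simple difference. The regular expression and the helper name `chunk_stats` are illustrative only.

```python
import re

# The "Wrote ..." portions of the three log lines quoted above.
log_lines = [
    "Wrote 204453 records to $INDEX/finalized/009539954-009540005.bin",
    "Wrote 200055 records to $INDEX/finalized/009540006-009540056.bin",
    "Wrote 201053 records to $INDEX/finalized/009540057-009540110.bin",
]

PATTERN = re.compile(r"Wrote (\d+) records to \S*/(\d+)-(\d+)\.bin")


def chunk_stats(line):
    """Return (first_block, last_block, n_blocks, records_per_block) or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    records, first, last = (int(group) for group in match.groups())
    n_blocks = last - first + 1  # assumption: the range is inclusive
    return first, last, n_blocks, records / n_blocks


stats = [chunk_stats(line) for line in log_lines]
for first, last, n_blocks, per_block in stats:
    print(f"{first}-{last}: {n_blocks} blocks, {per_block:,.0f} records/block")
```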

(I'm just writing this out so I can refer to it later.) There are a number of things we can do.

Couple of questions first:

  1. Which branch are you running? I suspect (but I can't tell) that you're running the master branch, but it might be the develop branch. In either case, I'm going to give you instructions below to switch to a new branch that we're very close to releasing. It's better than both develop and master, including being about twice as fast. It also has numerous bug fixes.

  2. It looks like you're running against Erigon. That's good. Question for you: do you know if Erigon supports the trace_ endpoints for BSC?

  3. The number of appearances your scrape is looking for is far too low for the number of appearances your chain is producing. On Mainnet, the "writing" happens about once every 2,500 blocks. Yours is writing about once every 50 blocks. On Mainnet, we search for 2,000,000 appearances per chunk. Your setting (which is the default) is 200,000 -- 10 times smaller than what we use on Mainnet. We chose roughly 2,500 blocks because that's about twice a day, which is a direct function of 14-second blocks: 3,000 * 14 / 60 / 60 = 11.7 hours (2,500 blocks is about 9.7 hours). (Our 2,000,000 appearances target was chosen when that many appearances happened in about 12 hours -- things have changed since.) Question: How long are block times on BSC? You're seeing 200,000 appearances in about 50 blocks, so if you wanted 2,500 blocks per chunk (assuming 14-second blocks), then 2,500 / 50 = 50, or 50 times more appearances than 200,000, which is 10,000,000. This setting is called apps_per_chunk and is set in the ~/.local/share/trueblocks/config/<chain>/blockScrape.toml file (or the Mac equivalent). You could set:

[settings]
apps_per_chunk=10000000
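The sizing arithmetic behind that number can be sketched as follows. The helper names are hypothetical and not part of chifra. The block time is left as a parameter because the BSC block time is still an open question in this thread.

```python
def scaled_apps_per_chunk(current_apps, observed_blocks, target_blocks):
    """Scale the appearance target so one chunk spans about target_blocks blocks."""
    return current_apps * target_blocks // observed_blocks


def hours_per_chunk(blocks, block_seconds):
    """Wall-clock time that a span of blocks represents on chain."""
    return blocks * block_seconds / 3600


# Numbers from the discussion above: 200,000 appearances fill about 50 blocks,
# and the goal is about 2,500 blocks per chunk.
suggested = scaled_apps_per_chunk(200_000, observed_blocks=50, target_blocks=2_500)
print(f"apps_per_chunk = {suggested}")

# Mainnet reference points, using 14-second blocks.
print(f"3,000 blocks = {hours_per_chunk(3_000, 14):.1f} hours")
print(f"2,500 blocks = {hours_per_chunk(2_500, 14):.1f} hours")
```

Once the actual BSC block time is known, `hours_per_chunk` can be re-run with it to pick a block span that really is about half a day, and the appearance target scaled to match. With the figures above, the result is the 10,000,000 used for apps_per_chunk.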

Unfortunately, this requires you to start over...but before you do that...

  1. Switch to the feature/new-unchained-index-2.0 branch. We're in the very final steps of testing this. The scraper has been pretty thoroughly tested.

Sorry this is so long-winded. Please connect with us in Discord, where we can explain much better, and I'd be happy to get on a call to walk you through it. You're one of the first users we've seen on other chains, so it's super interesting to me to "peer over your shoulder" as you debug this.

Upshot: You're using a slower branch, your settings are not optimized for a fuller chain, and you're going to have to start over.

tjayrush avatar Sep 13 '22 02:09 tjayrush

We've decided against supporting this chain.

tjayrush avatar Apr 21 '23 20:04 tjayrush