
chifra chunks notes

Open tjayrush opened this issue 3 years ago • 0 comments

A discussion about these related tools.

Summary

We would like to consolidate the function of some (or maybe all) of these tools since they all process against the same data type (chunks). The format of a chunk is described in the format section below.

Existing options

chifra chunks (takes optional block identifiers)

--check: check one or more chunks (all by default) for the following:

  • The last block in one chunk is the first block in the next chunk minus 1
  • At every snap_to_grid (100,000 by default) blocks after first_snap (2,250,000 by default on mainnet, zero on other chains) a chunk ends.
  • For non-snapped chunks, nApps >= apps_per_chunk. For snapped chunks, nApps < apps_per_chunk.
  • The size of the file is sizeof(header) + sizeof(addr_table) + sizeof(app_table), where
    • header is magic | hash | nAddrs | nApps ( 4 bytes + 32 + 4 + 4 = 44 bytes)
    • addr_table is nAddrs rows of sizeof(addr_record), where
      • addr_record is address | first | count (20 bytes + 4 + 4 = 28 bytes)
    • app_table is nApps rows of sizeof(app_record), where
      • app_record is blockNumber | trans_id (4 bytes + 4 = 8 bytes)
  • Example: 1942904 addresses with 2000054 appearances should be file size of 70401788
    • 1942904 * 28 + 2000054 * 8 + 44
    • 54401312 + 16000432 + 44
    • 70401744 + 44
    • 70401788
  • There is an associated bloom filter for each chunk
  • Possibly: there is an associated Pinata pin for each chunk
  • Possibly: there are no Pinata files that are not chunks
  • Possibly: the manifest hashes to the IPFS hash at the contract
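
The size and adjacency checks above can be sketched as follows. This is only an illustration of the arithmetic in this issue, not the chifra implementation; the function names are made up, and block ranges are assumed to be (first, last) pairs already sorted by first block.

```python
# Record sizes as listed in the --check notes above.
HEADER_SIZE = 4 + 32 + 4 + 4   # magic | hash | nAddrs | nApps
ADDR_RECORD_SIZE = 20 + 4 + 4  # address | first | count
APP_RECORD_SIZE = 4 + 4        # blockNumber | trans_id


def expected_chunk_size(n_addrs, n_apps):
    """Return the file size in bytes implied by the two record counts."""
    return HEADER_SIZE + n_addrs * ADDR_RECORD_SIZE + n_apps * APP_RECORD_SIZE


def ranges_are_contiguous(ranges):
    """True if each chunk's last block is the next chunk's first block minus 1.

    `ranges` is a hypothetical list of (first_block, last_block) tuples.
    """
    return all(
        prev_last + 1 == next_first
        for (_, prev_last), (next_first, _) in zip(ranges, ranges[1:])
    )


# The worked example from the list above.
print(expected_chunk_size(1942904, 2000054))  # 70401788
```

Comparing this number against the size reported by the file system is cheap, so it could run before any of the more expensive checks.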

--extract:

write one or more [ header | addr_table | app_table | chunk | bloom ] to the screen.

--save:

used with --extract, write the results both to the screen and to files with the same name as the chunk in a folder called ./out.
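
For discussion, here is a rough sketch of what extracting the three parts of a chunk might look like, based only on the byte layout given under --check. The byte order (little-endian) and the unsigned 32-bit integer fields are assumptions, as is the function name parse_chunk; none of this is taken from the chifra source.

```python
import struct

# Layouts from the --check notes; "<" (little-endian, no padding) is an assumption.
HEADER = struct.Struct("<4s32sII")     # magic | hash | nAddrs | nApps  (44 bytes)
ADDR_RECORD = struct.Struct("<20sII")  # address | first | count        (28 bytes)
APP_RECORD = struct.Struct("<II")      # blockNumber | trans_id         (8 bytes)


def parse_chunk(data):
    """Split raw chunk bytes into header, addr_table and app_table."""
    if len(data) < HEADER.size:
        raise ValueError("data is shorter than a chunk header")
    magic, digest, n_addrs, n_apps = HEADER.unpack_from(data, 0)

    # Same size rule as the --check option describes.
    expected = HEADER.size + n_addrs * ADDR_RECORD.size + n_apps * APP_RECORD.size
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")

    addr_start = HEADER.size
    app_start = addr_start + n_addrs * ADDR_RECORD.size
    addr_table = [
        ADDR_RECORD.unpack_from(data, addr_start + i * ADDR_RECORD.size)
        for i in range(n_addrs)
    ]
    app_table = [
        APP_RECORD.unpack_from(data, app_start + i * APP_RECORD.size)
        for i in range(n_apps)
    ]
    return {
        "header": {"magic": magic, "hash": digest, "nAddrs": n_addrs, "nApps": n_apps},
        "addr_table": addr_table,  # tuples of (address, first, count)
        "app_table": app_table,    # tuples of (blockNumber, trans_id)
    }
```

Each key of the returned dictionary corresponds to one of the --extract choices, so a --save mode would only need to write the selected part to ./out.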

--stats:

chifra pins

--list: list the contents of the manifest (originally intended to have --remote modify this to also list the pins on Pinata).

~~--init: download the chunks from Pinata using the manifest~~

--share: originally intended to re-pin all pins in the manifest, but creating the zip files on non-Linux machines causes problems, so this is currently disabled.

~~--sleep: needed to avoid getting rate limited but since switching to [ipfs.unchainedindex.io](http://ipfs.unchainedindex.io) this is no longer a problem (at least for us - it may not scale).~~

~~--freshen: Originally intended to do what chifra init does now — freshen with the latest chunks if any since last time run. Not needed.~~

--remote: Originally an option for --list to cause a listing of all chunks on Pinata. (Would be nice as a sanity check.)

--all: for use with --init only, download not only the bloom filters (the default) but also the actual chunks.

chifra init

This tool is a simple alias for chifra pins --init. Only the --all option works.

Chunk Related Configuration Items (for blockScrape)

block_cnt - how many blocks for blaze to process at a time

block_chan_cnt - the number of concurrent channels to fetch and process blocks

addr_chan_cnt - the number of concurrent channels to use to process addresses found in already processed blocks

apps_per_chunk - the number of appearances to collect before consolidating into a chunk

snap_to_grid - force a consolidation whenever the block number is a multiple of this value, ignoring the number of appearances; this creates a ‘correctable’ collection of chunks

unripe_dist - the distance (in blocks) behind the head of the chain to consider blocks unripe

first_snap - the first block after which a snap will take place

allow_missing - for some chains where there are legitimately no addresses in some blocks (i.e. there is no miner and no transactions) allow zero appearance blocks without complaint

n_test_runs - unused
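
To make the interaction of apps_per_chunk, snap_to_grid and first_snap concrete, here is a small sketch of the consolidation decision as these notes describe it. The function names are hypothetical, and the exact boundaries (strictly after first_snap, and a chunk ending on the grid block itself) are assumptions rather than confirmed blockScrape behavior.

```python
def is_snap_block(block, snap_to_grid=100_000, first_snap=2_250_000):
    """True if a consolidation is forced at `block` regardless of appearance count.

    Defaults are the mainnet values quoted in this issue. Treating the grid as
    "block is a multiple of snap_to_grid and lies after first_snap" is an assumption.
    """
    return block > first_snap and block % snap_to_grid == 0


def should_consolidate(block, n_apps, apps_per_chunk,
                       snap_to_grid=100_000, first_snap=2_250_000):
    """Consolidate once enough appearances are collected or a snap block is hit."""
    if n_apps >= apps_per_chunk:
        return True
    return is_snap_block(block, snap_to_grid, first_snap)
```

Under these assumptions a snapped chunk is exactly the case where the second branch fires, which is why --check expects nApps < apps_per_chunk for snapped chunks.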

Current Command Lines

TEST_MODE=true chifra chunks --help

Purpose:
  Manage and investigate chunks and bloom filters.

Usage:
  chifra chunks [flags] <block> [block...]

Arguments:
  blocks - an optional list of blocks to process

Flags:
  -c, --check            check the validity of the chunk or bloom
  -e, --extract string   show some or all of the contents of the chunk or bloom filters
                         One of [ header | addr_table | app_table | chunks | blooms ]
  -s, --stats            for the --list option only, display statistics about each chunk or bloom
  -a, --save             for the --extract option only, save the entire chunk to a similarly named file as well as display
  -x, --fmt string       export format, one of [none|json*|txt|csv|api]
  -v, --verbose          enable verbose (increase detail with --log_level)
  -h, --help             display this help screen

Notes:
  - Only a single block in a given chunk needs to be supplied.

TEST_MODE=true chifra pins --help

Purpose:
  Manage pinned index of appearances and associated blooms.

Usage:
  chifra pins [flags]

Flags:
  -l, --list          list the bloom and index hashes from local cache or IPFS
  -i, --init          download the blooms or index chunks from IPFS
  -a, --all           in addition to Bloom filters, download full index chunks
  -S, --share         share downloaded data by pinning it to IPFS (the IPFS daemon must be running)
  -s, --sleep float   throttle requests by this many seconds (default 0.25)
  -f, --freshen       check for new bloom or index chunks and download if available (hidden)
  -r, --remote        for --list mode only, recover the manifest from IPFS via UnchainedIndex smart contract (hidden)
  -n, --init_all      use --init --all instead (hidden)
  -x, --fmt string    export format, one of [none|json*|txt|csv|api]
  -v, --verbose       enable verbose (increase detail with --log_level)
  -h, --help          display this help screen

Notes:
  - One of --list or --init is required.
  - Re-run chifra init as often as you wish. It will repair or freshen the index.
  - The --share option works only if an IPFS daemon is running.

TEST_MODE=true chifra init --help

Error:
  unknown flag: --help

Usage:
  chifra init [flags]

Flags:
  -a, --all          in addition to Bloom filters, download full index chunks
  -x, --fmt string   export format, one of [none|json*|txt|csv|api]
  -v, --verbose      enable verbose (increase detail with --log_level)
  -h, --help         display this help screen

Notes:
  - chifra init is an alias for the chifra pins --init command.
  - See chifra pins --help for more information.

chifra init is an alias of chifra pins --init

A Few Things We Need

We need but do not have chifra init --test: display which chunks would be downloaded if --test were omitted. Change nothing on disk.
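
One possible shape for such a dry run, sketched under heavy assumptions: the manifest is reduced to a plain list of chunk file names, and "already downloaded" simply means a file with that name exists in the local index folder. The function name plan_downloads and the file names in the usage line are hypothetical.

```python
from pathlib import Path


def plan_downloads(manifest_names, index_dir):
    """Return the manifest entries that are missing locally, in manifest order.

    Read-only: the directory is listed but nothing is created, changed or deleted.
    """
    root = Path(index_dir)
    present = set()
    if root.is_dir():
        present = {p.name for p in root.iterdir() if p.is_file()}
    return [name for name in manifest_names if name not in present]


# Hypothetical usage: print what a real run would fetch.
for name in plan_downloads(["000000000-000000009.bin"], "./does-not-exist"):
    print("would download", name)
```

A fuller version would presumably also compare sizes or hashes so that truncated files are reported, but that depends on what the manifest actually records.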

We need a better understanding of how chifra status index works and whether it provides some of the functionality we need for other things (like --stats). Can we use it to avoid duplicating code?

Format of a Chunk

https://gateway.pinata.cloud/ipfs/Qmart6XP9XjL43p72PGR93QKytbK8jWWcMguhFgxATTya2

Enhancement to Chunks in the Future

During download, chunks could be combined into much larger files, and while this makes them less easy to pin as per the manifest, it would significantly speed up the scan (search) for chifra list.

This would involve first pinning the downloaded file (if that option was on) and then opening the files and rebuilding the Bloom filter wide enough to ensure a given false positive rate. The format of the file wouldn’t have to change at all. These ‘much larger’ blooms could be put on IPFS as well and if the user did chifra init --all they could be downloaded instead of all the smaller files.
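
As a rough guide to how wide a rebuilt bloom would need to be, the textbook sizing formulas for a classic Bloom filter are sketched below. Whether the bloom files used here follow the classic single-bit-array design is an assumption; the function names are made up for illustration.

```python
import math


def bloom_bits(n_items, false_positive_rate):
    """Bits needed for a classic Bloom filter: m = -n * ln(p) / (ln 2)^2."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0 < false_positive_rate < 1:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return math.ceil(-n_items * math.log(false_positive_rate) / math.log(2) ** 2)


def bloom_hashes(n_items, n_bits):
    """Optimal number of hash functions for that width: k = (m / n) * ln 2."""
    return max(1, round(n_bits / n_items * math.log(2)))
```

Because the width grows linearly with the number of addresses at a fixed false positive rate, a combined bloom would be about as large as the blooms it replaces put together; the gain would come from opening one file instead of many during the scan.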

The combined size might need to cascade, growing larger the farther back from the tip of the chain we are. Less than 10,000 blocks back, chunks stay as they are. Between 10,000 and 100,000 blocks back, chunks are larger, and so on back to 1,000,000 block ranges or even 10,000,000 block ranges depending on what is best.
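
The cascade could be expressed as a lookup from distance-behind-the-tip to a tier number, as in this sketch. The thresholds are illustrative powers of ten loosely following the figures above, and the function name and boundary handling are assumptions; the right values would have to be measured.

```python
import bisect

# Illustrative distance thresholds (in blocks behind the chain tip).
TIER_THRESHOLDS = [10_000, 100_000, 1_000_000, 10_000_000]


def consolidation_tier(block, head):
    """Return 0 for "leave chunks as they are", higher numbers for larger merges."""
    distance = head - block
    if distance < 0:
        raise ValueError("block is ahead of the chain head")
    # Number of thresholds that are <= distance.
    return bisect.bisect_right(TIER_THRESHOLDS, distance)
```

A block's tier changes as the chain advances, so older chunks would have to be re-merged periodically rather than once.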

tjayrush, Aug 30 '22 16:08