chifra scrape: The trouble with timestamps file and not scraping

Open tjayrush opened this issue 3 years ago • 1 comments

If a user downloads TrueBlocks, and the first thing they do is

chifra when --timestamps

they will see spinning text showing that the timestamps file is being updated. That's correct, since it needs to be updated, but there are at least four problems with this.

  1. The user has to wait
  2. chifra export <address> silently stops its traversal of an account's history at the latest timestamp in that database. This means that the user will not get any transactions newer than that timestamp
  3. If the user starts chifra when --timestamps and then runs chifra scraper indexer, chifra when may crash and leave the lock file in place
  4. The user has to wait
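To make item 2 concrete, here is a minimal Go sketch of the truncation. It is not the actual chifra export code: the `appearance` type, the `exportable` function, and the in-memory layout (one timestamp per block, indexed by block number) are assumptions made only for illustration.

```go
package main

import "fmt"

// appearance is a hypothetical record of one transaction in an
// account's history.
type appearance struct {
	Block   uint64
	TxIndex uint32
}

// exportable returns the leading part of apps (assumed sorted by block)
// whose blocks are covered by the timestamps slice, plus the number of
// appearances that were dropped. timestamps[i] is assumed to hold the
// timestamp of block i.
func exportable(apps []appearance, timestamps []uint32) ([]appearance, int) {
	for i, a := range apps {
		if a.Block >= uint64(len(timestamps)) {
			// Everything from here on is newer than the timestamps file.
			return apps[:i], len(apps) - i
		}
	}
	return apps, 0
}

func main() {
	ts := make([]uint32, 100) // stale file covering blocks 0..99 only
	apps := []appearance{{10, 0}, {99, 2}, {100, 1}, {250, 0}}

	kept, skipped := exportable(apps, ts)
	fmt.Println(len(kept), skipped) // prints "2 2"
}
```

Returning the skipped count is one way a caller could warn the user instead of stopping silently.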

Currently, the timestamps database is part of the repo (as ts.bin.gz) and it rarely gets updated. Every time the user runs make, the latest ts.bin.gz gets copied from the repo to the $CONFIG folder (~/.local/share/trueblocks). Whenever ts.bin is needed anywhere, we check to see if it's present, and if not, we unpack the ts.bin.gz file in place. In this way, a brand new user gets a reasonably recent ts.bin and doesn't have to create it from scratch, which takes a VERY long time.

The trouble (and the reason for items 1. and 4. above) is that if a user doesn't run a command that keeps the timestamps data up to date, they get no new transactions during export.

Possible Solutions:

  1. If the user is actively scraping, there is no problem
  2. If the user is not scraping, then we could update each time the export function is called (this slows down the return from export a lot, which is why this function used to be present in exporter but was removed)
  3. We could add the timestamps to the Unchained Index smart contract and download it when users do chifra init

Open to other solutions...

tjayrush, Jan 04 '22 16:01

One other note:

blockScrape is the only process that ever writes to the index databases (chunks and blooms), so it need not protect itself from overwriting some other process that is already running (other than itself, which it does).

But blockScrape also updates the timestamps database, which means it has to protect itself from overwriting not only when another instance of itself is running, but also when other processes such as acctExport and whenBlock (and perhaps others) are writing.

We used to have very solid code in place that protected against this, but when I recently moved parts of the blockScraper into golang, I think I may have broken this protection.

In the C++ code, we take great care to make sure two processes are not writing to the database at the same time (that's what the .lck files are for), but this does not seem to be working now, as evidenced by the recent appearance of a leftover .lck file.

tjayrush, Jan 04 '22 17:01