
bad serialization format

Open: alexarsh opened this issue 5 years ago • 8 comments


When running the parser, I get the following error:

Chain height 594088
reloadChain Done
Done initializing/updating blocksci
Started reload loop
Rebuilding clusters...
100.0% done fetching block headers
Starting with chain of 594089 blocks
Removing 0 blocks
Adding 91 blocks
terminate called after throwing an instance of 'SerializableMap<RawOutputPointer, UTXO>::BadSerializationFormatException'
  what():  Tried to load data with bad serialization format from file /data/blocksci-data2/parser/utxoCache.dat
None
Done running parser

Reproduction Steps

1.) Run the blocksci_parser command to parse all blocks from the beginning.
2.) Rerun the blocksci_parser command to add the missing blocks (example invocations below).
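
For reference, the two invocations could look like the following (the directory paths are placeholders; the command shape matches the subprocess call quoted later in this thread):

$ blocksci_parser --output-directory /path/to/blocksci-data update --max-block -1 disk --coin-directory /path/to/bitcoin

Rerunning the same command later performs the incremental update.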

System Information

Using AMI: yes
OS: Ubuntu 16.04.5 LTS
BlockSci version: 0.5.0
Blockchain: (e.g., Bitcoin, Bitcoin Cash, Litecoin)
Parser: Disk
Total memory: 64 GB

alexarsh Sep 10 '19 15:09

Do you have sufficient disk space left? (You can check with df -h.) Did you try reparsing from scratch?
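
For example, to check the filesystem that holds the parser directory from the error message above:

$ df -h /data/blocksci-data2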

maltemoeser Sep 10 '19 17:09

@maltemoeser Yes, I have a lot of disk space left. The problem is that reparsing fixes the failure at a specific block, but the parser then fails on the first block after that. Is there an option to skip the problematic block? If not, maybe the approach should be a script that automatically restarts a full reparse whenever this happens?

alexarsh Sep 11 '19 03:09

Similar reports in #147

I haven't observed this myself; unfortunately, I'm not sure what's going on.

maltemoeser Sep 11 '19 12:09

Due to this problem, the parser has become pretty useless. It runs for a few hours at most and then fails again on some other block with the same error. So I have to reparse the whole blockchain from the beginning, switch the data directories, and start the reparsing over and over again.

alexarsh Oct 14 '19 08:10

I haven't observed this issue either. In #147, @engenegr suggests that it's due to a "weak machine", whatever that boils down to. (The only options are not enough RAM or insufficient disk space.)

This error occurs if the unserialize method of Google's dense hash table returns false, see https://github.com/sparsehash/sparsehash/blob/master/src/sparsehash/internal/densehashtable.h#L1138.

  • Does this issue occur during the initial run of the parser, or just for incremental updates?
  • Did you try using v0.6? Does the error occur there? If this resolves the issue, I doubt that we will further investigate this bug in v0.5.
  • What's your max-block parameter when launching the parser? If you use 0, try -6 to avoid re-orgs.
  • If you could run the parser under a debugger (e.g., gdb; see the example after this list) and provide a stack trace of the crash, that would be helpful.
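
For reference, a typical gdb session could look like this (assuming a parser build with debug symbols; substitute your actual arguments):

$ gdb --args /usr/bin/blocksci_parser <arguments>
(gdb) run
[wait for the crash]
(gdb) bt

The bt command prints the stack trace at the point of the crash.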

However, as this is not reproducible on our side, it's hard to fix at the moment.

mplattner Nov 11 '19 16:11

As I said earlier, this error occurs a few days after the initial parsing, during incremental updates. For the last half year, I've been reparsing from scratch every week or two, since I hit this error pretty fast. Then I start the incremental update, but it fails really quickly. Currently I'm stuck with this error at block 603803. I have 1.5 TB of disk space and a machine with 64 GB of RAM, so it's not a resource problem. I tried to work with 0.6 twice, but since it's not an official version, it lacks a lot of documentation, and I failed to work with it due to multiple problems/errors. My max-block parameter is -1. I'm familiar with gdb, but can you explain how to run the parser under the debugger?

alexarsh Nov 16 '19 15:11

Update: After I posted the comment below, I saw that in your initial issue description the error happens even when Removing 0 blocks, i.e., no block re-org occurs. However, there are output messages that are not part of the official version (reloadChain Done, Started reload loop, Rebuilding clusters..., etc.). What revision are you using, and does it include any other modifications that might be relevant to this issue?


Since you use -1 for the max-block parameter, an ad-hoc guess is that the error occurs when there is a block reorganization. I wasn't able to confirm this guess with a small test chain, but it might fail on the real Bitcoin blockchain. A block reorganization requires removing already-parsed data, and it is performed whenever x is greater than 0 in the Removing x blocks line of the parser output; see the example below.

$ blocksci_parser <arguments>
[...]
Starting with chain of 100 blocks
Removing 2 blocks
Adding 26 blocks
[...]

Please let me know if the issue does indeed (only) occur if already parsed blocks are removed. If this is the case, using a smaller value for the max-block parameter (-6 is the recommended value) and doing a full re-parse should resolve the issue.
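
Assuming the same CLI shape as in the reproduction steps above, such a re-parse could be launched as follows (paths are placeholders):

$ blocksci_parser --output-directory /path/to/blocksci-data update --max-block -6 disk --coin-directory /path/to/bitcoin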

Please also provide the complete parser output the next time the error occurs.

mplattner Nov 17 '19 01:11

@mplattner, thanks for the detailed answer. The output messages that are not part of the official version are just my own prints. I use the parser command as is, without any changes or customizations; I just run it from my Python code like this:

proc = subprocess.run(
    ["/usr/bin/blocksci_parser",
     "--output-directory", config["blocksci"]["chain"],
     "update", "--max-block", "-1",
     "disk", "--coin-directory", config["blocksci"]["bitcoin"]],
    stderr=subprocess.STDOUT, preexec_fn=pre_exec)
print(proc.stdout)
print("Done running parser")
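
As an aside: without stdout=subprocess.PIPE, subprocess.run does not capture the child's output, so proc.stdout is None; that likely explains the stray None in the log at the top of this issue. A minimal sketch of a capturing variant, reusing the config dict and pre_exec hook from the script above:

import subprocess

# Capture stdout, merging stderr into it, so proc.stdout holds the
# complete parser log instead of None.
proc = subprocess.run(
    ["/usr/bin/blocksci_parser",
     "--output-directory", config["blocksci"]["chain"],
     "update", "--max-block", "-1",
     "disk", "--coin-directory", config["blocksci"]["bitcoin"]],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    universal_newlines=True,  # decode the captured bytes to str
    preexec_fn=pre_exec)
print(proc.stdout)  # now prints the full parser output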

The errors occur when there are no blocks to remove, so I don't think it's related. The last error is:

Starting with chain of 603803 blocks
Removing 0 blocks
Adding 440 blocks
terminate called after throwing an instance of 'SerializableMap<RawOutputPointer, UTXO>::BadSerializationFormatException'
  what():  Tried to load data with bad serialization format from file /data/blocksci-data/parser/utxoCache.dat

What do you mean by the "complete parser output"?

alexarsh Nov 17 '19 19:11