
Zero-length clusters

Open geofurb opened this issue 7 years ago • 2 comments

Calling size() on a cluster returns zero for every cluster. c.type_equiv_size does not seem to be affected and still returns non-zero values. The problem seems to be tied to a second behavior: iterating over a cluster takes an exceptionally long time, even for small clusters (e.g. c.type_equiv_size=10). This may be related to #200.

cm = blocksci.cluster.ClusterManager(cluster_data_dir, chain)
for c in cm.clusters():
    print(c.size())

Reproduction Steps

import blocksci
import blocksci.cluster

chain = blocksci.Blockchain(BITCOIN_DATA_DIR)
cm = blocksci.cluster.ClusterManager(BITCOIN_CLUSTER_DIR, chain)

# Print the size of every cluster; each call prints 0
for c in cm.clusters():
    print(c.size())
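
To narrow the symptom down without walking every cluster, a bounded check along the following lines might help. This is only a sketch: it relies solely on the two attributes that appear in this report (`type_equiv_size` and an iterable `addresses`), the helper name `find_unlistable_clusters` and the `StubCluster` class are hypothetical, and the stub exists only so the example runs without BlockSci installed.

```python
from itertools import islice


def find_unlistable_clusters(clusters, limit=100):
    """Return (index, type_equiv_size) for clusters that report a non-zero
    size but yield no addresses when iterated.

    `clusters` can be any iterable of objects exposing `type_equiv_size`
    and an iterable `addresses` (assumed here, as used in this report).
    """
    suspicious = []
    # islice stops after `limit` clusters, so the full set is never walked.
    for index, cluster in enumerate(islice(clusters, limit)):
        reported = cluster.type_equiv_size
        # any() stops at the first address, so healthy clusters are cheap.
        has_address = any(True for _ in cluster.addresses)
        if reported > 0 and not has_address:
            suspicious.append((index, reported))
    return suspicious


class StubCluster:
    """Hypothetical stand-in for a BlockSci cluster, for demonstration only."""

    def __init__(self, type_equiv_size, addresses):
        self.type_equiv_size = type_equiv_size
        self.addresses = addresses


if __name__ == "__main__":
    sample = [StubCluster(2, ["a", "b"]), StubCluster(125, [])]
    print(find_unlistable_clusters(sample))
```

With a real cluster manager, the call would presumably be `find_unlistable_clusters(cm.clusters(), limit=100)`. Keeping `limit` small matters here because iterating even a single affected cluster is slow.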

System Information

  • BlockSci version: 0.5
  • Using AMI: no
  • Compiled under Ubuntu 16.04
  • cmake version 3.12.4
  • gcc/g++ 7.3.0-21ubuntu1~16.04
  • Anaconda version 3.5.1 (Python 3.7.0)
  • Total memory: 64 GB DRAM, 188 GB swap

Dependencies installed: blocksci==0.5.0

  • dateparser [required: >=0.6.0, installed: 0.7.0]
    • python-dateutil [required: Any, installed: 2.6.1]
      • six [required: >=1.5, installed: 1.10.0]
    • pytz [required: Any, installed: 2017.2]
    • regex [required: Any, installed: 2018.11.2]
    • tzlocal [required: Any, installed: 1.5.1]
      • pytz [required: Any, installed: 2017.2]
  • multiprocess [required: >=0.70.5, installed: 0.70.6.1]
    • dill [required: >=0.2.8.1, installed: 0.2.8.2]
  • pandas [required: >=0.22.0, installed: 0.23.4]
    • numpy [required: >=1.9.0, installed: 1.13.1]
    • python-dateutil [required: >=2.5.0, installed: 2.6.1]
      • six [required: >=1.5, installed: 1.10.0]
    • pytz [required: >=2011k, installed: 2017.2]
  • psutil [required: >=5.4.2, installed: 5.4.8]
  • pycrypto [required: >=2.6.1, installed: 2.6.1]

geofurb, Nov 30 '18 18:11

Accessing an individual cluster's addresses takes a very long time and returns an empty list:

IPython console

chain = blocksci.Blockchain(BITCOIN_DATA_DIR)
cx = blocksci.cluster.ClusterManager(CUSTOM_CLUSTER_DIR,chain)
len(cx.clusters())
Out[7]: 330464891
clist = list(cx.clusters())
a = clist[6]
a
Out[21]: <blocksci.cluster.Cluster at 0x7f17ebced688>
a.addresses
Out[22]: <blocksci.AddressIterator at 0x7efdf84d07a0>
[x for x in a.addresses]
Out[23]: []
a.type_equiv_size
Out[24]: 125
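
As a side note on the session above, `list(cx.clusters())` materializes all 330464891 cluster objects just to pick one of them. A small helper like the one below fetches a single item by position from any iterable instead. It is a sketch only: `nth_item` is a hypothetical name, and it assumes nothing about BlockSci beyond `cx.clusters()` being iterable, which the `for` loop in the original report already relies on.

```python
from itertools import islice


def nth_item(iterable, n):
    """Return the item at zero-based position n, or None if there are
    fewer than n + 1 items. Only the first n + 1 items are consumed."""
    return next(islice(iterable, n, None), None)


if __name__ == "__main__":
    # Stand-in data; with BlockSci this would be nth_item(cx.clusters(), 6).
    print(nth_item(range(10), 6))
```

This still advances through the first `n` items one by one, so it is only cheap for small positions such as the index 6 used above.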

geofurb, Nov 30 '18 19:11

I've uploaded my bitcoin-data and bitcoin-clusters directories here, in case it helps with reproducing the error. You might want to let that run while you're at lunch; it's a 102 GB download, and once you unpack the *.tar.bz2 (which will also likely take forever), it comes to something like 170-180 GB.

geofurb, Dec 01 '18 02:12