bitcoin-to-neo4j

Can you share the ~600GB graph database in Torrent?

Open brunoaduarte opened this issue 6 years ago • 21 comments

The resulting Neo4j database is roughly 6x the size of the blockchain. So if the blockchain is 100GB, your Neo4j database will be 600GB. It may take 60+ days to finish importing the entire blockchain.
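For a quick sanity check of how that ratio maps onto your own node, here is a rough sketch, assuming Bitcoin Core's default block directory and the ~6x figure quoted above (both of which are estimates, not guarantees):

```bash
# Rough estimate only: assumes the ~6x size ratio quoted above and that
# Bitcoin Core's block files live in ~/.bitcoin/blocks (adjust for your setup).
BLOCKS_GB=$(du -s --block-size=1G ~/.bitcoin/blocks | cut -f1)
echo "Blockchain: ${BLOCKS_GB} GB -> expected Neo4j database: ~$((BLOCKS_GB * 6)) GB"
```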

Can you share the ~600GB graph database in Torrent?

brunoaduarte avatar Dec 27 '18 13:12 brunoaduarte

Up

noff avatar Jan 03 '19 11:01 noff

I tried setting up a torrent before but couldn't get it working for some reason, and haven't tried again since. What would be the use? I would have to dedicate some time to setting up and hosting the torrent (and figuring out why it didn't work last time, ha).

I have the database running on a server that is accessible through the web browser, if that's any use.

in3rsha avatar Jan 06 '19 15:01 in3rsha

I'm trying to build a search for fraudulent transactions. I think I can host the torrent if you can share the actual database, because my server will take a few months to import BTC. Can you make a backup and share it somewhere? For example, I can give you SSH access to a test server so you will be able to upload it.

noff avatar Jan 07 '19 05:01 noff

Okay, I see. The database is about 1TB, so I think a torrent would be the best way to share it. If I can find some free time I'll look into setting up a torrent.

in3rsha avatar Jan 07 '19 15:01 in3rsha

Maybe I can help you configure the torrent to save you some time?

noff avatar Jan 14 '19 11:01 noff

I did some testing here on the "data/databases/graph.db/neostore.transaction.db.X" files. Using RAR I got a ~15% compression ratio, so the 1TB of data compressed to RAR would have a final size of around 150–200 GB.

Creating a new torrent is very easy: just download and install uTorrent Classic, press CTRL+N to open the new-torrent dialog, select the compressed database RAR file and click CREATE. It will immediately start seeding. Then you just copy the magnet URI and paste it here :)

For example, this is the one I've just created:


magnet:?xt=urn:btih:CECCD44A424A6F541373C38D90300DAD68A16A4E&dn=graph.db.rar&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80%2fannounce
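For anyone doing this from a headless Linux server rather than uTorrent, here is a hedged sketch of the same steps, assuming `rar` and the Transmission command-line tools are installed and that the database sits at the path shown (adjust both to your setup):

```bash
# Illustrative only: compress the database directory (the ~15% ratio mentioned
# above was measured with RAR), build a .torrent, and start seeding it.
rar a -m5 graph.db.rar /var/lib/neo4j/data/databases/graph.db

transmission-create -o graph.db.torrent \
  -t udp://tracker.opentrackr.org:1337/announce \
  -t udp://tracker.openbittorrent.com:80/announce \
  graph.db.rar

transmission-cli -w . graph.db.torrent   # verifies the local data, then seeds
```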

In the meantime I'm running "bitcoin-to-neo4j" on my blocks folder. 12 hours in and I'm at height 108,651 / 561,747. As per your 60-day estimate, I guess the import speed will slow down exponentially, right?

Thanks

brunoaduarte avatar Feb 06 '19 04:02 brunoaduarte

Yes, I've been importing for 2 months and am now at height 252,000.

noff avatar Feb 06 '19 05:02 noff

Yes, I've been importing for 2 months and am now at height 252,000.

What size is it at already?

brunoaduarte avatar Feb 06 '19 10:02 brunoaduarte

Only 66 GB.

noff avatar Feb 06 '19 11:02 noff

Only 66 GB.

@noff, could you run this command, which shows how long each of the .dat files took to be processed, and paste the result here please?

redis-cli hgetall bitcoin-to-neo4j:log

brunoaduarte avatar Feb 06 '19 19:02 brunoaduarte

  2) "[79836] 974.81 mins"
  3) "blk00001.dat"
  4) "[11249] 594.20 mins"
  5) "blk00002.dat"
  6) "[5706] 583.93 mins"
  7) "blk00003.dat"
  8) "[5712] 612.38 mins"
  9) "blk00004.dat"
 10) "[6399] 659.58 mins"
 11) "blk00005.dat"
 12) "[7457] 653.84 mins"
 13) "blk00006.dat"
 14) "[7236] 615.13 mins"
 15) "blk00007.dat"
 16) "[6210] 598.06 mins"
 17) "blk00008.dat"
 18) "[6145] 616.54 mins"
 19) "blk00009.dat"
 20) "[3954] 634.31 mins"
 21) "blk00010.dat"
 22) "[1513] 658.11 mins"
 23) "blk00011.dat"
 24) "[1544] 520.79 mins"
 25) "blk00012.dat"
 26) "[1377] 460.50 mins"
 27) "blk00013.dat"
 28) "[1079] 588.11 mins"
 29) "blk00014.dat"
 30) "[1797] 650.83 mins"
 31) "blk00015.dat"
 32) "[1856] 648.95 mins"
 33) "blk00016.dat"
 34) "[1393] 636.46 mins"
 35) "blk00017.dat"
 36) "[1547] 666.84 mins"
 37) "blk00018.dat"
 38) "[1534] 724.78 mins"
 39) "blk00019.dat"
 40) "[1188] 685.00 mins"
 41) "blk00020.dat"
 42) "[1530] 726.96 mins"
 43) "blk00021.dat"
 44) "[1333] 668.53 mins"
 45) "blk00022.dat"
 46) "[1510] 644.35 mins"
 47) "blk00023.dat"
 48) "[1600] 513.11 mins"
 49) "blk00024.dat"
 50) "[1389] 494.09 mins"
 51) "blk00025.dat"
 52) "[1341] 641.53 mins"
 53) "blk00026.dat"
 54) "[1281] 570.26 mins"
 55) "blk00027.dat"
 56) "[1767] 548.89 mins"
 57) "blk00028.dat"
 58) "[1439] 607.54 mins"
 59) "blk00029.dat"
 60) "[1193] 612.29 mins"
 61) "blk00030.dat"
 62) "[1369] 614.33 mins"
 63) "blk00031.dat"
 64) "[1177] 595.22 mins"
 65) "blk00032.dat"
 66) "[923] 517.29 mins"
 67) "blk00033.dat"
 68) "[465] 304.85 mins"
 69) "blk00034.dat"
 70) "[1187] 607.02 mins"
 71) "blk00035.dat"
 72) "[1064] 616.63 mins"
 73) "blk00036.dat"
 74) "[820] 616.93 mins"
 75) "blk00037.dat"
 76) "[829] 558.56 mins"
 77) "blk00038.dat"
 78) "[848] 549.05 mins"
 79) "blk00039.dat"
 80) "[890] 516.38 mins"
 81) "blk00040.dat"
 82) "[873] 628.13 mins"
 83) "blk00041.dat"
 84) "[796] 634.02 mins"
 85) "blk00042.dat"
 86) "[954] 661.90 mins"
 87) "blk00043.dat"
 88) "[857] 562.21 mins"
 89) "blk00044.dat"
 90) "[829] 535.56 mins"
 91) "blk00045.dat"
 92) "[762] 530.25 mins"
 93) "blk00046.dat"
 94) "[753] 527.93 mins"
 95) "blk00047.dat"
 96) "[786] 540.58 mins"
 97) "blk00048.dat"
 98) "[1197] 533.25 mins"
 99) "blk00049.dat"
100) "[960] 474.44 mins"
101) "blk00050.dat"
102) "[739] 457.02 mins"
103) "blk00051.dat"
104) "[796] 481.15 mins"
105) "blk00052.dat"
106) "[717] 499.94 mins"
107) "blk00053.dat"
108) "[746] 562.37 mins"
109) "blk00054.dat"
110) "[809] 576.79 mins"
111) "blk00055.dat"
112) "[844] 583.04 mins"
113) "blk00056.dat"
114) "[814] 532.44 mins"
115) "blk00057.dat"
116) "[777] 509.30 mins"
117) "blk00058.dat"
118) "[838] 504.36 mins"
119) "blk00059.dat"
120) "[726] 515.63 mins"
121) "blk00060.dat"
122) "[684] 508.69 mins"
123) "blk00061.dat"
124) "[815] 520.18 mins"
125) "blk00062.dat"
126) "[878] 509.49 mins"
127) "blk00063.dat"
128) "[922] 513.50 mins"
129) "blk00064.dat"
130) "[985] 510.51 mins"
131) "blk00065.dat"
132) "[1095] 562.51 mins"
133) "blk00066.dat"
134) "[1058] 545.18 mins"
135) "blk00067.dat"
136) "[1055] 594.48 mins"
137) "blk00068.dat"
138) "[740] 426.48 mins"
139) "blk00069.dat"
140) "[520] 447.56 mins"
141) "blk00070.dat"
142) "[1170] 909.89 mins"
143) "blk00071.dat"
144) "[1271] 901.13 mins"
145) "blk00072.dat"
146) "[1195] 892.13 mins"
147) "blk00073.dat"
148) "[1094] 906.22 mins"
149) "blk00074.dat"
150) "[1160] 936.58 mins"
151) "blk00075.dat"
152) "[890] 1,011.22 mins"
153) "blk00076.dat"
154) "[918] 1,355.98 mins"
155) "blk00077.dat"
156) "[888] 1,182.47 mins"
157) "blk00078.dat"
158) "[1135] 1,253.56 mins"
159) "blk00079.dat"
160) "[968] 1,759.61 mins"
161) "blk00080.dat"
162) "[1166] 1,879.71 mins"```

noff avatar Feb 07 '19 07:02 noff

You said you've been running this for 2 months now and it has imported 80 of the 1518 blk files. If it keeps this average import speed (which it probably won't), it will take what, ~4 years to finish?

brunoaduarte avatar Feb 07 '19 13:02 brunoaduarte
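To reproduce that back-of-the-envelope projection from your own log, here is a rough sketch, assuming the `bitcoin-to-neo4j:log` Redis key used above and ~1518 blk files in total; the result is optimistic, since per-file times grow as blocks fill up:

```bash
# Average the "NNN.NN mins" values in the log and scale to 1518 blk files.
redis-cli hgetall bitcoin-to-neo4j:log \
  | grep -o '[0-9,]*\.[0-9]* mins' \
  | tr -d ',' \
  | awk '{sum+=$1; n++} END {printf "avg %.0f min/file => ~%.0f days for 1518 files\n", sum/n, sum/n*1518/60/24}'
```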

Looks like it. We are now digging in the direction of a fast import via CSV.

noff avatar Feb 07 '19 14:02 noff

@noff 's log

  1. "blk00000.dat" "[79836] 974.81 mins"
  2. "blk00001.dat" "[11249] 594.20 mins"
  3. "blk00002.dat" "[5706] 583.93 mins"
  4. "blk00003.dat" "[5712] 612.38 mins"
  5. "blk00004.dat" "[6399] 659.58 mins"
  6. "blk00005.dat" "[7457] 653.84 mins"
  7. "blk00006.dat" "[7236] 615.13 mins"

Here's my log. The HDD is really slow, but with an SSD things speed up a lot... (HDD)

  1. "blk00000.dat" "[119965] 757.15 mins"
  2. "blk00001.dat" "[11259] 480.65 mins"
  3. "blk00002.dat" "[1473] ------- mins" (importing restarted)
  4. "blk00003.dat" "[5726] 540.56 mins"
  5. "blk00004.dat" "[6392] 606.81 mins"
  6. "blk00005.dat" "[7479] 595.06 mins"
  7. "blk00006.dat" "[7214] 573.46 mins"

(SSD)

  1. "blk00000.dat" "[119965] 445.19 mins"
  2. "blk00001.dat" "[11259] 302.57 mins"
  3. "blk00002.dat" "[5697] 293.72 mins"

brunoaduarte avatar Feb 07 '19 23:02 brunoaduarte

We are now working on a fast import of the initial data via CSV. It could be a solution. I've found a Go script which imports the whole blockchain into PostgreSQL in 24 hours. We will use this approach.

noff avatar Feb 08 '19 05:02 noff

We are now working on a fast import of the initial data via CSV. It could be a solution. I've found a Go script which imports the whole blockchain into PostgreSQL in 24 hours. We will use this approach.

Great! Please let us know how this develops. Thanks!

brunoaduarte avatar Feb 08 '19 11:02 brunoaduarte

Torrent or fast CSV import would be great. I've got the same problem...

jackenbaer avatar Feb 16 '19 14:02 jackenbaer

As an alternative I found a project that seems to be faster for the initial download... https://github.com/straumat/blockchain2graph After 4 days of parsing (using an SSD) I am at block height 328,000 (~310 GB). I recommend using the Docker file. @in3rsha Sorry for advertising other projects in the comment section of your project --> I bought you a beer (3Beer3irc1vgs76ENA4coqsEQpGZeM5CTd)

jackenbaer avatar Feb 28 '19 19:02 jackenbaer

@Nehberg, using the blockchain2graph project led me to a >3TB database within weeks. Its schema leads to a much bigger database for the same input. It grows faster, among other things, because of its database schema, I think. I don't have much knowledge of Neo4j, but the property strings store was more than half of the database size.

The bitcoin-to-neo4j schema is far smaller and, as you can see in this project's browser, faster and more efficient.

I would stick with Greg's project and Neo4j schema, but use the CSV route. There are implementations you can google, based on Greg's work, that managed to import the blockchain in one day.

arisjr avatar Aug 26 '19 18:08 arisjr
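If anyone tries the CSV route suggested above, Neo4j's offline bulk loader is the usual tool for it. A hedged sketch using the Neo4j 3.x syntax current at the time of this thread; the CSV file names and the label/relationship names here are illustrative placeholders, so match them to the exact labels and types this repo's Cypher creates before relying on the result:

```bash
# Illustrative only: bulk-load pre-generated CSVs into an empty database.
# Headers inside the CSVs must follow the neo4j-admin import format
# (e.g. "hash:ID(block),height:int,time:int" for a node file).
neo4j-admin import \
  --nodes:block blocks.csv \
  --nodes:tx transactions.csv \
  --nodes:output outputs.csv \
  --relationships:inc tx_in_block.csv \
  --relationships:out tx_creates_output.csv \
  --relationships:in output_spent_by_tx.csv
```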

Hello guys, has someone figured out a way to import the blockchain into CSVs (either by parsing directly or with JSON-RPC) in a format that respects Greg's schema? Can't figure it out, thanks! Greg, absolutely amazing work and a great learnmeabitcoin website, very informative, thank you.

daniel10012 avatar Aug 12 '20 21:08 daniel10012
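One possible starting point via JSON-RPC (not a complete answer to the question above): a hedged sketch that dumps one row per block into a CSV with a neo4j-admin-style header, assuming a synced Bitcoin Core node and `jq` installed. Transaction and output CSVs would follow the same pattern using `getrawtransaction <txid> true`, and the column names here are illustrative rather than Greg's exact schema:

```bash
# Illustrative only: write blocks.csv with a neo4j-admin import header,
# then one (hash, height, time) row per block from the local node.
echo 'hash:ID(block),height:int,time:int' > blocks.csv
TIP=$(bitcoin-cli getblockcount)
for h in $(seq 0 "$TIP"); do
  bitcoin-cli getblock "$(bitcoin-cli getblockhash "$h")" \
    | jq -r '[.hash, .height, .time] | @csv' >> blocks.csv
done
```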

Can someone seed for a moment please? I'll keep seeding after.

Could this project be used for importing the history through CSV? https://github.com/behas/bitcoingraph (sorry for advertising other projects)

bbm-design avatar Dec 30 '20 12:12 bbm-design