
Project Gutenberg

Open · davidar opened this issue 10 years ago · 15 comments

The first thing I mirrored to IPFS was a small subset of Project Gutenberg, so I'm definitely interested in getting the whole thing into IPFS, as both @rht (#14) and @simonv3 (https://github.com/simonv3/ipfs-gutenberg) have suggested.

Making an issue to coordinate this.

davidar avatar Sep 29 '15 08:09 davidar

This is just an rsync away, really. Currently running it on Pollux.

rht avatar Sep 29 '15 12:09 rht

@rht is there enough free disk space on Pollux?

davidar avatar Sep 29 '15 13:09 davidar

(didn't check)

rht avatar Sep 29 '15 14:09 rht

https://www.gutenberg.org/wiki/Gutenberg:Mirroring_How-To says it is at least 650 GB (and may have doubled since). Pollux has 13 GB left.

But anyway, the mirroring is a one-liner.
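Roughly something like this (a sketch: aleph.gutenberg.org::gutenberg is given as an example rsync source, check the Mirroring How-To for current mirrors, and the ipfs add step assumes enough local disk for the whole collection):

```sh
# Mirror the Project Gutenberg collection locally (needs the full ~TB),
# then add the tree to IPFS; the last hash printed is the root.
rsync -av --del aleph.gutenberg.org::gutenberg ./gutenberg
ipfs add -r -q ./gutenberg | tail -n1
```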

rht avatar Sep 29 '15 14:09 rht

@rht Yeah, what makes this difficult is the amount of disk space - I don't think many people have that amount of space lying around for this.

Some people have suggested sharding the collection and making sure the people hosting each shard keep their pieces in sync independently. There's also been talk about this tool: https://github.com/ipfs/notes/issues/58

simonv3 avatar Sep 29 '15 18:09 simonv3

We could also each pitch in some amount for an Amazon instance (or some other host) with that much storage, and just pay for that?

Or I could see if I can figure out my Raspberry Pi and attach a 1 TB drive to it.

simonv3 avatar Sep 29 '15 18:09 simonv3

Hmm, rsync doesn't support seeking, so at least the first 'download -> hash' pass needs the full ~TB of storage to hold the collection.

Either:

  1. https://aws.amazon.com/s3/reduced-redundancy/ ~$24/month.
  2. http://www.amazon.com/Green-1TB-Desktop-Hard-Drive/dp/B006GDVREI ~$50 (can be repurposed for other archiving efforts once the PG hash has been sharded).

For now, for a partial backup, ipfs object get can be used on each of the node links that make up the root hash.
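Something along these lines, for example (a sketch; $ROOT stands for whatever the collection's root hash ends up being, and the subset size is arbitrary):

```sh
# List the links that make up the root node ($ROOT is a placeholder),
# then pin a chosen subset of them locally as a partial backup.
ipfs object links "$ROOT"
ipfs object links "$ROOT" | awk '{print $1}' | head -n 10 \
  | xargs -n1 ipfs pin add
```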

rht avatar Sep 30 '15 03:09 rht

(and both storage options are from Amazon)

rht avatar Sep 30 '15 03:09 rht

ipfs check-redundancy $hash would be useful.
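In the meantime, a crude approximation can be scripted from existing commands (a sketch: ipfs refs -r enumerates the blocks under $hash, and ipfs dht findprovs counts providers per block, capped at 20 here, so the lowest counts mark the least-redundant blocks):

```sh
# Rank the blocks under $hash by how many peers currently provide them.
ipfs refs -r "$hash" | sort -u | while read block; do
  n=$(ipfs dht findprovs -n 20 "$block" | wc -l)
  echo "$n $block"
done | sort -n
```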

rht avatar Sep 30 '15 03:09 rht

@jbenet @lgierth SEND MORE DISKS...

Also see ipfs/infrastructure#89

davidar avatar Sep 30 '15 03:09 davidar

> ipfs check-redundancy $hash would be useful.

@rht Yeah, what I really want is a "click to pin" button on the archive homepage: people select how much storage they want to donate, and the tool randomly selects an appropriately sized subset of the least-redundant blocks and pins them to the local daemon.

CC: @whyrusleeping

Edit: see ipfs/notes#54

davidar avatar Sep 30 '15 03:09 davidar

That would be cool. We could have our service enumerate providers for each block under a given archive root, then assign the blocks with the fewest providers to the next person who requests some.
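The assignment step could be as simple as this (a sketch building on the provider-count listing above; redundancy.txt and the count of 100 are arbitrary placeholders):

```sh
# Given "provider-count block-hash" lines (as produced by the sketch above,
# saved to redundancy.txt), pin the 100 least-provided blocks locally.
sort -n redundancy.txt | head -n 100 | awk '{print $2}' \
  | xargs -n1 ipfs pin add
```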

whyrusleeping avatar Sep 30 '15 06:09 whyrusleeping

Should be normalized based on the blocks' demand curve.

rht avatar Sep 30 '15 06:09 rht

  • ipfs-cluster - https://github.com/ipfs/notes/issues/58

jbenet avatar Sep 30 '15 06:09 jbenet

We can get more storage nodes, if necessary

jbenet avatar Sep 30 '15 06:09 jbenet