Preparing CommonCrawl .wet files via IPFS

Open ChrisCates opened this issue 6 years ago • 0 comments

Summary

Common Crawl is easily accessible via AWS S3. However, I'm interested in creating an IPFS-based distribution of Common Crawl, so that we can self-host and run our own P2P network for seeding and distributing the data.

Requirements

  • A website with an index that lists all the .wet files. I can style it if you need help.

  • An easy-to-use JSON REST API that you can cURL data from.
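
To make the two requirements a bit more concrete, here is a rough sketch of what an index entry could look like. It assumes the input is a plain list of .wet file paths, one per line. The function name `build_wet_index`, the `cid` field, the sample paths, and the `EXAMPLE_CID` value are all hypothetical placeholders, not an agreed schema or real data.

```python
import json


def build_wet_index(paths, cid_lookup=None):
    """Turn .wet file paths into JSON-serializable index entries.

    `cid_lookup` is a hypothetical mapping from path to IPFS content
    identifier; paths that have not been added to IPFS yet get None.
    """
    cid_lookup = cid_lookup or {}
    entries = []
    for raw in paths:
        path = raw.strip()
        if not path:
            continue  # skip blank lines in the listing
        entries.append({
            "path": path,
            "filename": path.rsplit("/", 1)[-1],
            "cid": cid_lookup.get(path),
        })
    return entries


# Made-up sample listing, only to show the shape of the output.
sample_paths = [
    "crawl-data/EXAMPLE-CRAWL/segments/0001/wet/example-00000.warc.wet.gz\n",
    "\n",
    "crawl-data/EXAMPLE-CRAWL/segments/0001/wet/example-00001.warc.wet.gz\n",
]
index = build_wet_index(
    sample_paths,
    cid_lookup={sample_paths[0].strip(): "EXAMPLE_CID"},
)

# The same structure could back both the website index and the JSON API.
print(json.dumps({"count": len(index), "files": index}, indent=2))
```

Keeping the index as a flat list of small records means the website can render it directly and the API can return it as-is, but the exact fields are open for discussion.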

Payment

TBD; payment is not under consideration in the near term. The seed network will be hosted on %eaxops infrastructure.

ChrisCates, May 14 '19 03:05