CommonCrawler: Preparing CommonCrawl .wet files via IPFS

Open ChrisCates opened this issue 6 years ago • 0 comments

Summary

Common Crawl data is easily accessible via AWS S3. However, I'm interested in creating an IPFS-based distribution of Common Crawl, so that we can self-host and run our own P2P network for seeding and distributing the data.

Requirements

A website with an index that lists all the .wet files. I can style it if you need help.
An easy-to-use JSON REST API that you can query with cURL.
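
As a rough illustration of what the index behind such an API might return, here is a minimal Python sketch. Everything in it is an assumption made for discussion rather than an agreed design: the function names `build_wet_index` and `index_to_json`, the field names `path`, `filename`, and `cid`, and the idea of mapping each .wet path to an IPFS content identifier are all hypothetical.

```python
import json


def build_wet_index(wet_paths, cid_by_path):
    """Build one index entry per .wet file path.

    wet_paths:   iterable of path strings (blank lines are skipped).
    cid_by_path: hypothetical mapping from path to IPFS CID; a path
                 missing from the mapping gets a cid of None.
    """
    entries = []
    for raw in wet_paths:
        path = raw.strip()
        if not path:
            continue
        entries.append({
            "path": path,
            "filename": path.rsplit("/", 1)[-1],
            "cid": cid_by_path.get(path),
        })
    return entries


def index_to_json(entries):
    """Serialize the index in the shape a JSON endpoint could return."""
    return json.dumps({"count": len(entries), "files": entries}, indent=2)


if __name__ == "__main__":
    # Illustrative placeholder values, not real Common Crawl paths or CIDs.
    paths = ["example-crawl/wet/example-00000.warc.wet.gz"]
    cids = {paths[0]: "example-cid-placeholder"}
    print(index_to_json(build_wet_index(paths, cids)))
```

The same JSON document could back both the website listing and the REST endpoint, so a client would only need a single cURL request to discover every file.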

Payment

TBD; payment is not under consideration in the near term. The seed network will be hosted under %eaxops infrastructure.

May 14 '19 03:05 ChrisCates
