archives Common Crawl

Open • ghost opened this issue 6 years ago • 1 comment

https://commoncrawl.org/

From the site: "We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone."

I'm not sure exactly how much data there is, but it is certainly at least a few TB; the project itself describes the corpus as petabytes in size.

Nov 12 '17 20:11 ghost