commoncrawler

A SIMPLE (but fast & extensible) crawler using CommonCrawl.

Starter kit:

::

    virtualenv env/
    source env/bin/activate
    pip install -r requirements.txt
    python crawler.py

Your console will then be flooded with the lists extracted from the web.

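We recommend that you redirect the standard output of the crawler to a file, for example ``python crawler.py > lists.txt`` (the file name is arbitrary). The extracted lists then go to the file, while the crawler's error output, which prints some statistics from time to time, stays visible in the console.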