commoncrawler
A SIMPLE (but fast & extensible) crawler using CommonCrawl.
Starter kit::

    virtualenv env/
    source env/bin/activate
    pip install -r requirements.txt
    python crawler.py

Your console will then be flooded with the lists extracted from the web.

We recommend redirecting the crawler's standard output to a file. The error output, which prints some statistics from time to time, then stays visible in the console.
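
As a minimal sketch, assuming a POSIX shell and a hypothetical output file name ``lists.txt`` (any writable path works), the redirection could look like this::

    # Send the extracted lists (standard output) to a file.
    # Standard error is not redirected, so the statistics stay in the console.
    python crawler.py > lists.txt

Only standard output goes to the file here. If you also want to keep the statistics, standard error can be redirected separately with ``2>`` to a second file of your choice.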