a "crawler" using xargs and wget

bashpider

a spider built from nothing but standard Unix tools (plus some Ruby glue)

git clone whatever
cd bashpider
rake data:get_urls # will take a while 
rake crawl:restart # this will run a crawl
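
For orientation, the core xargs + wget idea can be sketched as below. This is a minimal sketch, not the contents of the rake tasks: the function name `crawl_urls`, its arguments, the one-URL-per-line input file, and the particular wget flags are all assumptions made for illustration.

```sh
#!/bin/sh
# Sketch only: crawl_urls, its arguments, and the wget flags chosen here are
# assumptions for illustration, not bashpider's actual rake task.

# Fetch every URL listed in a file (one per line), several at a time.
#   $1  file of URLs, one per line
#   $2  directory to save downloads into
#   $3  number of parallel wget processes (default 8)
crawl_urls() {
  urls_file=$1
  out_dir=$2
  jobs=${3:-8}
  mkdir -p "$out_dir"
  # xargs: -n 1 passes one URL to each wget invocation; -P runs up to $jobs
  # invocations at once (-P is a GNU/BSD extension, not POSIX).
  # wget: -q is quiet, --tries=1 disables retries, --timeout=10 gives up on
  # slow hosts, -P sets the directory downloads are written to.
  xargs -n 1 -P "$jobs" wget -q --tries=1 --timeout=10 -P "$out_dir" < "$urls_file"
}
```

Calling `crawl_urls urls.txt downloads 16` would fetch the listed URLs with up to 16 wget processes running at once. Note that xargs exits non-zero if any single wget fails, which is normal during a crawl, and that xargs treats quotes and backslashes in its input specially, so unusual URLs may need escaping.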

To monitor the downloads per second, run the following in another window:

rake crawl:watch
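
The README does not show how `rake crawl:watch` measures the rate. One plausible approach, sketched here under that caveat, is to count the files in the download directory at a fixed interval and print the difference. The names `count_files` and `watch_downloads` and the directory argument are hypothetical.

```sh
#!/bin/sh
# Sketch only: count_files, watch_downloads, and the directory layout are
# assumptions; the real crawl:watch task may work differently.

# Print the number of regular files under a directory (0 if it is missing).
count_files() {
  if [ -d "$1" ]; then
    find "$1" -type f | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# Print new files per second, sampling every $2 seconds (default 1).
# An optional third argument limits the number of samples; leave it out to
# keep sampling until interrupted with Ctrl-C.
watch_downloads() {
  dir=$1
  interval=${2:-1}
  samples=${3:-0}
  prev=$(count_files "$dir")
  n=0
  while [ "$samples" -eq 0 ] || [ "$n" -lt "$samples" ]; do
    sleep "$interval"
    cur=$(count_files "$dir")
    # Integer division: the rate is rounded down to whole files per second.
    echo "$(( (cur - prev) / interval )) downloads/s (total: $cur)"
    prev=$cur
    n=$((n + 1))
  done
}
```

Running `watch_downloads downloads 5` in a second window would print a rate every five seconds for as long as the crawl keeps writing files into `downloads`.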

When you've gathered enough data, press Ctrl-C in both windows to stop the crawl and the monitor, then run:

rake results:process 

See: post