anemone
Add support for crawling subdomains
Merge changes to support subdomain crawling from https://github.com/runa/anemone/commit/91559bde052956cfc40ae62678ec2a61574cf928
This feature is very useful. I think anemone should also support printing out external links: just print them, without crawling into them. The link-checker tool XENU (http://home.snafu.de/tilman/xenulink.html) has this feature.
MaGonglei: It is very simple to gather external links using Anemone, and comparably simple to actually check those links and verify that they are valid. The `on_every_page` block is very helpful in this regard.
If you'd like some code that does exactly what you are asking, I could send an example your way.
Hi wokkaflokka, thanks for your reply. I think I know what you mean, but I would prefer to have this feature available when I initialize the crawl, like: `Anemone.crawl("http://www.example.com", :external_links => false) do |anemone| ... end`
Because if I use the `on_every_page` block to search for external links (e.g. `page.doc.xpath('//a[@href]')`), it seems to cost too much CPU and memory.
If I'm wrong, please send me the example.
Thanks.
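The approach discussed in this thread can be sketched as follows. This is a minimal sketch, not Anemone's own API: `external_link?` is a hypothetical helper built on Ruby's standard `URI` library, and the commented wiring assumes the `on_every_page` block and Nokogiri-backed `page.doc` mentioned above.

```ruby
require 'uri'

# Hypothetical helper (not part of Anemone): decide whether a link found
# on a page points off-site by resolving the href against the page URL
# and comparing hosts. Plain relative hrefs resolve to the page's own
# host and therefore count as internal.
def external_link?(page_url, href)
  base = URI(page_url)
  target = URI.join(base, href)
  !target.host.nil? && target.host != base.host
rescue URI::Error
  false # unparseable hrefs are simply skipped
end

# Possible wiring into Anemone (requires the anemone gem; page.doc is
# the Nokogiri document, as mentioned in the thread):
#
#   Anemone.crawl("http://www.example.com") do |anemone|
#     anemone.on_every_page do |page|
#       next unless page.doc  # skip non-HTML responses
#       page.doc.xpath('//a[@href]').each do |a|
#         puts a['href'] if external_link?(page.url, a['href'])
#       end
#     end
#   end
```

This only prints external links rather than enqueuing them, so the crawler never descends into other sites; since Anemone already parses each page into `page.doc`, reusing that document adds little overhead beyond the XPath scan itself.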