hyphe
Websites crawler with built-in exploration and control web interface
Example: https://www.google.com/url?rct=j&sa=t&url=http://www.out-law.com/en/articles/2016/june/ico-sees-jump-in-number-of-self-reported-data-breaches/&ct=ga&cd=CAIyHDBlNzRhZjQ0NzBhYjBhZDI6Y29tOmVuOkdCOlI&usg=AFQjCNGr_2Yci_y5pbA222T3bmLPb5dNmg&utm_source=twitterfeed&utm_medium=twitter returns 200, but the page content contains a JavaScript redirect to http://www.out-law.com/en/articles/2016/june/ico-sees-jump-in-number-of-self-reported-data-breaches/
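One way such a case could be caught is by scanning the body of a 200 response for client-side redirect patterns. The sketch below is only a heuristic and is not taken from Hyphe's code: the function name `find_client_side_redirect` and both regular expressions are assumptions made for illustration, and they only cover a few common forms (`location = "..."`, `location.replace("...")`, and `<meta http-equiv="refresh">`).

```python
import re
from typing import Optional

# Hypothetical patterns for common client-side redirects (not exhaustive).
_JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|location\.replace\(\s*["']([^"']+)["']\s*\)""",
    re.IGNORECASE,
)
_META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["']?\s*\d+\s*;\s*url=([^"'>\s]+)""",
    re.IGNORECASE,
)


def find_client_side_redirect(html: str) -> Optional[str]:
    """Return the first redirect target found in the HTML, or None."""
    for pattern in (_META_REFRESH, _JS_REDIRECT):
        match = pattern.search(html)
        if match:
            # Exactly one capture group is non-empty for a given match.
            return next(group for group in match.groups() if group)
    return None
```

A crawler could call this on pages that return 200 and, when a target is found, treat the page as a redirect to that URL. Redirects built dynamically in JavaScript would not be detected this way.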
Elements in the memory structure that come from a specific crawl need to be marked as such
ex:
- http://www.nosdéputés.fr -> http://www.xn--nosdputs-e1ad.fr/
- http://identità.com -> http://xn--identit-fwa.com/
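The mapping from an internationalized domain name to its ASCII (Punycode) form can be reproduced with Python's built-in `idna` codec. This is a minimal sketch, not Hyphe's implementation: the helper name `to_ascii_url` is hypothetical, it drops any `user:password@` part of the URL, and the built-in codec follows the older IDNA 2003 rules.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return the URL with its hostname converted to ASCII (Punycode) form."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # The built-in "idna" codec encodes each non-ASCII label as xn--...
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host
    if parts.port is not None:
        netloc = f"{ascii_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(to_ascii_url("http://identità.com/"))  # http://xn--identit-fwa.com/
```

Normalizing hostnames this way before storing them would let both spellings of the same site resolve to a single entry.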
Example with skyblogs on web archives
Is there any access point or button to download all the internal hyperlinks as a GEXF file at the same time? For instance, I have 100 URLs to crawl, so for...