cc-index-table

Add IP column to Athena table for reverse IP search with `WARC-IP-Address` data

Open: cirosantilli opened this issue (0 comments)

Historical hostname -> IP and IP -> hostname (reverse IP) datasets are currently hard to come by (see https://opendata.stackexchange.com/questions/1951/dataset-of-domain-names). The only really convenient options are websites such as https://viewdns.info/reverseip/, which are expensive and do not document their methodology.

Would it be possible to add an IP column to the Athena table that records `WARC-IP-Address`? With that column, anyone could export the data from Common Crawl at relatively low cost and make it available to everyone, for example as a CSV file hosted on GitHub.
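To illustrate how little work the export step would be once such a column exists, here is a minimal sketch that turns exported `(hostname, ip)` pairs into a reverse IP index and writes it out as CSV. It assumes the query result has already been reduced to those two columns; the function names `build_reverse_ip_index` and `to_csv` and the CSV header `ip,hostname` are hypothetical choices, not part of any existing Common Crawl tooling.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_reverse_ip_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each IP address to the sorted, de-duplicated hostnames seen on it.

    Assumes each row is a hypothetical (hostname, ip) pair exported from the
    proposed IP column; rows with an empty IP are skipped.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for hostname, ip in rows:
        if ip:
            index[ip].add(hostname.lower())  # hostnames are case-insensitive
    # Note: IPs are sorted as strings (lexically), not numerically.
    return {ip: sorted(hosts) for ip, hosts in sorted(index.items())}


def to_csv(index: Dict[str, List[str]]) -> str:
    """Flatten the index into one "ip,hostname" row per pair."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ip", "hostname"])
    for ip, hosts in index.items():
        for host in hosts:
            writer.writerow([ip, host])
    return buf.getvalue()


if __name__ == "__main__":
    rows = [
        ("example.com", "93.184.216.34"),
        ("www.example.com", "93.184.216.34"),
        ("other.test", "192.0.2.1"),
        ("example.com", "93.184.216.34"),  # repeated fetch of the same host
    ]
    print(to_csv(build_reverse_ip_index(rows)), end="")
```

A real export would also want the fetch time per pair, since hostname/IP mappings change over time; that is omitted here to keep the sketch short.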

Such data can be of great value for OSINT purposes; for example, I needed it for this project: https://cirosantilli.com/cia-2010-covert-communication-websites

There is apparently a tool made for this, https://github.com/CAIDA/commoncrawl-host-ip-mapper, but I don't think it can run quickly or cheaply; the tabular approach would be ideal here.
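Whichever route is taken, the hostname/IP pairs come from the per-record WARC headers, where `WARC-Target-URI` gives the fetched URL and `WARC-IP-Address` the IP it was fetched from. The sketch below extracts that pair from the raw header block of a single uncompressed record using only the standard library. It is an illustration under simplifying assumptions (one record, already decompressed, headers terminated by a blank CRLF line), not how the CAIDA tool or Common Crawl's indexer actually work; real code would more likely use a WARC library such as warcio. The helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


def parse_warc_headers(raw: bytes) -> Dict[str, str]:
    """Parse the header block of one WARC record, up to the first blank line.

    Field names are lowercased so lookups do not depend on their case.
    """
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8", errors="replace").split("\r\n")
    if not lines[0].startswith("WARC/"):
        raise ValueError("not a WARC record")
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def host_ip_pair(raw: bytes) -> Optional[Tuple[str, str]]:
    """Return (hostname, ip) for a record, or None if either is missing."""
    headers = parse_warc_headers(raw)
    uri = headers.get("warc-target-uri")
    ip = headers.get("warc-ip-address")
    if not uri or not ip:
        return None  # e.g. records that carry no WARC-IP-Address
    host = urlsplit(uri.strip("<>")).hostname  # tolerate <...> around the URI
    return (host, ip) if host else None


if __name__ == "__main__":
    record = (
        b"WARC/1.0\r\n"
        b"WARC-Type: response\r\n"
        b"WARC-Target-URI: http://Example.com/page\r\n"
        b"WARC-IP-Address: 93.184.216.34\r\n"
        b"Content-Length: 0\r\n"
        b"\r\n"
    )
    print(host_ip_pair(record))
```

Doing this per record over every WARC file is exactly the expensive part; an index column would let the same pairs be selected with a single table scan instead.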

cirosantilli, Jun 15 '23 07:06