wayback-machine-downloader
Download a website's WARCs
There are various tools that enable WARCs to be analyzed, indexed, and searched (e.g. https://archivesunleashed.org/aut/, https://archivesunleashed.org/warclight/). I am wondering if it is possible to download a website's snapshots as WARCs. If so, would you consider supporting that feature?
@Natkeeran While this would be nice, I don't believe there's a way to directly download the WARCs of the original crawls unless they're made public by the Internet Archive, and currently only some of them are.
@Natkeeran @hook321 It seems that the filesystem tree that wayback-machine-downloader generates is suitable as input for warcit, which should make it possible to build a WARC from the downloaded files.
UPDATE: There is a program here that can be used to build a WARC of a website using data from the Internet Archive.
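To illustrate why the downloaded directory tree can serve as converter input, here is a rough Python sketch of the underlying idea: each file's path relative to the download root maps back to a URL under the site's address. This is not warcit's actual code. The function name `tree_to_urls`, the `websites/example.com` directory, and the `http://example.com/` prefix are all assumptions made up for the example.

```python
import os
from urllib.parse import quote


def tree_to_urls(root, url_prefix):
    """Yield (url, file_path) pairs for every file under `root`.

    Hypothetical helper: it only shows how a mirrored directory tree
    can be mapped back to URLs, which is the information a
    directory-to-WARC converter needs for each record.
    """
    prefix = url_prefix.rstrip("/") + "/"
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is stable
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Path relative to the download root, with URL-style separators.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            # Percent-encode unsafe characters; "/" is left as-is.
            yield prefix + quote(rel), path


# Example use (assumed directory and prefix):
# for url, path in tree_to_urls("websites/example.com", "http://example.com/"):
#     print(url, "<-", path)
```

Note that a WARC built this way holds only what was saved to disk. The original HTTP response headers from the crawl are not part of the downloaded tree, so the result is not the same as the Internet Archive's original crawl WARCs.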