
Download a website's WARCs

Open Natkeeran opened this issue 6 years ago • 2 comments

There are various tools that enable WARCs to be analyzed, indexed and searched (e.g. https://archivesunleashed.org/aut/, https://archivesunleashed.org/warclight/). I am wondering whether it is possible to download a website's snapshots as WARCs. If so, would you consider supporting that feature?

Natkeeran avatar Aug 16 '18 02:08 Natkeeran

@Natkeeran While this would be nice, I don't believe there's a way to directly download the WARCs of the original crawls unless they're made public by the Internet Archive, and currently only some of them are.

hook321 avatar Dec 10 '18 11:12 hook321

@Natkeeran @hook321 It seems that the filesystem tree that wayback-machine-downloader generates is suitable as input for warcit, which should make it possible to build a WARC from a downloaded site.
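A minimal sketch of that idea, under some assumptions: that wayback-machine-downloader wrote the site to its default `websites/<domain>` directory, and that warcit is invoked as `warcit <url-prefix> <directory>`. The helper names (`build_warcit_command`, `urls_for_tree`, `run_warcit`) and the `http://` scheme are hypothetical choices for illustration, not part of either tool.

```python
import shutil
import subprocess
from pathlib import Path


def build_warcit_command(domain, root="websites"):
    """Return the argv for a warcit run over a downloaded site tree.

    Assumes the tree lives at <root>/<domain> and that warcit takes
    a URL prefix followed by the directory to convert.
    """
    site_dir = Path(root) / domain
    url_prefix = f"http://{domain}/"  # assumed scheme; adjust if needed
    return ["warcit", url_prefix, site_dir.as_posix()]


def urls_for_tree(domain, root="websites"):
    """List the URLs the files would map to (prefix + relative path).

    Useful as a sanity check before converting the tree.
    """
    site_dir = Path(root) / domain
    return sorted(
        f"http://{domain}/{path.relative_to(site_dir).as_posix()}"
        for path in site_dir.rglob("*")
        if path.is_file()
    )


def run_warcit(domain, root="websites"):
    """Run warcit if it is installed; return True when it was run."""
    if shutil.which("warcit") is None:
        return False
    subprocess.run(build_warcit_command(domain, root), check=True)
    return True
```

For example, `build_warcit_command("example.com")` produces the argument list for `warcit http://example.com/ websites/example.com`. Note that a WARC built this way is reconstructed from the downloaded files, so it will not contain the original crawl's HTTP headers or capture metadata.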

UPDATE: there is a program here that can be used to build a WARC of a website using data from the Internet Archive.

wsdookadr avatar Jun 24 '22 02:06 wsdookadr