web-archiving topics

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

webrecorder · archiving · cloud · kubernetes · wacz

web-snap

28 Stars · 3 Forks

Create "perfect" snapshots of web pages

zytedata · capture-page · javascript · playwright · web-archives

outbackcdx

29 Stars · 20 Forks

Web archive index server based on RocksDB

nla · wayback · web-archiving

debian-archivebox

17 Stars · 5 Forks

Home of the official ArchiveBox apt/deb package for Ubuntu/Debian-based systems.

ArchiveBox · apt · aptitude · archivebox · debian

httrack2warc

27 Stars · 6 Forks

Converts HTTrack crawls to WARC files.

nla · web-archiving

sandcrawler

24 Stars · 2 Forks

Backend tools, specific to the Internet Archive (IA), for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki

internetarchive · web-archiving