web-archiving topic

archivebox-browser-extension

167
Stars
13
Forks

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

browsertrix-crawler

620
Stars
79
Forks

Run a high-fidelity browser-based web archiving crawler in a single Docker container

ph-submissions

136
Stars
112
Forks

The repository and website hosting the peer review process for new Programming Historian lessons

auto-archiver

548
Stars
55
Forks

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

browsertrix

172
Stars
32
Forks

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

web-snap

28
Stars
3
Forks

Create "perfect" snapshots of web pages

outbackcdx

29
Stars
20
Forks

Web archive index server based on RocksDB

debian-archivebox

17
Stars
5
Forks

Home of the official apt/deb package for Ubuntu/Debian-based systems.

httrack2warc

27
Stars
6
Forks

Converts HTTrack crawls to WARC files

sandcrawler

24
Stars
2
Forks

Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki