pywb
Index WARC files on external storage
Being able to index and re-index collections stored on remote storage such as S3 would be very helpful.
This would be even more useful with an additional setting for a filename pattern filter. If WARCs have been collected on another platform that lacks indexing and playback, it would be great to have this option in pywb. For example, Social Feed Manager (SFM) records web pages linked from tweets. These are stored in separate WARCs whose filenames follow the pattern "WEB-YYYY....warc.gz". Being able to point pywb at the root folder and have it repeatedly scan for newly added files matching this pattern would be a valuable addition for similar use cases.
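A minimal sketch of the requested scan, assuming the WARCs are reachable as a local or mounted directory tree and that SFM's web captures match the glob `WEB-*.warc.gz`. The function name `find_new_warcs` is hypothetical and not part of pywb. It walks the root folder, keeps only filenames matching the pattern, and remembers what it has already reported so each call returns only newly added files. Those paths could then be handed to pywb's `wb-manager add <collection> <file>` command for indexing.

```python
import fnmatch
import os


def find_new_warcs(root, seen, pattern="WEB-*.warc.gz"):
    """Recursively scan `root` for files whose names match `pattern`.

    Returns a sorted list of matching paths that are not yet in `seen`,
    and adds those paths to `seen` so a later call reports only files
    added since this one. `pattern` is an assumed default based on the
    SFM "WEB-..." naming convention.
    """
    new_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter applies shell-style glob matching to bare filenames
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            if path not in seen:
                new_paths.append(path)
    seen.update(new_paths)
    return sorted(new_paths)
```

To get the "repeatedly scan" behaviour, a caller could invoke this in a loop with `time.sleep(...)` between calls, reusing the same `seen` set. A persistent store (for example a text file of already-indexed paths) would be needed if the scanner must survive restarts.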
I know this is old, but you should be able to do this with Docker by using an S3/Azure storage driver for the volume mapping: https://github.com/chooban/s3-docker-volume-plugin and https://docs.docker.com/registry/storage-drivers/