pywb

Index WARC files on external storage

Open despens opened this issue 8 years ago • 2 comments

Being able to index and re-index collections that are located on remote storage (S3) would be very helpful.

despens, May 30 '16 10:05

Combined with a filename pattern filter setting, this would be very useful. If WARCs have been collected on another platform that lacks indexing and playback, it would be great to have this option in pywb. For example, Social Feed Manager (SFM) records the web pages linked from tweets and stores them in separate WARCs whose filenames follow the pattern "WEB-YYYY....warc.gz". Being able to point pywb at the root folder and have it repeatedly scan for newly added files matching this pattern would be a valuable addition for similar use cases.
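As a rough illustration of the scanning behaviour described above (not an existing pywb feature), here is a minimal Python sketch. The function name `find_new_warcs`, the glob `WEB-*.warc.gz`, and the `seen` set used to remember earlier scans are all assumptions made for the example; it only finds files on a mounted or local path and does no indexing itself.

```python
import fnmatch
import os


def find_new_warcs(root, pattern="WEB-*.warc.gz", seen=None):
    """Return paths under `root` whose filename matches `pattern`
    and that were not returned by an earlier scan.

    `seen` is a caller-owned set of already reported paths; it is
    updated in place so repeated scans only report new files.
    """
    seen = set() if seen is None else seen
    found = []
    # Walk the whole tree, since WARCs may sit in nested folders.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-sensitive match on the filename only, not the full path.
            if fnmatch.fnmatchcase(name, pattern):
                path = os.path.join(dirpath, name)
                if path not in seen:
                    seen.add(path)
                    found.append(path)
    return sorted(found)
```

Calling this on a schedule with the same `seen` set would yield only the files added since the previous scan, which could then be handed to whatever indexing step is in use.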

peterk, Nov 25 '17 21:11

I know this is old, but you should be able to do this via Docker by using an S3/Azure driver for the volume mapping. https://github.com/chooban/s3-docker-volume-plugin https://docs.docker.com/registry/storage-drivers/

MarcoVdE, Apr 22 '20 08:04