pywb
Possible to index and replay web archive with custom archive directory structure?
I have a web archive with a custom directory structure (recorded in other software). Is it possible to scan this structure automatically for new WARC files without moving them to the pywb collection folder? That is, I want to keep my own archive folder intact while still being able to index and replay its contents. From the documentation, it looks like I have to move all WARCs to the collection folder for them to be indexed. Is that right?
@peterk sincerest apologies for the delay in replying. To answer your question: yes, you do have to move the WARCs to the collections folder, which is laid out like this:
collections/
  coll/
    archive/ (warcs)
    indexes/ (cdxj)
However, if you are using Docker, you can make coll's archive and indexes directories volumes and then mount your external directories onto them.
@peterk have you tried symbolic links, or changing the directory structure using config.yaml?
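In case it helps, here is a rough sketch of the symbolic link idea in Python. It is not part of pywb: the function name `link_warcs` and the flat-link approach are my own assumptions. It walks a custom archive tree and symlinks each `.warc` / `.warc.gz` file into the collection's `archive/` directory, leaving the original files where they are. pywb would still need to index the linked files afterwards (for example with its `wb-manager` tool), and I have not verified how every pywb version treats symlinked WARCs.

```python
import os
from pathlib import Path

# File suffixes treated as WARC files (an assumption; adjust as needed).
WARC_SUFFIXES = (".warc", ".warc.gz")


def link_warcs(source_root, archive_dir):
    """Symlink every WARC found under source_root into archive_dir.

    Hypothetical helper, not a pywb API. Returns the names of the
    links created on this run.
    """
    source_root = Path(source_root).resolve()
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in sorted(filenames):
            if not name.lower().endswith(WARC_SUFFIXES):
                continue
            target = Path(dirpath) / name
            link = archive_dir / name
            # Skip files linked on a previous run, and name collisions
            # between different subdirectories of the source tree.
            if link.is_symlink() or link.exists():
                continue
            os.symlink(target, link)
            created.append(name)
    return created
```

Calling `link_warcs("/path/to/my_archive", "collections/coll/archive")` (both paths are placeholders) from a cron job or similar would pick up new WARCs on each run. Note that links are flat, so two WARCs with the same filename in different source folders would collide and only the first would be linked.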