openwayback
Consolidate indexes. Discard legacy stores and use CDXServer
The CDX server should become the default option, and the other search result sources (CDX and BDB) should be discontinued.
The CDX server is already fully functional so this is largely a case of changing the defaults, making sure that it is easy to get up and running with CDX server and updating documentation.
To a large extent, the goal here is separation of concerns. The CDX server (which perhaps should be renamed, since its use of CDX files is incidental) should be solely responsible for translating URL+timestamp searches into results. The OpenWayback webapp should focus on presenting the results of those searches.
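To make the split concrete, here is a minimal sketch of the query side only: given a URL key and a timestamp, return the nearest capture, leaving rendering entirely to the webapp. This is not the CDX server's actual implementation. The in-memory index, the `Capture` tuple layout and the `closest_capture` function are hypothetical, chosen only to illustrate the "URL+timestamp in, result out" contract over a sorted index.

```python
import bisect
from datetime import datetime
from itertools import islice, takewhile
from typing import List, Optional, Tuple

# Hypothetical capture record: (SURT key, 14-digit timestamp, original URL).
# The index list is kept sorted, mirroring a SURT-ordered CDX file.
Capture = Tuple[str, str, str]


def _parse_ts(ts: str) -> datetime:
    """Parse a 14-digit Wayback-style timestamp such as 20150601120000."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def closest_capture(index: List[Capture], key: str, ts: str) -> Optional[Capture]:
    """Answer a URL+timestamp query with the nearest capture, or None."""
    # All captures of one key are contiguous in the sorted index, so a binary
    # search finds the start of the run without scanning the whole list.
    start = bisect.bisect_left(index, (key,))
    run = list(takewhile(lambda c: c[0] == key, islice(index, start, None)))
    if not run:
        return None
    target = _parse_ts(ts)
    # Pick the capture whose timestamp is closest to the requested one.
    return min(run, key=lambda c: abs((_parse_ts(c[1]) - target).total_seconds()))
```

With this division, the webapp only ever sees the returned tuple (or `None`) and never needs to know how the index is stored, which is the property that would let the index backend change without touching the presentation layer.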
We would like to maintain compatibility with the equivalent separation in pywb.
As part of this, we should move entirely to SURT-ordered CDXs: make SURT ordering the default for CDX generation, etc., and warn when loading non-SURT CDXs.
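For readers unfamiliar with SURT ordering, the sketch below shows the core idea: the host is reversed label by label so that captures of related hosts sort next to each other in the CDX. The `to_surt` helper is hypothetical and deliberately simplified (lowercased host, leading `www.` dropped, path and query appended unchanged); real SURT canonicalizers, such as the one pywb relies on, apply further normalization rules, so this should not be used to generate production CDX keys.

```python
from urllib.parse import urlsplit


def to_surt(url: str) -> str:
    """Simplified SURT key: reversed, comma-joined host, then ')' and the path.

    Hypothetical helper for illustration only; real SURT canonicalizers
    apply more normalization than this.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    # urlsplit already lowercases .hostname; drop a leading "www." label.
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    # Reverse the host labels so that related hosts sort next to each other.
    surt_host = ",".join(reversed(host.split(".")))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return surt_host + ")" + path
```

For example, `to_surt("https://blog.example.com/a?b=1")` yields `com,example,blog)/a?b=1`, which sorts right after `com,example)/` rather than among other hosts starting with "b". A loader could compare each key against this shape to decide when to emit the proposed non-SURT warning.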
Make sure that the CDX Server supports ZipNum clusters for compressed CDXs.
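As a rough illustration of why ZipNum support matters for large indexes: sorted CDX lines are compressed in independent blocks, and a small summary index records where each block starts, so a lookup decompresses only the few blocks that can contain the key. The sketch below is a simplified, single-buffer model under that assumption; the `build_zipnum` and `lookup` functions and the `(first key, offset, length)` summary layout are hypothetical, and real ZipNum clusters also spread blocks across multiple shard files and record which shard each block lives in.

```python
import bisect
import gzip
from typing import List, Tuple

# Hypothetical summary entry: (first line of block, byte offset, byte length).
SummaryEntry = Tuple[str, int, int]


def build_zipnum(lines: List[str], block_size: int) -> Tuple[bytes, List[SummaryEntry]]:
    """Compress sorted CDX lines in independent gzip blocks and build a summary."""
    data = bytearray()
    summary: List[SummaryEntry] = []
    for i in range(0, len(lines), block_size):
        block = lines[i:i + block_size]
        chunk = gzip.compress(("\n".join(block) + "\n").encode("utf-8"))
        summary.append((block[0], len(data), len(chunk)))
        data += chunk
    return bytes(data), summary


def lookup(data: bytes, summary: List[SummaryEntry], prefix: str) -> List[str]:
    """Return lines starting with `prefix`, decompressing only candidate blocks."""
    firsts = [entry[0] for entry in summary]
    # The block just before the first block starting at or after `prefix`
    # may already contain matching lines, so start the scan there.
    i = max(bisect.bisect_left(firsts, prefix) - 1, 0)
    matches: List[str] = []
    for first, offset, length in summary[i:]:
        # Once a block begins beyond the prefix range, no later block can match.
        if first > prefix and not first.startswith(prefix):
            break
        block = gzip.decompress(data[offset:offset + length]).decode("utf-8")
        matches.extend(line for line in block.splitlines() if line.startswith(prefix))
    return matches
```

Passing a key followed by a space (for example `"com,example)/ "`) restricts the result to that exact URL, while a bare key also matches longer paths under it.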
Is this still relevant? I'm asking because I'm starting a new project and was going to use CDXCollections. Is it deprecated? Should I start with CDX Server?
I don't know that it has been formally deprecated at this point, but it continues to sound like CDX Server will be required for OpenWayback 3.0, so it would probably be a good idea to start using CDX Server if you are starting something new.
@johnerikhalse is working on CDX Server. At Bibliotheca Alexandrina, we are still using CDXCollection.xml, and it is working well.