openwayback
Scale: No. of Hits on Host, Aggregate and Clustering Results
Scale: # hits on host, aggregate and clustering results.
S'sheet line: 2
For whom? BNF, BL, DN, IA
Notes: New CDX server should enable this.
Est. Milestone: 2.x.x
I believe this concerned ensuring that the performance and user experience were acceptable when a particular page or host had a very large number of instances.
The question of scaling to a large number of hits was raised at BnF while we were studying our big domains. lemonde.fr has over 4 million hits.

http://web.archive.org/web//www.lemonde.fr/ works fine.
http://web.archive.org/web//www.google.com/ doesn't work.

We currently have "maxRecords" set to 100 000. Because Wayback iterates over the CDX files in the order in which they were configured, that limit keeps it from displaying all results. This may not be a Wayback-only issue; it goes together with the management of large and multiple CDX files.
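To illustrate the interaction between a record cap and the order in which CDX files are read, here is a minimal toy sketch. It is not Wayback's actual code: the two in-memory lists stand in for sorted CDX files, `MAX_RECORDS` is a hypothetical stand-in for "maxRecords", the function names are made up for this example, and timestamps are shortened to years for readability.

```python
import heapq
from itertools import chain, islice

# Toy stand-ins for two sorted CDX files: each record is (url_key, timestamp).
cdx_a = [("fr,lemonde)/", "2005"), ("fr,lemonde)/", "2007"), ("fr,lemonde)/", "2009")]
cdx_b = [("fr,lemonde)/", "2004"), ("fr,lemonde)/", "2006"), ("fr,lemonde)/", "2008")]

MAX_RECORDS = 4  # hypothetical cap, standing in for "maxRecords"


def sequential_capped(sources, limit):
    """Read sources one after another in configured order, stopping at the cap."""
    return list(islice(chain.from_iterable(sources), limit))


def merged_capped(sources, limit):
    """Merge the already-sorted sources into one sorted stream, then apply the cap."""
    return list(islice(heapq.merge(*sources), limit))


seq_years = [ts for _, ts in sequential_capped([cdx_a, cdx_b], MAX_RECORDS)]
merged_years = [ts for _, ts in merged_capped([cdx_a, cdx_b], MAX_RECORDS)]

print(seq_years)     # first file dominates, later files are cut off
print(merged_years)  # contiguous range across both files
```

In this toy model the sequential read fills the cap mostly from the first file, so captures held in later files go missing and the result has gaps. Merging before applying the cap still truncates the result, but the records returned form a contiguous range across all files.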
Should we bake CDX-Server in as a default, and deprecate XML Query?