Scale: No. of Hits on Host, Aggregate and Clustering Results

Open PsypherPunk opened this issue 11 years ago • 3 comments

Scale: # hits on host, aggregate and clustering results.

S'sheet line: 2
For whom? BNF, BL, DN, IA
Notes: New CDX server should enable this.
Est. Milestone: 2.x.x

PsypherPunk, Dec 16 '13 14:12

I believe this concerned ensuring that the performance and user experience were acceptable when a particular page or host had a very large number of instances.

anjackson, Feb 12 '14 11:02

The question of scaling to a large number of hits was raised at BnF while we were studying our biggest domains. lemonde.fr has over 4 million hits: http://web.archive.org/web//www.lemonde.fr/ works fine, but http://web.archive.org/web//www.google.com/ doesn't work. With "maxRecords" currently set to 100 000, the way Wayback iterates over the CDX files, in the same order in which they were configured, keeps it from displaying all results. This may not be a Wayback-only issue; it goes together with the management of large and multiple CDX files.
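To make the truncation concrete, here is a minimal sketch (not OpenWayback code) comparing two ways of reading several CDX files under a global record cap: reading them one after another in configuration order, versus doing a k-way merge first. It assumes each CDX file is already sorted and reduces each line to a hypothetical (urlkey, timestamp) tuple; `sequential_scan`, `merged_scan` and `max_records` are illustrative names standing in for Wayback's "maxRecords" behaviour.

```python
import heapq
from itertools import islice


def sequential_scan(files, max_records):
    """Read CDX files in configuration order and stop at the cap.

    Captures that live in files configured later may never be reached,
    even if they sort earlier than everything already returned.
    """
    results = []
    for cdx_file in files:
        for line in cdx_file:
            if len(results) >= max_records:
                return results
            results.append(line)
    return results


def merged_scan(files, max_records):
    """Merge all sorted CDX files, then apply the cap to the combined order.

    heapq.merge consumes the inputs lazily, so the cap keeps the globally
    first records in sort order, whichever file they come from.
    """
    return list(islice(heapq.merge(*files), max_records))


if __name__ == "__main__":
    # Hypothetical, pre-sorted CDX contents listed in configuration order.
    cdx_a = [("fr,lemonde)/", "20050101000000"),
             ("fr,lemonde)/", "20060101000000"),
             ("fr,lemonde)/", "20070101000000")]
    cdx_b = [("fr,lemonde)/", "20010101000000"),
             ("fr,lemonde)/", "20130101000000")]
    print(sequential_scan([cdx_a, cdx_b], max_records=3))  # 2005, 2006, 2007
    print(merged_scan([cdx_a, cdx_b], max_records=3))      # 2001, 2005, 2006
```

With a cap of 3, the sequential scan never sees the 2001 capture stored in the second file, while the merge returns it first. Note that the merge only decides which records survive the cap; it does not reduce the cost of scanning millions of hits.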

saraaubry, Feb 20 '14 10:02

Should we bake CDX-Server in as a default, and deprecate XML Query?

anjackson, Mar 12 '15 16:03