archive-query-log
📜 The Archive Query Log.
Updates the requirements on [boto3](https://github.com/boto/boto3) to permit the latest version. Commits 34cbb24 Merge branch 'release-1.38.8' 4756482 Bumping version to 1.38.8 fb57777 Add changelog entries from botocore c09e338 Merge branch 'release-1.38.7'...
Updates the requirements on [marshmallow](https://github.com/marshmallow-code/marshmallow) to permit the latest version. Changelog Sourced from marshmallow's changelog. 4.0.0 (2025-04-16) See :ref:upgrading_4_0 for a guide on updating your code. Features: Typing: Add types...
The parsing regularly fails with an unknown encoding:
```
/workspace/archive_query_log/parsers/xml.py", line 74, in parse_xml_tree
    text_file = TextIOWrapper(tmp_file, encoding=encoding)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
LookupError: unknown encoding: windows-874
```
If this is regularly the same...
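One possible way to avoid the `LookupError` is to normalize the declared encoding before it reaches `TextIOWrapper`. The following is a minimal sketch, not the project's actual fix: `resolve_encoding` and `ENCODING_ALIASES` are hypothetical names, and mapping `windows-874` to Python's built-in `cp874` codec as well as falling back to UTF-8 are assumptions.

```python
import codecs

# Hypothetical alias table: encoding labels seen in archived pages that
# Python may not know under that name, mapped to a codec Python ships with.
ENCODING_ALIASES = {
    "windows-874": "cp874",  # assumption: Thai Windows code page
}


def resolve_encoding(name, default="utf-8"):
    """Return a codec name Python can open, or `default` if none is found."""
    normalized = name.strip().lower()
    candidate = ENCODING_ALIASES.get(normalized, normalized)
    try:
        # codecs.lookup raises LookupError for unknown encodings.
        return codecs.lookup(candidate).name
    except LookupError:
        return default
```

The resolved name could then be passed on, e.g. `TextIOWrapper(tmp_file, encoding=resolve_encoding(encoding))`. Whether a silent fallback is acceptable or the capture should instead be skipped is a design decision left open here.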
Download results for captures. Potentially:
- Nearest capture before and after the capture
In many web archives, the captures are stored in collections that might offer a way to better attribute the source of a capture, e.g., if it was captured manually or...
Currently, for each source (i.e. archive-provider pair) the captures are only fetched once from the CDX API. But as new SERPs get archived, we can regularly extend the captures for...
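Incremental fetching could be sketched as a CDX query that is restricted to captures at or after the last timestamp already stored for a source. This is a sketch under assumptions: the endpoint and the `url`, `matchType`, `output`, and `from` parameters follow the Internet Archive's Wayback CDX server and may differ for other archives, and `build_incremental_cdx_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlencode

# Assumption: the Internet Archive's Wayback CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_incremental_cdx_url(url_prefix, last_timestamp=None):
    """Build a CDX query URL for captures under `url_prefix`.

    `last_timestamp` is the newest capture timestamp already stored
    (format YYYYMMDDhhmmss); None means no captures were fetched yet.
    """
    params = {"url": url_prefix, "matchType": "prefix", "output": "json"}
    if last_timestamp is not None:
        # Only ask for captures from the last known timestamp onwards.
        params["from"] = last_timestamp
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```

Because the capture at the boundary timestamp may be returned again, results would likely need to be deduplicated against the captures already stored.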
Some SERP URLs expose the country or even more precise regional information. This could be a way to analyze localized subsets of the AQL.
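As an illustration, a region hint could be read from a query parameter or from a two-letter top-level domain. This is a rough sketch: `extract_region_hint` and the parameter names in `REGION_PARAMS` are assumptions chosen for the example, real providers would need their own rules, and a two-letter TLD does not always indicate a country.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical list of query parameters that may carry a country code.
REGION_PARAMS = ("gl", "cc", "country")


def extract_region_hint(serp_url):
    """Return a lowercase region hint for a SERP URL, or None."""
    parts = urlsplit(serp_url)
    query = parse_qs(parts.query)
    for name in REGION_PARAMS:
        values = query.get(name)
        if values:
            return values[0].lower()
    # Fall back to a two-letter top-level domain, e.g. "uk" in google.co.uk.
    host = parts.hostname or ""
    tld = host.rsplit(".", 1)[-1]
    if len(tld) == 2 and tld.isalpha():
        return tld
    return None
```

For example, `extract_region_hint("https://www.google.co.uk/search?q=x")` yields `"uk"`, while a `.com` URL without a region parameter yields `None`.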
As language detection generally does not work very well on (short) queries, this could be a way to analyze localized subsets of the AQL.
It could be a way to extract more contextualized/personalized behavior.
e.g., https://govscape.net/search?q=pediatric+healthcare+in+rural+areas&mode=textual