cdx_toolkit
enhancement: select crawl by exact timestamp
Querying the index brings back a status, timestamp, url triple, e.g.:
$ cdxt --cc --crawl CC-MAIN-2025-43 iter 'commoncrawl.org/get-started'
status 200, timestamp 20251014220259, url https://www.commoncrawl.org/get-started
status 200, timestamp 20251016192109, url https://commoncrawl.org/get-started
It would be good to have a direct method to bring back a particular record based on the timestamp alone. I'm aware you can do something like:
$ cdxt --cc --crawl CC-MAIN-2025-43 --from 20251016192109 --limit 1 warc 'commoncrawl.org/get-started'
but a direct --timestamp flag or similar would be useful, given that the index output already presents each record with its timestamp.
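To illustrate the behaviour I have in mind, here is a minimal sketch of exact-timestamp selection over index records. It assumes each record can be read like a dict with status, timestamp and url keys, as in the output above. The helper name select_by_timestamp is hypothetical and not part of cdx_toolkit, and the records are hard-coded from the example output rather than fetched from the index.

```python
from typing import Iterable, Mapping, Optional


def select_by_timestamp(
    records: Iterable[Mapping[str, str]], timestamp: str
) -> Optional[Mapping[str, str]]:
    """Return the first record whose timestamp matches exactly, else None.

    Hypothetical helper for illustration; not a cdx_toolkit function.
    """
    for record in records:
        if record["timestamp"] == timestamp:
            return record
    return None


# Sample records copied from the index output shown in this issue.
records = [
    {
        "status": "200",
        "timestamp": "20251014220259",
        "url": "https://www.commoncrawl.org/get-started",
    },
    {
        "status": "200",
        "timestamp": "20251016192109",
        "url": "https://commoncrawl.org/get-started",
    },
]

match = select_by_timestamp(records, "20251016192109")
if match is not None:
    print(match["status"], match["timestamp"], match["url"])
```

A --timestamp flag could behave the same way: return the matching record if one exists, and nothing otherwise, rather than the first record at or after the given time as --from with --limit 1 does.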