cdx_toolkit

enhancement: select crawl by exact timestamp

Open · laurieburchell opened this issue 2 months ago · 0 comments

Querying the index returns a (status, timestamp, url) triple for each capture, e.g.:

$ cdxt --cc --crawl CC-MAIN-2025-43 iter 'commoncrawl.org/get-started'  

status 200, timestamp 20251014220259, url https://www.commoncrawl.org/get-started
status 200, timestamp 20251016192109, url https://commoncrawl.org/get-started
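Each triple is keyed by a 14-digit timestamp (YYYYMMDDhhmmss). The minimal sketch below parses lines shaped like the output above into records and converts the timestamp to a datetime. It assumes the text layout shown in this example (it is not based on a documented cdxt output format) and treats the timestamps as UTC. `parse_line` and `LINE_RE` are hypothetical names for illustration, not part of cdx_toolkit.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern matching the `cdxt ... iter` lines shown above.
# Assumption: the layout is taken from this example, not from a spec.
LINE_RE = re.compile(r"status (\d{3}), timestamp (\d{14}), url (\S+)")


def parse_line(line: str) -> dict:
    """Parse one 'status ..., timestamp ..., url ...' line into a dict."""
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    status, ts, url = m.groups()
    # 14-digit timestamps are YYYYMMDDhhmmss; interpreted as UTC here (assumption).
    when = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return {"status": int(status), "timestamp": ts, "when": when, "url": url}


if __name__ == "__main__":
    output = """\
status 200, timestamp 20251014220259, url https://www.commoncrawl.org/get-started
status 200, timestamp 20251016192109, url https://commoncrawl.org/get-started"""
    for line in output.splitlines():
        rec = parse_line(line)
        print(rec["when"].isoformat(), rec["url"])
```

Keeping the raw 14-digit string alongside the parsed datetime means the exact value can still be passed back to a command-line flag unchanged.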

It would be good to have a direct way to retrieve a particular record based on its timestamp alone. I'm aware you can do something like cdxt --cc --crawl CC-MAIN-2025-43 --from 20251016192109 --limit 1 warc 'commoncrawl.org/get-started', but a direct --timestamp flag or similar would be useful, given how the index records are presented.
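To illustrate how an exact-timestamp lookup differs from the `--from ... --limit 1` workaround, here is a rough client-side sketch over plain dicts. It assumes records arrive in ascending timestamp order (as in the listing above) and use full 14-digit timestamps, so string comparison orders them correctly. `select_exact` and `select_from_limit1` are hypothetical helpers, not cdx_toolkit functions, and the second one is only an approximation of what the `--from`/`--limit` options do.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical sample records mirroring the `cdxt ... iter` output above.
# Real index records carry more fields; only these three are used here.
RECORDS = [
    {"status": "200", "timestamp": "20251014220259", "url": "https://www.commoncrawl.org/get-started"},
    {"status": "200", "timestamp": "20251016192109", "url": "https://commoncrawl.org/get-started"},
]

Record = Mapping[str, str]


def select_exact(records: Iterable[Record], timestamp: str) -> Optional[Record]:
    """Return the first record whose timestamp equals `timestamp`, or None."""
    for rec in records:
        if rec["timestamp"] == timestamp:
            return rec
    return None


def select_from_limit1(records: Iterable[Record], from_ts: str) -> Optional[Record]:
    """Approximate `--from TS --limit 1`: first record with timestamp >= from_ts.

    Assumes full 14-digit timestamps and ascending timestamp order.
    """
    for rec in records:
        if rec["timestamp"] >= from_ts:
            return rec
    return None


if __name__ == "__main__":
    print(select_exact(RECORDS, "20251016192109")["url"])
    # A timestamp with no matching capture: the exact lookup finds nothing,
    # while the --from/--limit 1 approach silently returns a later capture.
    print(select_exact(RECORDS, "20251015000000"))
    print(select_from_limit1(RECORDS, "20251015000000")["url"])
```

The last two calls show why a dedicated exact-match option would be clearer: when the requested timestamp is mistyped or absent, the workaround still returns a record, just not the one asked for.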

laurieburchell, Oct 30 '25 13:10