Mat Kelly
The function `getURIsAndDatetimesInCDXJ()` in replay.py iterates through every line in a list of lines to extract the `datetime`, `mime`, and HTTP `status` from a CDXJ file to be used by...
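As a rough illustration of the per-line extraction described above, here is a minimal sketch. It is not the code in replay.py; the helper name `parse_cdxj_line` is hypothetical, and the JSON field names `mime_type` and `status_code` are assumptions about the CDXJ layout.

```python
import json


def parse_cdxj_line(line):
    """Extract datetime, mime, and status from one CDXJ line.

    Assumed layout: "<surt> <datetime> <json>".
    Returns None for blank lines and for metadata lines starting with "!".
    """
    line = line.strip()
    if not line or line.startswith('!'):
        return None

    # Split on the first two spaces only; the JSON block may contain spaces.
    surt_key, datetime_str, json_block = line.split(' ', 2)
    fields = json.loads(json_block)

    return {
        'surt': surt_key,
        'datetime': datetime_str,
        # Assumed field names; missing keys yield None instead of raising.
        'mime': fields.get('mime_type'),
        'status': fields.get('status_code'),
    }
```

Calling this once per line is a linear pass over the whole index, which is where the cost of iterating through every line comes from.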
When developing ipwb, I have been using a cycle of `pip install`ing the source then testing. This is tedious and repetitive, and it adds time to every iteration. It would be useful...
This implementation is heavily inspired by @ibnesayeed's [MementoMap implementation](https://github.com/oduwsdl/mementomap). In the future we _might_ align this with @ibnesayeed's [upcoming API](https://github.com/ibnesayeed/binsearch) but for now, this drastically speeds up replay and thus...
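To show why a sorted index speeds up lookups, here is a minimal in-memory sketch of a binary search over CDXJ lines. It is not the implementation from this change or from MementoMap; the function name `lookup_surt` is hypothetical, and it assumes the lines are already sorted and that each record starts with the SURT key followed by a space.

```python
from bisect import bisect_left


def lookup_surt(sorted_lines, surt_key):
    """Return all lines whose first field equals surt_key.

    sorted_lines must be sorted lexicographically. The search takes
    O(log n) comparisons to find the first candidate, then scans only
    the adjacent matching records.
    """
    # The trailing space keeps "com,example)/" from matching "com,example)/a".
    prefix = surt_key + ' '
    i = bisect_left(sorted_lines, prefix)

    matches = []
    while i < len(sorted_lines) and sorted_lines[i].startswith(prefix):
        matches.append(sorted_lines[i])
        i += 1
    return matches
```

A file-backed variant would seek to byte offsets instead of indexing a list, but the comparison logic stays the same.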
This is a meta-ticket. @ibnesayeed has crafted automation of the release and testing process through GitHub Actions, inclusive of generating the release notes. These release notes, among other things, contain...
Related to #165 and an idea I am still hashing (no pun) out. CDXJ output for `ipwb index -e ipwb/samples/warcs/5mementos.warc` then entering the key `goMonarchs` produces: CDXJ output !context ["http://tools.ietf.org/html/rfc7089"]...
The [WARC 1.1 spec](https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/) allows for more precise datetimes. These should be supported in the replay system. Does any tool exist that will generate these yet? If not, some sample...
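One way replay could tolerate the more precise datetimes is sketched below. This is only an illustration under assumptions: the helper name `split_warc_date` is hypothetical, it handles only full UTC datetimes with an optional fractional-seconds part of 1 to 9 digits, and it does not cover the reduced-precision forms.

```python
import re

# Full UTC datetime with an optional fractional part of 1 to 9 digits.
_WARC_DATE = re.compile(
    r'^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?Z$'
)


def split_warc_date(warc_date):
    """Split a WARC-Date into (14-digit timestamp, fractional digits).

    The fraction is returned as a string of digits ('' when absent) so
    that no precision is lost to float or microsecond rounding.
    """
    m = _WARC_DATE.match(warc_date)
    if m is None:
        raise ValueError('Unsupported WARC-Date: ' + warc_date)
    timestamp14 = ''.join(m.groups()[:6])
    fraction = m.group(7) or ''
    return timestamp14, fraction
```

Keeping the fraction separate lets existing 14-digit handling continue to work while the extra precision is carried alongside it.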
This occurs in WARCs with WARC Response records containing HTTP responses with a 204 No Content status code. Sample provided in `samples/warcs/HTTP204.warc`. When `pushToIPFS()` in the indexer calls `pushBytesToIPFS()`, which...
Images are missing both from the page itself and from the reconstructive logo. WARC created with a local webrecorder instance (built, run, and recorded using Docker and the webrecorder web interface): [temp-20180822005001.warc.gz](https://github.com/oduwsdl/ipwb/files/2308565/temp-20180822005001.warc.gz) ipwb...
A CDXJ may be specified to the replay system by a URI or IPFS hash per the README. Testing this requires the content first being referred to in the remote...