arkiver

Results 14 issues of arkiver

When an item, in this case https://archive.org/details/files.pushshift.io_201812, is over its size limit, the following error is returned: ``` requests.exceptions.ConnectionError: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')) ``` However, the error returned by...

While FTP download is well supported in Wget, support for FTP archiving is not optimal, especially when it comes to listings and recording communication with the FTP server. Proper FTP...

enhancement

According to the WARC specification: > The payload of an application/http block is its ‘entity-body’ (per [RFC2616]). This is not currently being done when `Transfer-Encoding` is present.

bug
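
To illustrate the issue above: RFC 2616 obtains the entity-body from the message-body by decoding any `Transfer-Encoding` that was applied. The sketch below assumes the encoding in question is `chunked` and shows how the chunk framing could be stripped before the payload is identified. `decode_chunked` is a hypothetical helper written for this example, not a function from any of the tools mentioned here.

```python
import io


def decode_chunked(message_body: bytes) -> bytes:
    """Remove HTTP/1.1 chunked framing and return the entity-body.

    Hypothetical helper for illustration only; trailers after the
    final zero-size chunk are ignored.
    """
    stream = io.BytesIO(message_body)
    entity_body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("chunked body ended before the last chunk")
        # A chunk-size line may carry extensions after ';', which we drop.
        size_field = size_line.split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # last chunk reached
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("chunk is shorter than its declared size")
        entity_body += chunk
        stream.readline()  # consume the CRLF that terminates the chunk data
    return bytes(entity_body)


if __name__ == "__main__":
    raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
    print(decode_chunked(raw))
```

Under this reading, anything computed over the payload would use the decoded bytes rather than the bytes as they appeared on the wire.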

Currently warcat gives the following error on revisit records from a deduplicated WARC: ```Record failed validation Traceback (most recent call last): File "/usr/local/lib/python3.4/dist-packages/warcat/tool.py", line 282, in action action(record) File "/usr/local/lib/python3.4/dist-packages/warcat/tool.py",...

enhancement

This would be useful for grabs where the exact same images are grabbed with different URLs. There should be a revisit record from a URL to a duplicated URL. Duplicated...

enhancement
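
A minimal sketch of the idea in the request above, assuming duplicates are detected by comparing payload digests. `DedupIndex` and `payload_digest` are hypothetical names for this example, and the returned dictionary is only an illustrative subset of the headers a revisit record carries, not a complete WARC record.

```python
import base64
import hashlib
from typing import Dict, Optional


def payload_digest(payload: bytes) -> str:
    """Return a SHA-1 digest in the 'sha1:<base32>' form often seen in WARCs."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


class DedupIndex:
    """Remember the first URL seen for each payload digest (hypothetical helper)."""

    def __init__(self) -> None:
        self._first_url_by_digest: Dict[str, str] = {}

    def check(self, url: str, payload: bytes) -> Optional[Dict[str, str]]:
        """Return revisit headers if the payload was seen before, else None."""
        digest = payload_digest(payload)
        original_url = self._first_url_by_digest.get(digest)
        if original_url is None:
            # First time this payload is seen: record it as the original.
            self._first_url_by_digest[digest] = url
            return None
        return {
            "WARC-Type": "revisit",
            "WARC-Target-URI": url,
            "WARC-Refers-To-Target-URI": original_url,
            "WARC-Payload-Digest": digest,
        }


if __name__ == "__main__":
    index = DedupIndex()
    image = b"\x89PNG example bytes"
    print(index.check("http://example.com/a.png", image))
    print(index.check("http://example.com/b.png", image))
```

The first call stores the payload as the original and returns nothing; the second call, with a different URL but identical bytes, yields headers pointing back to the first URL.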