warcio issues

Results 58 warcio issues

warcio cannot write wet files

I am trying to use warcio to write WET files that hold text-only conversion records, but I am not able to find a way to write a record using warcio...

mraslann

Warcio does not support replay of sites hosted on NCSA 1.5

Here is an interesting one for you Ilya. The original NCSA 1.5 web server responds with "HTTP 200 Document follows" rather than HTTP/1.0. In recorderloader.py HTTP_TYPES is only looking for...

omgoo

Patching WARCs using warcio

I'm doing some larger experiments with patching some WARC archives containing WordPress-based websites. WordPress supports [LaTeX](https://wordpress.com/support/latex/); unfortunately, they offer this through some endpoints that render the latex code to...

wsdookadr

GitHub Action to lint Python code

Test results: https://github.com/cclauss/warcio/actions

cclauss

Incorrect WARC-Payload-Digest values when transfer encoding is present

Per WARC/1.0 spec section 5.9:

> The payload of an application/http block is its ‘entity-body’ (per [RFC2616]). The entity-body is the HTTP body *without transfer encoding* per [section 4.3 in...

JustAnotherArchivist
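The distinction this issue draws can be illustrated with a small standard-library sketch. It is not warcio's code: `dechunk` and `payload_digest` are hypothetical helper names, and the `sha1:` plus Base32 format is assumed here only because it is the convention commonly seen in WARC files. The point is that hashing the chunked bytes and hashing the decoded entity-body give different digests.

```python
import base64
import hashlib


def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body (hypothetical helper; trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # the zero-length chunk terminates the body
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


def payload_digest(entity_body: bytes) -> str:
    """Format a SHA-1 digest as 'sha1:<BASE32>' (assumed convention)."""
    digest = hashlib.sha1(entity_body).digest()
    return "sha1:" + base64.b32encode(digest).decode("ascii")


raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"  # body as sent on the wire
decoded = dechunk(raw)                          # entity-body per the spec quote

print(payload_digest(raw))
print(payload_digest(decoded))
```

The two printed values differ, which is the discrepancy the issue title describes: a digest computed over the transfer-encoded bytes does not match one computed over the entity-body.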

Trying to write to closed file when using `requests.Session`

Overview: When attempting to use `requests.Session` with `capture_http` in some kind of loop to create new WARC files, an error is raised. However, when using `requests` directly without the...

maxyousif15

fix utf-8 encoding

tomeksporczyk

Documentation: Clarify that capture_http writer with filename has no get_stream method

I'm using Python 3.10.4 and warcio 1.7.4. Using a piece of code based on https://github.com/webrecorder/warcio#writing-warc-records, I'm getting ``` for record in ArchiveIterator(writer.get_stream()): AttributeError: 'WARCWriter' object has no attribute 'get_stream'. Did...

voltagex

warcio check does not raise error when GZip records are truncated

One of the most likely problems we see is failed transfers leading to truncated WARC.GZ files. We can spot this with `gunzip -t` but it would be good if `warcio...

anjackson
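As a rough illustration of the kind of check being requested, the sketch below does something comparable to `gunzip -t` using only Python's standard library. It is not how `warcio check` works, and `gzip_is_complete` is a hypothetical name. It assumes the file is a sequence of gzip members (as per-record compressed WARC.GZ files typically are) and reads everything into memory, so it is only suitable for small inputs.

```python
import gzip
import zlib


def gzip_is_complete(data: bytes) -> bool:
    """Return True if data consists only of complete gzip members."""
    while data:
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        try:
            d.decompress(data)
        except zlib.error:
            return False  # corrupt member or non-gzip bytes
        if not d.eof:
            return False  # input ended before the member's trailer
        data = d.unused_data  # bytes belonging to the next member, if any
    return True


# Two concatenated members stand in for two compressed records.
two_members = gzip.compress(b"record one") + gzip.compress(b"record two")

print(gzip_is_complete(two_members))       # complete file
print(gzip_is_complete(two_members[:-5]))  # simulated failed transfer
```

A streaming variant that feeds the decompressor fixed-size blocks would be needed for real archives; the member-by-member loop stays the same.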

Record not followed by newline (conversion error)

Hi, how do I deal with such an error? I'm trying to convert some really old ARCs to use in SolrWayback ``` mw@webarch:~/solrwayback/indexing/warcs1$ warcio recompress test2.arc.gz test2.warc.gz WARNING: Record not followed...

mw0000
