
warcio cannot write WET files

Open mraslann opened this issue 3 years ago • 0 comments

I am trying to use warcio to write WET files that hold text-only conversion records, but I am not able to find a way to write a record using warcio without having to make a live web request.

This is a sample of what I am trying to do:

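import requests
from warcio.statusandheaders import StatusAndHeaders
from warcio.warcwriter import WARCWriter

def create_wet_file(url):
    with open("BigData/test1.wet.gz", 'wb') as wet:
        writer = WARCWriter(wet, gzip=True)
        try:
            # Live request: this is the step I would like to avoid
            resp = requests.get(url, headers={'Accept-Encoding': 'identity, gzip', 'Content-Type': 'text/html; charset=utf-8'}, stream=True)
            # Reuse the HTTP response headers for the record
            headers_list = resp.raw.headers.items()
            headers = StatusAndHeaders('200 OK', headers_list, protocol='HTTP/1.0')
            # Note: this writes a 'response' record, not a text-only 'conversion' record
            record = writer.create_warc_record(uri=url, record_type='response', payload=resp.raw, http_headers=headers)
            writer.write_record(record)
        except requests.exceptions.ConnectionError as e:
            print(e)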

Is this a limitation in warcio, or is there a way around it?

Thank you for any pointers.

mraslann, Aug 10 '22 09:08