warcio
warcio cannot write WET files
I am trying to use warcio to write WET files that hold text-only conversion records, but I am not able to find a way to write a record using warcio without having to make a live web request.
This is a sample of what I am trying to do:
    import requests
    from warcio.statusandheaders import StatusAndHeaders
    from warcio.warcwriter import WARCWriter

    def create_wet_file(url):
        with open("BigData/test1.wet.gz", 'wb') as wet:
            writer = WARCWriter(wet, gzip=True)
            try:
                # A live request is made only to obtain headers and a payload stream
                resp = requests.get(url, headers={'Accept-Encoding': 'identity, gzip', 'Content-Type': 'text/html; charset=utf-8'}, stream=True)
                headers_list = resp.raw.headers.items()
                headers = StatusAndHeaders('200 OK', headers_list, protocol='HTTP/1.0')
                # This writes a 'response' record, not the 'conversion' record a WET file needs
                record = writer.create_warc_record(uri=url, record_type='response', payload=resp.raw, http_headers=headers)
                writer.write_record(record)
            except requests.exceptions.ConnectionError as e:
                print(e)
Is this a limitation in warcio, or is there a way around it?
Thank you for any pointers.