
http.client.BadStatusLine: http/1.1 200 OK

Open chris-aeviator opened this issue 3 years ago • 0 comments

I'm getting a lot of these errors. Some pages work just fine, and all the WARC files I'm reading contain HTML. The error itself is strange, since "200 OK" is not a bad status line.

Error on record <urn:uuid:3b608490-1308-11ec-a263-3905f05120b4>
Traceback (most recent call last):
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/site-packages/warcat/tool.py", line 108, in process
    self.action(record)
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/site-packages/warcat/tool.py", line 216, in action
    response = util.parse_http_response(data)
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/site-packages/warcat/util.py", line 273, in parse_http_response
    response.begin()
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/http/client.py", line 319, in begin
    version, status, reason = self._read_status()
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/http/client.py", line 301, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: http/1.1 200 OK
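
One possible explanation, not verified against this particular WARC file: CPython's http.client raises BadStatusLine when the protocol token of the status line does not start with uppercase "HTTP/", and the line in the traceback starts with lowercase "http/1.1". The sketch below reproduces that behavior with the standard library only and shows one way to normalize the token before parsing. FakeSocket, parse_http_response and normalize_status_line are hypothetical helpers written for this illustration; they are not warcat's implementation or part of its API.

```python
import http.client
import io


class FakeSocket:
    """Minimal socket stand-in so HTTPResponse can read from bytes."""

    def __init__(self, data: bytes):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        # HTTPResponse calls sock.makefile("rb") to get a readable file object.
        return self._file


def parse_http_response(data: bytes) -> http.client.HTTPResponse:
    """Parse raw response bytes with the standard library parser."""
    response = http.client.HTTPResponse(FakeSocket(data))
    response.begin()  # reads the status line and the headers
    return response


def normalize_status_line(data: bytes) -> bytes:
    """Uppercase a leading 'http/' protocol token, leaving the rest untouched."""
    if data[:5].lower() == b"http/" and data[:5] != b"HTTP/":
        return b"HTTP/" + data[5:]
    return data


# Assumed sample record payload with a lowercase protocol token.
raw = b"http/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"

try:
    parse_http_response(raw)
except http.client.BadStatusLine as error:
    print("rejected:", error)

response = parse_http_response(normalize_status_line(raw))
body = response.read()
print(response.status, body)
```

If this is the cause, the affected records were probably written by a server or crawler that emits a lowercase protocol token, which would explain why only some pages fail.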

My code is:

import warcat.tool

tool = warcat.tool.ExtractTool(
    ['/tmp/my.warc'],
    out_dir='/tmp/out/',
    preserve_block=False,
    keep_going=True,
)
tool.process()

chris-aeviator, Sep 13 '21 08:09