warcat
http.client.BadStatusLine: http/1.1 200 OK
I'm getting a lot of these errors. Some pages extract just fine, and all the WARC files I'm reading contain HTML. The error itself is strange, since "200 OK" is not a bad status line.
Error on record <urn:uuid:3b608490-1308-11ec-a263-3905f05120b4>
Traceback (most recent call last):
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/site-packages/warcat/tool.py", line 108, in process
    self.action(record)
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/site-packages/warcat/tool.py", line 216, in action
    response = util.parse_http_response(data)
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/site-packages/warcat/util.py", line 273, in parse_http_response
    response.begin()
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/http/client.py", line 319, in begin
    version, status, reason = self._read_status()
  File "/home/korny/.conda/envs/ploomber-gpt/lib/python3.9/http/client.py", line 301, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: http/1.1 200 OK
My code is:
import warcat.tool

tool = warcat.tool.ExtractTool(
    ['/tmp/my.warc'],
    out_dir='/tmp/out/',
    preserve_block=False,
    keep_going=True
)
tool.process()
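For anyone trying to reproduce this without my WARC file, here is a minimal sketch that uses only the standard library. It suggests the problem may be the lowercase `http/` in the status line rather than the `200 OK` part: `http.client` rejects any status line whose version token does not start with uppercase `HTTP/`. The names `FakeSocket`, `try_parse`, and `normalize_status_line` are hypothetical helpers written for this example and are not part of warcat's API. The normalization step is only one possible workaround, assuming the rest of the record is well formed.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket so HTTPResponse can read raw bytes."""

    def __init__(self, data: bytes):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def try_parse(raw: bytes):
    """Return (status, reason), or the string 'BadStatusLine' on failure."""
    response = http.client.HTTPResponse(FakeSocket(raw))
    try:
        response.begin()
    except http.client.BadStatusLine:
        return "BadStatusLine"
    return response.status, response.reason


def normalize_status_line(raw: bytes) -> bytes:
    """Uppercase a leading 'http/' token on the first line (assumed workaround)."""
    head, sep, rest = raw.partition(b"\r\n")
    if head[:5].lower() == b"http/":
        head = b"HTTP/" + head[5:]
    return head + sep + rest


good = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
bad = b"http/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"

print(try_parse(good))
print(try_parse(bad))
print(try_parse(normalize_status_line(bad)))
```

If this matches what is in the failing records, the server or the crawler that wrote the WARC emitted a lowercase protocol token, and the records would need to be normalized before they reach `util.parse_http_response`.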