warcio
Record not followed by newline (conversion error)
Hi, how do I deal with this error? I'm trying to convert some really old ARC files for use in SolrWayback.
mw@webarch:~/solrwayback/indexing/warcs1$ warcio recompress test2.arc.gz test2.warc.gz
WARNING: Record not followed by newline, perhaps Content-Length is invalid
Offset: 52006972
Remainder: b'http://www.omega.poznet.pl:80/rekin.html 212.126.5.228 200101211835 text/html 4274\n'
Recompress Failed: test2.arc.gz could not be read as a WARC or ARC
Can you share the ARC file that is causing the error? It may be using a format that has not been supported so far.
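While waiting on the file, it may help to look at the Remainder line by hand. It looks like an ARC v1 URL-record header (URL, IP address, archive date, content type, length). Below is a rough sketch for splitting such a line, assuming that five-field, space-separated layout. `parse_arc_header` and `ArcHeader` are hypothetical names made up for this example, not part of warcio.

```python
from typing import NamedTuple


class ArcHeader(NamedTuple):
    url: str
    ip: str
    date: str
    mime: str
    length: int


def parse_arc_header(line: bytes) -> ArcHeader:
    """Split one ARC v1 URL-record header line into its five fields."""
    # Assumed layout: URL IP archive-date content-type archive-length
    parts = line.decode("latin-1").rstrip("\r\n").split(" ")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {parts!r}")
    url, ip, date, mime, length = parts
    return ArcHeader(url, ip, date, mime, int(length))


# The Remainder reported by warcio in the output above.
remainder = (
    b"http://www.omega.poznet.pl:80/rekin.html 212.126.5.228 "
    b"200101211835 text/html 4274\n"
)
header = parse_arc_header(remainder)
print(header)
# Number of digits in the archive date field.
print(len(header.date))
```

On this line the date field has 12 digits, while ARC timestamps are commonly 14 digits (YYYYMMDDhhmmss). Whether that is related to the failure is only a guess; the file itself would be needed to confirm.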