warcio

Record not followed by newline (conversion error)

Open: mw0000 opened this issue 3 years ago • 1 comment

Hi, how do I deal with an error like this? I'm trying to convert some really old ARC files for use in SolrWayback.

mw@webarch:~/solrwayback/indexing/warcs1$ warcio recompress test2.arc.gz test2.warc.gz
    WARNING: Record not followed by newline, perhaps Content-Length is invalid
    Offset: 52006972
    Remainder: b'http://www.omega.poznet.pl:80/rekin.html 212.126.5.228 200101211835 text/html 4274\n'
Recompress Failed: test2.arc.gz could not be read as a WARC or ARC

mw0000 • Feb 07 '22 14:02

Can you share the ARC file that is causing the error? It may use a format variant that is not supported yet.
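
For what it's worth, the `Remainder` in the warning looks like an ARC v1 URL-record header line (URL, IP address, archive date, content type, length), which would suggest the reader reached the next record's header where it expected a newline. Here is a minimal sketch, standard library only, that splits such a line into fields so the declared length can be checked by hand. The names `ArcHeader` and `parse_arc_header` are made up for illustration and are not part of warcio, and the five-field layout is an assumption about this particular file.

```python
from typing import NamedTuple


class ArcHeader(NamedTuple):
    """Fields of an ARC v1 URL-record header line (assumed five-field layout)."""
    url: str
    ip: str
    date: str
    content_type: str
    length: int


def parse_arc_header(line: bytes) -> ArcHeader:
    """Split one space-separated ARC header line into its fields.

    Hypothetical helper for manual inspection, not a warcio API.
    """
    parts = line.decode("latin-1").rstrip("\r\n").split(" ")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {parts!r}")
    url, ip, date, content_type, length = parts
    return ArcHeader(url, ip, date, content_type, int(length))


# The Remainder value copied from the warning above.
remainder = (
    b"http://www.omega.poznet.pl:80/rekin.html 212.126.5.228 "
    b"200101211835 text/html 4274\n"
)
header = parse_arc_header(remainder)
print(header.url, header.date, header.length)
```

If the line parses cleanly like this, comparing `header.length` against the actual bytes up to the following header line may show which record has the mismatched length.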

ikreymer • Feb 08 '22 04:02