gzrt

recovering but prepending characters to every line

Open jayenashar opened this issue 1 year ago • 5 comments

at first i thought it was due to the corruption, but it's happening to lines before the corruption as well.

my file is a stream of text lines. when i get a new line of text, i compress it with gzip and append it to the end of the file. this has been working fine for a month or so. my computer froze and i restarted the process before i noticed the issue. if i use gunzip, the first 90% of the file is unzipped fine and the last 10% is missing. with gzrecover, it all seems to be there, but there are random characters at the start of every line: usually "XP", but sometimes up to 10 unprintable characters. i can understand them being there after the corruption and i'm happy to go clean it up, but i'm not sure why it's happening in the first 90% of the file.

gziprecover 0.8 on debian 11 amd64
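
For illustration, here is a minimal Python sketch of the append-per-line layout described above. It assumes each line is written as its own gzip member (the post does not say which tool or library does the compression), and the 5-byte truncation is a made-up stand-in for the damage left by the freeze. It shows that concatenated members decompress as one stream, and that a stock decompressor reports an error once the tail is damaged.

```python
import gzip


def append_line(buf: bytearray, line: str) -> None:
    """Compress one text line as its own gzip member and append it."""
    buf += gzip.compress(line.encode("utf-8"))


# Hypothetical log contents, standing in for the real file.
log = bytearray()
append_line(log, "first line\n")
append_line(log, "second line\n")

# A multi-member gzip stream decompresses to the concatenation of its members.
full = gzip.decompress(bytes(log))

# Simulate the crash by cutting into the trailer of the last member.
truncated = bytes(log[:-5])
failed = False
try:
    gzip.decompress(truncated)
except (EOFError, OSError):
    # Python refuses the damaged stream, much like gunzip's
    # "unexpected end of file" message.
    failed = True

print(full)
print("truncated stream rejected:", failed)
```

None of this reproduces the stray prefix characters; it only models the file layout and the gunzip-side failure.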

jayenashar, Oct 04 '22 12:10

Thanks. I don't know what is causing that. The algorithm is ridiculously simple. It just advances byte by byte through the file looking for a gzip header, and if it finds one, attempts to decompress. In theory then, it should have just decompressed normally the first part of the file. I've never heard of something like this before. I appreciate the report. Aaron.
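
The scan described above could be sketched roughly as follows. This is a Python approximation of the idea only, not gzrecover's actual C code: the function name `scan_and_recover`, the three-byte magic check, and the choice to discard a member's partial output on error are all assumptions made for the example.

```python
import gzip
import zlib

# ID1, ID2 and CM=8 (deflate) from the gzip header.
GZIP_MAGIC = b"\x1f\x8b\x08"


def scan_and_recover(data: bytes) -> bytes:
    """Walk through data, decompressing every gzip member that can be read."""
    out = bytearray()
    pos = 0
    while True:
        pos = data.find(GZIP_MAGIC, pos)
        if pos < 0:
            break  # no further candidate headers
        decomp = zlib.decompressobj(31)  # wbits 16 + 15: gzip framing only
        try:
            chunk = decomp.decompress(data[pos:])
        except zlib.error:
            # False positive or damaged member: drop it and resume the
            # search one byte further on.
            pos += 1
            continue
        out += chunk
        if not decomp.eof:
            break  # member ran off the end of the input; keep what we got
        # Jump to the first byte after the member that was just read.
        pos = len(data) - len(decomp.unused_data)
    return bytes(out)


# Two valid members with junk in between, standing in for a damaged file.
sample = gzip.compress(b"a\n") + b"garbage" + gzip.compress(b"b\n")
recovered = scan_and_recover(sample)
print(recovered)
```

With intact members, a scan like this yields the same bytes as a normal decompress, which matches the expectation that the undamaged part of the file should come out clean.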

arenn, Oct 11 '22 08:10

let me see if i can find the smallest version of the file that creates this behaviour. hopefully something you can investigate.

jayenashar, Oct 11 '22 08:10

line1.gz https://github.com/arenn/gzrt/files/9753556/line1.gz

$ gunzip < line1.gz
{"op":"status","id":20,"statusCode":"SUCCESS","connectionClosed":false}
gzip: stdin: unexpected end of file

$ gzrecover -o >(cat) < line1.gz; echo
XP{"op":"status","id":20,"statusCode":"SUCCESS","connectionClosed":false}

this isn't the smallest example. with the first 24 bytes of the attached line1.gz, you can see XP appears.
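
When looking at the first bytes of a file like this, it can help to decode the fixed 10-byte gzip header (layout per RFC 1952). The sketch below is a hypothetical helper, not part of gzrt, and it runs on a freshly compressed sample rather than on the attached line1.gz, so it says nothing about where the XP comes from.

```python
import gzip
import struct


def describe_gzip_header(data: bytes) -> dict:
    """Decode the fixed 10-byte gzip header at the start of data."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    id1, id2, method, flags, mtime, xfl, os_code = struct.unpack(
        "<BBBBIBB", data[:10]
    )
    return {
        "magic_ok": (id1, id2) == (0x1F, 0x8B),
        "method": method,  # 8 means deflate
        "flags": flags,    # bit flags for optional fields (extra, name, comment)
        "mtime": mtime,    # Unix timestamp, 0 if not set
        "xfl": xfl,
        "os": os_code,
    }


# Stand-in data; mtime=0 keeps the header reproducible.
sample = gzip.compress(b'{"op":"status"}\n', mtime=0)
header = describe_gzip_header(sample[:24])
print(header)
```

A non-zero `flags` value would mean optional fields sit between the fixed header and the compressed data.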

jayenashar, Oct 11 '22 09:10

Neither gzip nor gzrecover pulled anything out of that file. Do you have a somewhat larger one? Maybe a couple megabytes or something?

arenn, Oct 11 '22 13:10

yeah, i have a few hundred megs, but both gunzip and gzrecover pull a line from that file on my computer, so i don't see why they don't on yours.

jayenashar, Oct 12 '22 01:10