
It doesn't work with large files (~40M in my case)

Open peske opened this issue 3 years ago • 2 comments

Thanks for the effort, but it looks like it doesn't work... or at least not optimally. Here's my setup:

I have two binary files (DLLs), which have the exact same size (about 40M), and which differ very slightly. Here's the screenshot from BeyondCompare:

[image: Beyond Compare screenshot of the two files]

As you can see, the files differ only in very few bytes.

But when I did the following (example from the documentation):

//Create fingerprint of a file
fingerprint := NewFingerprint("/path/foo_v1.binary", 1024)

//Say the file was updated
//Let's generate the diff
diff := NewDiff("/path/foo_v2.binary", *fingerprint)

I found that the resulting diff has more than 725,000 blocks (Block). Serialized as JSON, the diff is about 9M. I also tried a smaller block size (64) and ended up with a diff of about 150M in JSON.

Sadly, I cannot share the actual DLLs (company secret), but I believe you can reproduce this by taking any DLL of a similar size, making a copy with a few bytes changed here and there, and running the same code.

peske avatar Dec 13 '20 01:12 peske

Btw, the vast majority (99.999% in my case) of the returned blocks have RawData=nil and HasData=false. Are they really needed? I see that they contain some checksums, maybe to check the input file before patching? If so, wouldn't it be better to ensure the input file's integrity in a cheaper way, such as including a checksum of the whole file in the output?

peske avatar Dec 13 '20 01:12 peske

I have test cases with different binary files (including large files), so I know that the diff generation and patching do work. But I do understand your point about the patch file size not being optimal. I need to look into more optimal ways of serializing the patch information.
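
One possible direction for a smaller patch, sketched below under stated assumptions, is to merge runs of adjacent unchanged blocks into a single copy range before serializing. The `span` type is a hypothetical, simplified stand-in and not xferspdy's `Block`. The sketch assumes that matched entries refer to byte ranges `[Start, End)` in the old file and that per-block checksums are not needed at patch time.

```go
package main

import "fmt"

// span is a hypothetical diff entry, not xferspdy's Block type.
// Data == nil means "copy bytes [Start, End) from the old file";
// otherwise Data holds literal bytes from the new file.
type span struct {
	Start, End int64
	Data       []byte
}

// coalesce merges consecutive copy spans whose ranges are contiguous in
// the old file. Literal spans are passed through unchanged.
func coalesce(in []span) []span {
	out := make([]span, 0, len(in))
	for _, s := range in {
		n := len(out)
		if n > 0 && s.Data == nil && out[n-1].Data == nil && out[n-1].End == s.Start {
			out[n-1].End = s.End // extend the previous copy range
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	in := []span{
		{Start: 0, End: 1024},
		{Start: 1024, End: 2048},
		{Start: 2048, End: 3072},
		{Data: []byte("changed")},
		{Start: 3072, End: 4096},
	}
	for _, s := range coalesce(in) {
		fmt.Printf("%+v\n", s)
	}
}
```

With this representation, two files that differ in only a few bytes would serialize to a handful of entries, since the number of entries grows with the number of changed regions instead of the number of blocks.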

monmohan avatar Dec 13 '20 02:12 monmohan