[Feature Request] Combine generated patch files
Apologies if this is already covered by the current functionality, but I didn't see support for this in the usage documentation.
It would be nice if there was a way to derive/combine patches from previously existing patches without them being a naive concatenation.
For example, if I have a binary file with versions A, B, and C, and I have already generated patch files AB (the diff between A and B) and BC (the diff between B and C), I would like to run a command like hdiffz --combine AB BC AC and get a diff equivalent to what hdiffz A C AC would have produced.
This feature should be possible to implement, but the result would probably not be the best possible patch file: sizeof(combine(AB,BC)) >= sizeof(hdiffz(A,C))
It makes sense that sizeof(combine(AB,BC)) would be at least sizeof(hdiffz(A,C)), but I didn't expect significant file size inflation if the equivalent diff could be derived. That's interesting. I don't know the math behind this, so you would know best.
I should probably add the use case behind this request, in case there's a better approach. I was looking into how to handle a situation where there are many versions of a large binary file (from 4GB to 9GB across versions). Generating the full matrix of version-to-version patch files would be an expensive task, and the set of patches to generate would grow with every release. I figured that if each version instead had a single patch file generated between itself and the previous version, and if combining patches could be a cheap operation, then the patch between any two arbitrary versions could be generated on demand by combining the existing patches.
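To put rough numbers on the growth described above, here is a minimal sketch that counts patches under the two strategies. It assumes upgrade-only patches (older version to newer version); the function names are hypothetical and only for illustration.

```python
def full_matrix_patch_count(n_versions: int) -> int:
    """Patches needed so every older version can jump directly to every newer one."""
    return n_versions * (n_versions - 1) // 2


def chain_patch_count(n_versions: int) -> int:
    """Patches needed when each version only has a patch from its predecessor."""
    return max(n_versions - 1, 0)


def new_patches_per_release(n_existing: int, full_matrix: bool) -> int:
    """Patches that must be generated when one more version is released."""
    return n_existing if full_matrix else min(n_existing, 1)


if __name__ == "__main__":
    for n in (5, 20, 50):
        print(n, full_matrix_patch_count(n), chain_patch_count(n))
```

The full matrix grows quadratically with the number of versions, while the chain grows linearly, which is why a cheap combine step would be attractive.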
I did some simple experiments (hdiff writes uncompressed output; 7zCompress is 7zip with a 128M dictionary):
AB=hdiff(A,B)
BC=hdiff(B,C)
AC=hdiff(A,C)
ABC=hdiff(AB,BC)
Result: 7zCompress(AC) << 7zCompress(ABC) < 7zCompress(AB)+7zCompress(BC)
The effect is not very good: the saving, 7zCompress(AB)+7zCompress(BC)-7zCompress(ABC), is relatively small.
Also, a true combine algorithm (as opposed to applying the two patches one after the other) is not easy to implement, or at least I don't know how to do it.
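For reference, the "apply 2 times" fallback can be sketched with a toy patch format. This is only an illustration: the copy/add operation list below is an assumption made for the example and is not HDiffPatch's real patch format.

```python
from typing import Iterable, List, Tuple, Union

# Toy patch: a list of operations, NOT the real hdiffz format.
#   ("copy", start, length) copies bytes from the old file
#   ("add", data)           inserts literal bytes
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]
Patch = List[Op]


def apply_patch(old: bytes, patch: Patch) -> bytes:
    """Rebuild the new file from the old file and one toy patch."""
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        elif op[0] == "add":
            out += op[1]
        else:
            raise ValueError(f"unknown op: {op[0]!r}")
    return bytes(out)


def apply_chain(old: bytes, patches: Iterable[Patch]) -> bytes:
    """Apply patches one after the other (A -> B -> C ...)."""
    data = old
    for patch in patches:
        data = apply_patch(data, patch)
    return data


A = b"hello world"
AB = [("copy", 0, 6), ("add", b"there")]     # B = b"hello there"
BC = [("add", b"oh, "), ("copy", 0, 11)]     # C = b"oh, hello there"
C = apply_chain(A, [AB, BC])
```

Applying the chain always reproduces C, but it materializes every intermediate version. A true combine would have to rewrite BC's copy ranges, which refer to B, in terms of A, and that is the hard part.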
The alternative is to create a patch from each of the historical versions directly to the current latest version. The computational cost is relatively large, but it is still cost-effective given the patch size saved.
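A minimal sketch of that per-release job: it only builds the command lines, one per historical version, and does not run them. The file names and the output naming scheme are hypothetical, and placing the options before the three paths is an assumption about the hdiffz command line.

```python
from typing import List, Sequence


def plan_patches_to_latest(
    old_versions: Sequence[str],
    latest: str,
    options: Sequence[str] = (),
) -> List[List[str]]:
    """Return one 'hdiffz old new out' command per historical version."""
    commands = []
    for old in old_versions:
        out = f"{old}_to_{latest}.patch"  # hypothetical naming scheme
        commands.append(["hdiffz", *options, old, latest, out])
    return commands


if __name__ == "__main__":
    # Hypothetical file names; the options are the ones mentioned in this thread.
    for cmd in plan_patches_to_latest(["v1.bin", "v2.bin"], "v3.bin", ["-s", "-c-zstd"]):
        print(" ".join(cmd))
```

Each command could then be handed to subprocess.run or a job queue. The number of commands per release equals the number of historical versions still supported.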
Currently, for large apps, some teams use hdiffz -s -c-zstd to save time and machine resources when creating patches; other teams configure machines with 64 CPU cores and 512GB of memory and use hdiffz -m to create the smallest patches.
Thank you for this feedback and information. This is really helpful!
#rejected