Really slow (8+ hours) to generate patches for large files (450 MB) with 24 GB RAM using bsdiff4

Open mchaniotakis opened this issue 7 months ago • 4 comments

First off, thanks a lot for tufup. It is a great package and, as of now, the only reliable solution for an auto-updating framework. I really appreciate the effort put into building and maintaining it.

Describe the bug: I generate 2 versions of my app that are exactly the same, with the only difference being the version number. Following #69, I set os.environ["PYTHONSEED"] = "0" and os.environ["SOURCE_DATE_EPOCH"] = "1577883661" in the file where I call pyinstaller.run() and in the .spec file as well (although it's probably not needed in the spec file). I use bsdiff4 to generate patches between the 2 versions:

import gzip
import bsdiff4

# file_1 and file_2 are the paths of the two gzipped bundles
with gzip.open(file_1, mode='rb') as src_file, gzip.open(file_2, mode='rb') as dst_file:
    patch = bsdiff4.diff(src_bytes=src_file.read(), dst_bytes=dst_file.read())

Looking at my RAM, it doesn't seem to fill up at any point. The patch generation has now been running for about 8-9 hours.
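A note on the reproducible-build settings mentioned above: the Python variable that fixes the hash seed is named PYTHONHASHSEED, and the interpreter reads it only at startup, so assigning it inside an already running process affects child processes but not the current one. The following is a minimal sketch, assuming the build is launched as a child process; the helper name `reproducible_env` and the spec file name are hypothetical, and the SOURCE_DATE_EPOCH value is simply the one quoted in this post.

```python
import os


def reproducible_env(base_env=None, hash_seed="0", source_date_epoch="1577883661"):
    """Return a copy of the environment with the reproducible-build variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # PYTHONHASHSEED is read at interpreter startup, so it must be in the
    # environment of the process that runs the build.
    env["PYTHONHASHSEED"] = hash_seed
    # Value copied from the post; any fixed timestamp would serve the same purpose.
    env["SOURCE_DATE_EPOCH"] = source_date_epoch
    return env


env = reproducible_env()

# Hypothetical usage (the spec file name is a placeholder):
# import subprocess, sys
# subprocess.run([sys.executable, "-m", "PyInstaller", "myapp.spec"], env=env, check=True)
```

Passing an explicit environment to the child process keeps the setting independent of whatever the parent interpreter was started with.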

Using the detools package, I can test the following: [image]

If I could generate a usable patch with the detools library, I could publish with skip_patch = True, create the patch manually afterwards, and infuse it later. However, the patches generated for these bundles are around 350 MB to 450 MB, which is suspicious and not practical. Here is some code to create patches using detools:

pip install detools

and

import gzip

from detools.create import create_patch

output_file = "../../../mypatch.patch"
# file_1 and file_2 are the paths of the two gzipped bundles
with gzip.open(file_1, mode='rb') as src_file:
    with gzip.open(file_2, mode='rb') as dst_file:
        with open(output_file, "wb") as fpatch:
            create_patch(src_file, dst_file, fpatch, algorithm="match-blocks", patch_type="hdiffpatch", compression="none")

To Reproduce: I can provide the two copies of my open-source app that I used, which are identical apart from the version number. Feel free to use the code above to test patching with bsdiff4 and detools.

Expected behavior: The patch should be very small, hopefully less than 45 MB. Instead, with bsdiff4 the .diff() call never completes. With detools the patch generation finishes within 2-10 minutes, but the patches are around 350 MB to 450 MB (the application bundle itself is 450 MB).
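One way to check whether the two decompressed bundles really are nearly identical is to compare them block by block before blaming the patch tool. The sketch below is an illustration only and is not part of tufup, bsdiff4, or detools; the function names are made up for this example. It compares aligned fixed-size blocks, so an insertion that shifts all later content will make the files look more different than they are.

```python
import hashlib
import io


def block_digests(stream, block_size=4096):
    """Yield a short digest for each consecutive block read from a binary stream."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield hashlib.blake2b(block, digest_size=16).digest()


def shared_block_ratio(src_stream, dst_stream, block_size=4096):
    """Fraction of dst blocks whose content also occurs as an aligned block in src."""
    src_blocks = set(block_digests(src_stream, block_size))
    total = 0
    shared = 0
    for digest in block_digests(dst_stream, block_size):
        total += 1
        if digest in src_blocks:
            shared += 1
    return shared / total if total else 1.0


# Small in-memory demonstration: three of the four blocks are unchanged.
old = io.BytesIO(b"a" * 4096 + b"b" * 4096 + b"c" * 4096 + b"d" * 4096)
new = io.BytesIO(b"a" * 4096 + b"b" * 4096 + b"c" * 4096 + b"x" * 4096)
print(shared_block_ratio(old, new))
```

The same function accepts the objects returned by gzip.open(file_1, mode='rb') and gzip.open(file_2, mode='rb'), so the bundles do not have to be loaded into memory at once. A low ratio would suggest the two builds differ far more than the version number alone, which would be consistent with patches close to the size of the full bundle.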

System info (please complete the following information):

  • OS: Windows
  • OS Version : 11
  • Python version : 3.10.7
  • Pyinstaller version : 6.7.0
  • bsdiff4 version : 1.2.4
  • tufup version : 0.9.0
  • detools version : 0.53.0

I understand that this may be a problem with the bsdiff implementation in bsdiff4. However, bsdiff also has a size limit on the files it can process (2 GB), while the hdiffpatch and match-blocks algorithms don't have that limit. I would appreciate any feedback on how I should go about debugging this.
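As a possible starting point for debugging, the diff can be timed on growing prefixes of the two inputs to see how the run time scales before committing to the full 450 MB. This is a minimal sketch under that assumption; `time_diff_on_prefixes` is a hypothetical helper, and it takes the diff function as a parameter so that any implementation can be plugged in.

```python
import time


def time_diff_on_prefixes(diff_fn, src, dst, sizes):
    """Run diff_fn on the first `size` bytes of src and dst for each size.

    Returns a list of (size, seconds, patch_length) tuples.
    """
    results = []
    for size in sizes:
        start = time.perf_counter()
        patch = diff_fn(src[:size], dst[:size])
        elapsed = time.perf_counter() - start
        results.append((size, elapsed, len(patch)))
    return results


# Hypothetical usage with the bytes read in the bsdiff4 snippet above:
# import bsdiff4
# mib = 2 ** 20
# for size, seconds, patch_len in time_diff_on_prefixes(
#         bsdiff4.diff, src_bytes, dst_bytes, [1 * mib, 4 * mib, 16 * mib, 64 * mib]):
#     print(size, round(seconds, 2), patch_len)
```

If the time grows much faster than the input size, extrapolating from the small prefixes gives a rough idea of whether the full run could ever finish in a reasonable time.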

mchaniotakis, Jul 05 '24 08:07