ddelta

Files larger than 2 GB cannot be processed by ddelta_generate

Open • venoll opened this issue 3 years ago • 7 comments

I am working with Cygwin under Windows 10 Pro x64 20H2.

Generating delta patches with ddelta_generate works fine for files smaller than ~2 GB. But if you try to generate patches for files larger than 2 GB, it prints this error: "An error 4 occured: No error"

For some reason, some call seems to be interrupted by the system. After reviewing the code, I suspect the integer types used are the cause, specifically the int32_t integers.

So I generated test files of exactly 2,147,483,647 bytes (the upper limit of a 32-bit signed integer), and everything worked fine.

Then I generated test files of exactly 2,147,483,648 bytes (just one byte more each), and again it reported "error 4". So are the integer types really the issue? Can this be fixed just by using larger integer types?
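The boundary test lines up exactly with INT32_MAX. A minimal C sketch of the corresponding check, assuming (as the pointer to divsufsort64.h in the first reply suggests) that the limit comes from 32-bit signed suffix-array indices; the helper name fits_int32_index is hypothetical and not part of ddelta:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of ddelta): true if every offset into a
 * buffer of `size` bytes, and the size itself, can be stored in an int32_t.
 * INT32_MAX is 2,147,483,647, the largest size that worked in the test. */
static bool fits_int32_index(uint64_t size)
{
    return size <= (uint64_t)INT32_MAX;
}
```

With this check, 2,147,483,647 passes and 2,147,483,648 fails. Converting the larger value to int32_t is implementation-defined in C (it typically wraps to a negative number on two's-complement targets), so a guard like this would let a tool reject the input with a clear message instead of failing later with an opaque error code.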

By the way, bsdiff does not have this issue.

venoll, Apr 07 '21 20:04

This requires porting to divsufsort64.h, which should be doable. I think it doubled memory usage, and there was no need for it for Debian packages. Anyway, I guess the project is stale: we've not made any progress on getting consensus on building deltas for Debian packages, which is what this is intended for :(

I could maintain it despite having no use for it, but oh well, it's not super interesting. IMO, it certainly needs to switch its build to Meson/CMake and clean up the file format to get into shape for production use.

julian-klode, Apr 08 '21 11:04

I wonder if I can link against both at the same time and then call divsufsort64() for large files and divsufsort() for smaller ones. You're going to need at least 18 GB of RAM to diff a 2 GB file in 64-bit mode (realistically ~20 GB), whereas 32-bit divsufsort tops out at 2 GB files, requiring only 10 GB (realistically ~12 GB) of RAM.
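The RAM figures above are consistent with a simple model: one suffix-array entry per byte of the old file (4 bytes with divsufsort, 8 with divsufsort64), plus the old file itself (the "at least" figure), plus the new file (the "realistic" figure). A hedged C sketch of that arithmetic and of the size-based choice between the two paths; the function names are hypothetical, and picking the width from the larger of the two files is an assumption, since offsets into both files have to fit:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: bytes per suffix-array entry for the given inputs. The 32-bit
 * path (divsufsort) is only usable if offsets into both files fit in int32_t;
 * otherwise the 64-bit path (divsufsort64) is needed. */
static unsigned sa_entry_bytes(uint64_t oldsize, uint64_t newsize)
{
    uint64_t largest = oldsize > newsize ? oldsize : newsize;
    return largest <= (uint64_t)INT32_MAX ? 4u : 8u;
}

/* Rough peak memory in bytes under the simplified model: the suffix array
 * over the old file plus the old file itself, and optionally the new file
 * held in memory as well. */
static uint64_t estimate_ram(uint64_t oldsize, uint64_t newsize,
                             bool include_new)
{
    uint64_t total = oldsize * sa_entry_bytes(oldsize, newsize) + oldsize;
    if (include_new)
        total += newsize;
    return total;
}
```

For a 2 GiB (2^31-byte) old and new file this model gives 18 GiB without and 20 GiB with the new file; just under the 32-bit limit it gives about 10 GiB and 12 GiB, matching the numbers in the comment.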

julian-klode, Apr 08 '21 11:04

Yes, you would actually need more RAM. But that's OK for me; I already bought 64 GB yesterday.

venoll, Apr 08 '21 12:04

Consider porting to libsais instead. There's also a 32-bit version. In my fork of ddelta, there is a configurable option to use libsais instead of libdivsufsort. The results are identical.

PascalGuenther, Jan 30 '22 21:01

By the way, libsais has the advantage of using a different API for the 64-bit version, so this would solve the issue you mentioned above.

PascalGuenther, Jan 30 '22 21:01


According to my extensive experiments, libsais is roughly twice as fast as divsufsort (on a typical 8-core server). However, libsais uses slightly more extra memory than divsufsort.

wupengcheng6819, Jul 19 '22 07:07

The 64-bit story isn't that messy in divsufsort: there's still a separate header with separate types and separate function names. The way they did it is a bit unreadable without CMake's preprocessing, but it does work (at least, people aren't complaining when they include both).


It's still not fun to duplicate the whole thing (well, more like ddelta_generate() and search()) for 64-bit. An evil genius might jump in and use some preprocessor magic to spawn two versions of these functions, but (intense yawn) I am neither going to split files nor put a backslash at the end of every line.
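To make the trade-off concrete, here is a hedged sketch of the "preprocessor magic" route: one macro body expanded once per index type, giving a 32-bit and a 64-bit copy of a longest-match binary search over a suffix array. The names DEFINE_SEARCH, search32 and search64 are hypothetical, and the search is a simplified stand-in, not ddelta's actual search(). Note the backslash on every line of the macro, which is exactly the cost the comment objects to.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the common prefix of a[0..alen) and b[0..blen). */
static size_t matchlen(const uint8_t *a, size_t alen,
                       const uint8_t *b, size_t blen)
{
    size_t i = 0;
    while (i < alen && i < blen && a[i] == b[i])
        i++;
    return i;
}

/* Hypothetical macro: defines NAME as a binary search over the sorted suffix
 * array SA[st..en] of `old` (call with st = 0, en = oldsize - 1, oldsize >= 1).
 * Returns the length of the longest prefix of `new_` occurring in `old` and
 * stores its offset in *pos. IDX is the index type of the generated copy. */
#define DEFINE_SEARCH(NAME, IDX)                                             \
    static IDX NAME(const IDX *SA, const uint8_t *old, IDX oldsize,         \
                    const uint8_t *new_, IDX newsize, IDX st, IDX en,       \
                    IDX *pos)                                               \
    {                                                                       \
        /* Narrow [st, en] until the suffixes around new_ are adjacent. */  \
        while (en - st >= 2) {                                              \
            IDX mid = st + (en - st) / 2;                                   \
            IDX rem = oldsize - SA[mid];                                    \
            int c = memcmp(old + SA[mid], new_,                             \
                           (size_t)(rem < newsize ? rem : newsize));        \
            /* A suffix that is a proper prefix of new_ sorts before it. */ \
            if (c < 0 || (c == 0 && rem < newsize))                         \
                st = mid;                                                   \
            else                                                            \
                en = mid;                                                   \
        }                                                                   \
        IDX x = (IDX)matchlen(old + SA[st], (size_t)(oldsize - SA[st]),     \
                              new_, (size_t)newsize);                       \
        IDX y = (IDX)matchlen(old + SA[en], (size_t)(oldsize - SA[en]),     \
                              new_, (size_t)newsize);                       \
        *pos = x > y ? SA[st] : SA[en];                                     \
        return x > y ? x : y;                                               \
    }

/* One body, two instantiations. */
DEFINE_SEARCH(search32, int32_t)
DEFINE_SEARCH(search64, int64_t)
```

Both copies behave identically; only the index width differs, so a caller could pick search32 or search64 based on input size. The price is that any fix to the search logic has to be made inside a macro, where compiler errors and debugger line numbers are less helpful.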

Why don't we just settle for windowed...

Artoria2e5, Feb 26 '23 14:02