ddelta
Files larger than 2 GB can not be processed by ddelta_generate
I am working with Cygwin under Windows 10 Pro x64 20H2.
Generating delta patches with ddelta_generate for files smaller than ~2GB is working fine. But in case you are trying to generate patches for files larger than 2GB it puts out this error: "An error 4 occured: No error"
For some reason some call is interrupted by the system. After reviewing the code I suspect that the used integer types are the reason, specifically the int32_t integers.
So I generated test files with exactly 2,147,483,647 bytes (the upper limit of a 32-bit signed integer) and everything worked fine.
Then I generated test files with exactly 2,147,483,648 bytes (just one byte more each) and again it reports "error 4". So are the integer types really the issue? Can this be fixed just by using larger integer types?
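To illustrate the boundary from the two test files above, here is a minimal sketch in C. The helper name `fits_in_int32` is hypothetical and is not part of ddelta; it only shows why 2,147,483,647 bytes passes and 2,147,483,648 bytes does not, assuming sizes and offsets are held in `int32_t` as suspected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from ddelta: reports whether a file of
 * `size` bytes can still be described with int32_t sizes and offsets. */
static bool fits_in_int32(int64_t size)
{
    /* INT32_MAX is 2147483647; one byte more no longer fits. */
    return size >= 0 && size <= INT32_MAX;
}
```

A check like this before generating the patch would at least allow a clear "file too large" message instead of "error 4".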
By the way bsdiff does not have this issue.
Requires porting to divsufsort64.h. Should be doable. I think it doubled memory usage, and there was no need for it in terms of Debian packages. Anyway, the project is stale I guess, we've not made any progress on getting consensus on building deltas for the Debian packages, which is what this is intended for :(
Could maintain it despite having no use for it, but oh well, not super interesting. Certainly need to switch building to Meson/CMake and clean up the file format to get it into shape for production usage IMO.
I wonder if I can link to both at the same time and then call divsufsort64() for large files and divsufsort() for smaller ones. You're going to need at least 18 GB of RAM to diff a 2GB file in 64-bit (realistically ~ 20 GB), whereas 32-bit divsufsort tops out at 2GB files requiring only 10GB (12GB) of RAM.
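A rough sketch of the arithmetic behind those numbers, in C. The breakdown is my assumption and is not stated above: one suffix-array index per byte of the old file (4 bytes for 32-bit, 8 bytes for 64-bit), plus the old file itself, plus the new file for the "realistic" figure. Both function names are hypothetical.

```c
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/* Hypothetical: bytes per suffix-array index needed for a file of `size`
 * bytes, i.e. 4 while the size fits in int32_t, otherwise 8. */
static unsigned index_bytes_for(uint64_t size)
{
    return size <= (uint64_t)INT32_MAX ? 4u : 8u;
}

/* Hypothetical estimate (assumed breakdown): old file in memory, one index
 * of `index_bytes` per old-file byte, and the new file in memory. */
static uint64_t estimate_diff_memory(uint64_t old_size, uint64_t new_size,
                                     unsigned index_bytes)
{
    return old_size + old_size * index_bytes + new_size;
}
```

Under these assumptions a 2 GB old file gives 2 + 16 = 18 GB with 64-bit indexes (20 GB with a 2 GB new file) and 2 + 8 = 10 GB with 32-bit indexes (12 GB), which matches the figures quoted. Choosing the index width per file, as `index_bytes_for` does, is the idea of calling `divsufsort()` for small inputs and `divsufsort64()` only when needed.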
Yes, actually you would need more RAM. But for me that's OK, I bought 64GB yesterday already.
Consider porting to libsais instead. There's also a 32 bit version. In my fork of ddelta, there is a configurable option to use libsais instead of libdivsufsort. Results are identical.
Btw, libsais has the advantage of using a different API for the 64 bit version, so this would solve the issue you've mentioned above.
Libsais is roughly twice as fast as divsufsort (on a typical 8-core server), according to my extensive experimentation. Yet libsais uses slightly more extra memory than divsufsort.
The 64-bit story isn't that messy in divsufsort -- there's still a separate header with separate types and separate function names. The way they did it is a bit unreadable without cmake's preprocessing, but it does work (well, people aren't complaining when they include both).
It's still not fun to duplicate the whole thing (well, more like ddelta_generate() and search()) for 64-bit stuff. An evil genius might jump out and use some preprocessor magic to spawn two versions of these functions, but (intense yawn) I am neither going to split files nor going to put a backslash at the end of every line.
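For reference, the "backslash at the end of every line" variant of that preprocessor trick could look roughly like the sketch below. It is not ddelta's actual search(); the macro `DEFINE_SA_SEARCH` and the generated names `sa_search32` and `sa_search64` are made up for illustration. The macro stamps out the same suffix-array binary search once per index type.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: defines NAME, which binary-searches the suffix array
 * `sa` of `text` (length n) for a suffix starting with `pat` (length m).
 * Returns that suffix's start offset, or -1 if there is none.
 * IDX must be a signed integer type. */
#define DEFINE_SA_SEARCH(NAME, IDX)                                   \
    static IDX NAME(const uint8_t *text, IDX n, const IDX *sa,        \
                    const uint8_t *pat, IDX m)                        \
    {                                                                 \
        IDX lo = 0, hi = n;                                           \
        while (lo < hi) {                                             \
            IDX mid = lo + (hi - lo) / 2;                             \
            IDX avail = n - sa[mid];                                  \
            IDX len = avail < m ? avail : m;                          \
            int c = memcmp(text + sa[mid], pat, (size_t)len);         \
            if (c == 0 && len == m)                                   \
                return sa[mid];                                       \
            if (c < 0 || (c == 0 && len < m))                         \
                lo = mid + 1;                                         \
            else                                                      \
                hi = mid;                                             \
        }                                                             \
        return -1;                                                    \
    }

/* One definition, two index widths. */
DEFINE_SA_SEARCH(sa_search32, int32_t)
DEFINE_SA_SEARCH(sa_search64, int64_t)
```

The other common way to get the same effect without backslashes is to put the function body in its own file and include it twice with a different index typedef each time, which is the "split files" option mentioned above.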
Why don't we just settle for windowed...