Why did my old Keka 1.0.4 create p7zip solid archives of mostly binary data that were 50% smaller?

Open andyrobik opened this issue 3 years ago • 9 comments

I updated my OSX from Mavericks to Catalina a few days ago and thought it would be nice to update my toolset as well. Using Keka 1.2.12 with 7z (solid, exclude resource forks, best compression) results in archives twice the size of those made by my old Keka 1.0.4 on Mavericks.

I like the task queueing of the recent Keka, so I replaced (keka7z/7z.so) with my old Keka binaries from 1.0.4 in order to get the results I am used to while keeping the queue functionality.

Please give me some insight into what could cause this behaviour...

Thank you!

...You should test with copies of almost identical binaries; there must be something wrong with the p7zip solid mode.

arguments: -t7z -mx9 -ms=on
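
For anyone who wants to reproduce the comparison outside Keka, here is a minimal sketch that builds the same command line for a standalone binary. The helper names `build_7z_command` and `compress_and_measure` are hypothetical, and the executable name `7z` is an assumption (it may be `7za` or `7zz` depending on the installation); only the three switches come from the post above.

```python
import shutil
import subprocess
from pathlib import Path


def build_7z_command(archive, inputs, exe="7z"):
    """Return the argv list for a solid, maximum-compression 7z archive.

    "a" is the add command; -t7z, -mx9 and -ms=on are the arguments
    quoted in the issue. The executable name is an assumption.
    """
    return [exe, "a", "-t7z", "-mx9", "-ms=on", str(archive), *map(str, inputs)]


def compress_and_measure(archive, inputs, exe="7z"):
    """Run the command and return the archive size in bytes.

    Returns None when the executable is not on PATH, so the sketch
    does not fail on machines without a 7z binary.
    """
    if shutil.which(exe) is None:
        return None
    subprocess.run(
        build_7z_command(archive, inputs, exe),
        check=True,
        stdout=subprocess.DEVNULL,
    )
    return Path(archive).stat().st_size
```

Running `compress_and_measure` once with the old binary and once with the new one on the same input folder gives two numbers that can be compared directly.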

andyrobik avatar Mar 11 '21 22:03 andyrobik

@andyrobik can you share a file that produces this difference in size when compressed? I can't reproduce your issue.

I've tested compressing Keka.app (which contains multiple binaries), and the size difference was almost negligible (0.004%).

aonez avatar Mar 12 '21 09:03 aonez

Just occurred to me that maybe you were using ZIP instead of 7Z format in the newer version?

aonez avatar Mar 12 '21 09:03 aonez

I'm pretty sure the problem isn't Keka but p7zip itself. I will share some examples and binaries to test with if you're still interested, as this is really intriguing. Can I PM you here in order to send the test files?

andyrobik avatar Mar 16 '21 02:03 andyrobik

There's no PM here but you can get in touch via mail at [email protected] :)

aonez avatar Mar 16 '21 07:03 aonez

@andyrobik thanks for the files. I'll be testing them to see what we can do about this one.

aonez avatar Mar 17 '21 16:03 aonez

I thank you

andyrobik avatar Mar 17 '21 21:03 andyrobik

We've already been in contact via mail, but for the record: this issue is caused by the sorting method used by p7zip. As 7-Zip's FAQ puts it:

You can get big difference in compression ratio for different sorting methods, if dictionary size is smaller than total size of files. If there are similar files in different folders, the sorting "by type" can provide better compression ratio in some cases.
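
The effect described in that FAQ entry can be illustrated with a small experiment. This is only a sketch under stated assumptions: it uses Python's `lzma` module rather than p7zip, a deliberately tiny 64 KiB dictionary, made-up file names, and a simplified "by type" ordering (extension first, then path), so it models the idea and not p7zip's actual sorting code.

```python
import lzma
import random

DICT_SIZE = 64 * 1024  # deliberately tiny dictionary (assumption for the demo)


def random_bytes(rng, n):
    """Return n reproducible pseudo-random (incompressible) bytes."""
    return rng.getrandbits(8 * n).to_bytes(n, "little")


def solid_size(files, order):
    """Compress the files as one solid stream in the given order."""
    stream = b"".join(files[name] for name in order)
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": DICT_SIZE}]
    return len(lzma.compress(stream, format=lzma.FORMAT_XZ, filters=filters))


rng = random.Random(0)
library = random_bytes(rng, 48 * 1024)

# Hypothetical layout: the same binary shipped in two different folders.
files = {
    "app1/lib.bin": library,
    "app1/data.dat": random_bytes(rng, 200 * 1024),
    "app2/lib.bin": library,  # identical copy in another folder
    "app2/data.dat": random_bytes(rng, 200 * 1024),
}

# Path order puts 200 KiB of unrelated data between the two copies,
# which is further back than the dictionary can reach.
by_path = sorted(files)

# Simplified "by type" order: extension first, so the copies are adjacent.
by_type = sorted(files, key=lambda p: (p.rsplit(".", 1)[-1], p))

print("by path:", solid_size(files, by_path))
print("by type:", solid_size(files, by_type))
```

With the copies adjacent, the second one falls inside the dictionary window and is encoded as back-references, so the "by type" stream should come out smaller by roughly the size of the duplicated file.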

Also, in this case the BCJ2 filter resulted in a worse ratio, and so did using LZMA2 instead of LZMA, although only slightly and with a speed penalty for LZMA.

Will need to think how to implement this, but meanwhile here are two builds:

  • Keka-QS+LZMA: Using sorting by type and no filter when solid is selected
  • Keka-QS+LZMA: Using sorting by type, no filter and LZMA when solid is selected
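
For readers using a standalone 7z binary, the two configurations can be approximated with the method switches documented for the 7z format. This is a sketch under assumptions: the thread does not say how the builds are implemented, so the mapping to `-mqs=on`, `-mf=off` and `-m0=lzma` is a guess, the variant labels and the `switches_for` helper are hypothetical, and whether a given p7zip version accepts all of these switches should be checked against its own documentation.

```python
# Arguments quoted earlier in the issue.
BASE = ["-t7z", "-mx9", "-ms=on"]

# Assumed mapping from the descriptions above to 7z method switches:
#   -mqs=on   sort files by type in solid archives
#   -mf=off   disable compression filters (such as BCJ2)
#   -m0=lzma  use LZMA instead of the default LZMA2
VARIANTS = {
    "baseline": [],
    "sort by type, no filter": ["-mqs=on", "-mf=off"],
    "sort by type, no filter, LZMA": ["-mqs=on", "-mf=off", "-m0=lzma"],
}


def switches_for(variant):
    """Return the full switch list for one of the variants above."""
    return BASE + VARIANTS[variant]


for name in VARIANTS:
    print(f"{name}: 7z a {' '.join(switches_for(name))} archive.7z <files>")
```

Compressing the same input with each variant and comparing archive size and wall-clock time is the simplest way to see which trade-off suits a given data set.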

aonez avatar Mar 19 '21 17:03 aonez

"Keka-QS+LZMA" is my favorite: it takes more than double the time (150%) to compress, but it uses less CPU time and gives better compression results ("Keka-QS" ran at 75% with howling fans, and the archive was 20% bigger).

It would be nice if there were a way to get the best compression options automagically. You would just say, e.g., "I want the archive to be as small as possible" or "I don't care if the result is like a zip ;) as long as it is fast", with a not-too-comprehensive set of sliders to tune it. A look into the p7zip docs gives me a headache. With p7zip's default behavior, I personally don't see a big advantage over zip/rar right now for binary data compression, so I will stick with my personal "Keka-QS+LZMA" version. Thank you!

BTW, you made a typo in the above post, naming both versions the same.

andyrobik avatar Mar 21 '21 15:03 andyrobik

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 17 '22 05:04 stale[bot]