
Stop using .tar.bz, maybe??

Open Artoria2e5 opened this issue 1 year ago • 11 comments

Describe the bug The current releases use CEF_ARCHIVE_FORMAT set to tarbz. This is extremely slow to decompress. Bzip2 unpacks slower than xz and does not even compress better.

To Reproduce Steps to reproduce the behavior:

  1. Go to https://cef-builds.spotifycdn.com/index.html
  2. Click on a PDB download
  3. Wait
  4. Decompress
  5. WAIT, DRINK COFFEE

Expected behavior We could really use xz to get at least double the decompression speed. Or even zstd, at the cost of worse compression. These two are extremely widespread.

Screenshots

Versions (please complete the following information):

  • OS: Windows 11, but really does not matter

Additional context Python tarfile has xz support since 3.3. You don't even need to get an external program!
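As a rough illustration of the point about the standard library, here is a minimal sketch that packs and lists a `.tar.xz` with nothing but `tarfile`. The directory name `demo_distrib` and the helper names are hypothetical and are not taken from CEF's scripts.

```python
import os
import tarfile
import tempfile


def make_tar_xz(src_dir, archive_path):
    """Pack src_dir into a .tar.xz archive using only the standard library."""
    with tarfile.open(archive_path, "w:xz") as tar:
        # arcname stores paths relative to the directory's own name.
        tar.add(src_dir, arcname=os.path.basename(src_dir))


def list_tar_xz(archive_path):
    """Return the member names stored in a .tar.xz archive."""
    with tarfile.open(archive_path, "r:xz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo_distrib")  # hypothetical directory name
        os.makedirs(src)
        with open(os.path.join(src, "README.txt"), "w") as f:
            f.write("placeholder payload\n")
        archive = os.path.join(tmp, "demo_distrib.tar.xz")
        make_tar_xz(src, archive)
        print(list_tar_xz(archive))
```

The only difference from a bzip2 archive is the mode string (`"w:xz"` instead of `"w:bz2"`), so the change on the packing side should be small, assuming the packing script already goes through `tarfile`.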

Artoria2e5 avatar May 11 '23 16:05 Artoria2e5

What is the size difference between xz and bz2 when creating archives using Python?

magreenblatt avatar May 11 '23 17:05 magreenblatt

The compression method of the lzma library is identical to xz defaults (preset 6), according to the documentation. Knowing that, I decompressed cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.bz2 into the tar, then recompressed it with xz.
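For anyone who wants to repeat the recompression without the `xz` command-line tool, a sketch along these lines should be equivalent in spirit. The function name and chunk size are assumptions; the file paths are whatever you pass in.

```python
import bz2
import lzma
import shutil


def recompress_bz2_to_xz(bz2_path, xz_path, chunk_size=1024 * 1024):
    """Stream-decompress a .bz2 file and recompress it as .xz."""
    # lzma.open() in write mode produces the .xz container and, with no
    # preset argument, falls back to the library's default preset.
    with bz2.open(bz2_path, "rb") as src, lzma.open(xz_path, "wb") as dst:
        # Copy in chunks so the uncompressed tar never has to fit in memory.
        shutil.copyfileobj(src, dst, chunk_size)
```

Calling `recompress_bz2_to_xz("archive.tar.bz2", "archive.tar.xz")` leaves the original file untouched, so the two sizes can be compared side by side.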

$ ls -l cef*
-rw-r--r-- 1 arthu arthu 825856000 May 12 14:17 cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar
-rw-r--r-- 1 arthu arthu 275852077 May 12 14:17 cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.bz2
-rw-r--r-- 1 arthu arthu 201699604 May 12 14:17 cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.xz

Huh, much smaller. Decompression timing:

$ time bzip2 -d -c cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.bz2 >/dev/null

real    0m30.315s
user    0m15.593s
sys     0m0.187s
$ time xz -d -c cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.xz > /dev/null

real    0m12.301s
user    0m5.265s
sys     0m0.203s

And much faster.
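To put numbers on "much smaller" and "much faster", the figures above can be plugged into a few lines of arithmetic. This only restates the measurements from this one archive on one machine; the function name is made up for the example.

```python
# Figures copied from the `ls -l` and `time` output above.
TAR_BYTES = 825856000
BZ2_BYTES = 275852077
XZ_BYTES = 201699604
BZ2_REAL_S, XZ_REAL_S = 30.315, 12.301
BZ2_USER_S, XZ_USER_S = 15.593, 5.265


def summarize():
    """Derive size and speed ratios from the measurements above."""
    return {
        "bz2_of_tar": BZ2_BYTES / TAR_BYTES,        # compressed / raw size
        "xz_of_tar": XZ_BYTES / TAR_BYTES,
        "xz_saving_vs_bz2": 1 - XZ_BYTES / BZ2_BYTES,
        "real_speedup": BZ2_REAL_S / XZ_REAL_S,     # wall-clock ratio
        "user_speedup": BZ2_USER_S / XZ_USER_S,     # CPU-time ratio
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.3f}")
```

For this archive the xz file comes out roughly 27% smaller than the bz2 one, and decompression is about 2.5x faster in wall-clock time and about 3x faster in user CPU time.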

Artoria2e5 avatar May 12 '23 06:05 Artoria2e5

While we are at it, the state of make_distrib.py really isn't good. The else: create_7z_archive(dir, archive_format) branch is dead code. Well, technically the entire 7z function is...

Artoria2e5 avatar May 12 '23 06:05 Artoria2e5

Then 7zip is simply better. It was used initially but then rejected for some reason. .tar.bz or .tar.xz will always be slower, because it is not a true "solid" archive: depending on the tool, you have to decompress the .bz first and then extract the .tar. As a consumer I expect acceptable interoperability, and tar variants do not offer that on Windows (I have no issues personally, but in general it is not ideal). Plain .zip is still the winner in this sense, with 7zip right after it. Also, 7zip is used by the Chromium build, so it should already be on board, at least for Windows (it is used for the installer).

dmitry-azaraev avatar May 12 '23 06:05 dmitry-azaraev

We also need to consider what comes default-installed on most OSes, and what is supported by common tools like CMake and TeamCity. Also related to issue #2446 (symlink support).

magreenblatt avatar May 12 '23 07:05 magreenblatt

xz comes installed by default on most OSes. Combined with tar, the archive is always fully solid and has a good encoding story (almost always UTF-8). The 2-level decompression is a result of how archive programs are designed on Windows: they are built around showing file contents instead of doing a full streaming extraction. But since tar has no central directory, it takes a full decompression to show the contents anyway. The point is, this is not tar's fault. See also https://github.com/M2Team/NanaZip/issues/138
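The streaming extraction described here can be sketched with Python's `tarfile` stream mode (`"r|xz"`), which reads the archive in a single forward pass without seeking. The helper names and the in-memory archive are assumptions made to keep the example self-contained.

```python
import io
import tarfile


def build_tar_xz(files):
    """Build an in-memory .tar.xz from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def stream_members(archive_bytes):
    """Read every regular file in one forward pass over the stream."""
    out = {}
    # "r|xz" is stream mode: decompression and tar parsing happen together,
    # so there is no separate "unpack .xz, then open .tar" step.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r|xz") as tar:
        for member in tar:
            if member.isfile():
                # Each member must be read before moving on to the next one.
                out[member.name] = tar.extractfile(member).read()
    return out


if __name__ == "__main__":
    archive = build_tar_xz({"a.txt": b"alpha\n", "dir/b.txt": b"beta\n"})
    print(sorted(stream_members(archive)))
```

Because the member headers are interleaved with the data, even a plain listing has to walk the whole decompressed stream, which is the cost mentioned above.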

7z has a stronger encoding story (mandatory UTF-16) and an option to be selectively solid, but two issues: pre-installation (partially solved by bsdtar support) and symlinks (uh-oh).

zip is a fragmented mess. No solid support, okay pre-installation. Symlink support is possible via Info-ZIP extension but does not seem to be present in Python zipfile.
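For contrast, symlinks are a first-class member type in tar. The sketch below writes a symlink entry into an in-memory `.tar.xz` and reads it back with `tarfile`; the member names are hypothetical, and nothing is created on disk, so it also runs on Windows.

```python
import io
import tarfile


def build_tar_with_symlink():
    """Create an in-memory .tar.xz holding one file and one symlink to it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        data = b"payload\n"
        target = tarfile.TarInfo("lib/libdemo.so.1")  # hypothetical name
        target.size = len(data)
        tar.addfile(target, io.BytesIO(data))

        link = tarfile.TarInfo("lib/libdemo.so")      # hypothetical name
        link.type = tarfile.SYMTYPE                   # mark as symbolic link
        link.linkname = "libdemo.so.1"                # relative link target
        tar.addfile(link)
    return buf.getvalue()


def symlinks_in(archive_bytes):
    """Return {link name: link target} for every symlink in the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:xz") as tar:
        return {m.name: m.linkname for m in tar.getmembers() if m.issym()}


if __name__ == "__main__":
    print(symlinks_in(build_tar_with_symlink()))
```

Whether the extracting tool then recreates the link on the target filesystem is a separate question, especially on Windows.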

Artoria2e5 avatar May 12 '23 09:05 Artoria2e5

To clarify, I'm not against .xz. It is virtually the same thing, and it also provides a good compression ratio, which I welcome.

Also, Windows 10 has tar (bsdtar) on board, but again it is virtually useless, as it has only gzip support. Because of this, 7zip is the winner, since a third-party tool is needed anyway.

dmitry-azaraev avatar May 12 '23 10:05 dmitry-azaraev

> Also, Windows 10 has tar (bsdtar) on board, but again it is virtually useless, as it has only gzip support. Because of this, 7zip is the winner, since a third-party tool is needed anyway.

First time hearing this! Interestingly, tar xf a tar.bz2 works, so they have also put in bzip2 support. Since there's no bzip2.exe in my PATH, it's probably compiled in via a library. Which is a bit of a surprise if you think about it, since they could've as easily linked to the public-domain liblzma too.

Ah you know what, let me throw something in the Feedback Hub. No idea if they read it.

Artoria2e5 avatar May 12 '23 14:05 Artoria2e5

@Artoria2e5 my tar requires bzip2.exe, and it doesn't work because bzip2 is absent. Windows 10 also includes curl. Nice, but it is compiled without zlib/gzip support, so it can't download a deflate-compressed stream. I'm using standalone curl anyway. Agreed that it is kind of strange. :)

dmitry-azaraev avatar May 12 '23 15:05 dmitry-azaraev

Huh, Microsoft is now making the built-in bsdtar the basis of a new feature, it seems. https://www.bleepingcomputer.com/news/microsoft/windows-11-getting-native-support-for-7-zip-rar-and-gz-archives/

I got "working on it" tagged in the Feedback Hub, so they are putting some work in it.

Artoria2e5 avatar May 24 '23 17:05 Artoria2e5

https://github.com/chromiumembedded/cef/issues/3503#issue-1706183376

I support this.

avgarint avatar Aug 03 '24 19:08 avgarint