Meta: Remove build time dependency on unzip and tar

Open ADKaster opened this issue 4 years ago • 4 comments

There are two ways I see to do this:

  1. "${CMAKE_COMMAND}" -E tar https://cmake.org/cmake/help/latest/manual/cmake.1.html#run-a-command-line-tool
  2. file(ARCHIVE_EXTRACT) https://cmake.org/cmake/help/latest/command/file.html?highlight=file#archive-extract
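
As a rough sketch, the two options might look like this in a configure-time script. The archive and destination paths are hypothetical placeholders, not paths from the actual build, and the version check is only one assumed way of choosing between them:

```cmake
cmake_minimum_required(VERSION 3.16)

# Hypothetical paths, for illustration only.
set(ARCHIVE_PATH "${CMAKE_CURRENT_BINARY_DIR}/data.zip")
set(EXTRACT_DIR "${CMAKE_CURRENT_BINARY_DIR}/data")
file(MAKE_DIRECTORY "${EXTRACT_DIR}")

if (CMAKE_VERSION VERSION_GREATER_EQUAL 3.18)
    # Option 2: file(ARCHIVE_EXTRACT) is available from CMake 3.18 onwards.
    file(ARCHIVE_EXTRACT INPUT "${ARCHIVE_PATH}" DESTINATION "${EXTRACT_DIR}")
else()
    # Option 1: "cmake -E tar" extracts into the working directory.
    execute_process(
        COMMAND "${CMAKE_COMMAND}" -E tar xf "${ARCHIVE_PATH}"
        WORKING_DIRECTORY "${EXTRACT_DIR}"
        RESULT_VARIABLE extract_result
    )
    if (NOT extract_result EQUAL 0)
        message(FATAL_ERROR "Failed to extract ${ARCHIVE_PATH}")
    endif()
endif()
```

Either branch goes through CMake itself, so neither needs an external unzip or tar binary on the host.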

This affects the way we download the following data sets at configure time:

  • UCD (Unicode Character Database)
  • CLDR (Unicode Common Locale Data Repository)
  • PCI ID Database
  • USB ID Database
  • Wasm spec tests

Once these datasets are extracted using one of the above methods, we can remove at least unzip from the build requirements. tar is still required for the Toolchain builds, but there's a chance we could do those using CMake too :eyes:

ADKaster avatar Sep 06 '21 23:09 ADKaster

CMake ships with libarchive and liblzma, so the tar utility is quite powerful both for compressing and for extracting archives.

friendlyanon avatar Sep 07 '21 00:09 friendlyanon

FWIW, I was trying out ARCHIVE_EXTRACT and it doesn't seem to like gzip files: CMake Error: Problem with archive_read_open_file(): Unrecognized archive format. This was tested on OpenBSD and Arch Linux with the following script:

cmake_minimum_required(VERSION 3.18)
project(Whatever)
set(PCI_IDS_GZ_URL https://pci-ids.ucw.cz/v2.2/pci.ids.gz)
set(PCI_IDS_GZ_PATH ${CMAKE_BINARY_DIR}/pci.ids.gz)
set(PCI_IDS_PATH ${CMAKE_BINARY_DIR}/pci.ids)
set(PCI_IDS_INSTALL_PATH ${CMAKE_INSTALL_DATAROOTDIR}/pci.ids)

file(DOWNLOAD ${PCI_IDS_GZ_URL} ${PCI_IDS_GZ_PATH} INACTIVITY_TIMEOUT 10)
file(ARCHIVE_EXTRACT INPUT ${PCI_IDS_GZ_PATH})

tuftedocelot avatar Dec 26 '21 13:12 tuftedocelot

This situation is really strange, and is probably an oversight on CMake's part.

file(ARCHIVE_EXTRACT) expects the file to be in an archive format, i.e. one that can store multiple files (e.g. tar, zip or cpio). This means that while you can extract .tar.gz files, you can't extract a plain .gz file.

On the other hand, you can create plain .gz files with file(ARCHIVE_CREATE FORMAT raw COMPRESSION GZip). This totally works:

cmake_minimum_required(VERSION 3.18)
project(Compress)

file(ARCHIVE_CREATE PATHS test.txt FORMAT raw COMPRESSION GZip OUTPUT test.txt.gz)

We should probably file an issue on their GitLab.
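
For contrast, here is a minimal sketch of the case that is expected to work: wrapping the file in a tar container before compressing it, then extracting the resulting .tar.gz. The scratch directory and file names are hypothetical and exist only for this illustration:

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical scratch paths, for illustration only.
set(WORK_DIR "${CMAKE_CURRENT_BINARY_DIR}/gz-demo")
file(MAKE_DIRECTORY "${WORK_DIR}/out")
file(WRITE "${WORK_DIR}/test.txt" "hello\n")

# Wrap the file in a tar container and gzip it, producing a .tar.gz.
file(ARCHIVE_CREATE
    OUTPUT "${WORK_DIR}/test.tar.gz"
    PATHS "${WORK_DIR}/test.txt"
    FORMAT gnutar
    COMPRESSION GZip
)

# The tar container is a recognized archive format, so this should extract.
file(ARCHIVE_EXTRACT
    INPUT "${WORK_DIR}/test.tar.gz"
    DESTINATION "${WORK_DIR}/out"
)
```

Swapping FORMAT gnutar for FORMAT raw gives the plain .gz case described above, which file(ARCHIVE_EXTRACT) then rejects.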

BertalanD avatar Dec 26 '21 21:12 BertalanD

I am unsure if closing this issue is correct. We still depend on these external tools when building with CMake 3.16 (we conditionally test for this).

diegoiast avatar Sep 10 '22 15:09 diegoiast

Maybe it would be nice to use our own tools? After all, we have our own tar implementation, so we could build that (without Unicode support), then use it to unpack the Unicode tarballs, and build the "final" tar executable.

(Sorry for necro-posting)

BenWiederhake avatar Oct 15 '23 08:10 BenWiederhake

Can we build tar without Unicode support?

diegoiast avatar Oct 22 '23 07:10 diegoiast

We could build our own archiving tools without Unicode support to extract the required files. However, I'm not sure the technical effort is worth the return. More host tools add complexity, slow down rebuilds, and hurt incremental builds. If we use our own tar to extract files, any change to e.g. LibCompress will also trigger a re-extract of Unicode and locale data, causing far more targets to be rebuilt than strictly necessary.

Right now, we've already bumped Ladybird's minimum required CMake version past 3.16. Building the OS itself requires 3.25 so we can use our upstreamed CMake Platform files. The short putt here is to drop CMake 3.16 and go up to 3.23 or 3.25 globally, especially since Meta/serenity.sh will build CMake from source if the one in your PATH is too old.

I've also got the experimental GN build going, which defers extraction to a few Python files that use Python's built-in extractors.

ADKaster avatar Oct 22 '23 11:10 ADKaster