
libarchive can't handle "stripped" RPM archives

Open marxin opened this issue 3 years ago • 6 comments

I'm using the latest release, 3.6.1-1.3, and I noticed that the following hello world program fails on one particular RPM:

#include <archive.h>
#include <archive_entry.h>
#include <stdio.h>   /* fprintf, printf */
#include <stdlib.h>  /* exit */

int main(int argc, char **argv)
{
  if (argc == 1)
  {
    fprintf (stderr, "Usage: ./a.out FILE\n");
    exit (0);
  }

  struct archive *a;
  struct archive_entry *entry;
  int r;

  a = archive_read_new();
  /* Enable every decompression filter and archive format libarchive knows. */
  archive_read_support_filter_all(a);
  archive_read_support_format_all(a);
  /* Open the file, reading it in 10240-byte blocks. */
  r = archive_read_open_filename(a, argv[1], 10240);
  if (r != ARCHIVE_OK)
  {
    fprintf (stderr, "ERR: %s\n", archive_error_string(a));
    exit(1);
  }

  /* Print the pathname of each entry without extracting its data. */
  while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
    printf("%s\n",archive_entry_pathname(entry));
    archive_read_data_skip(a);
  }
  /* Close the archive and release its resources. */
  r = archive_read_free(a);
  if (r != ARCHIVE_OK)
    exit(1);
}

Does not work for:

wget https://download.opensuse.org/debug/tumbleweed/repo/oss/x86_64/nodejs-electron-debuginfo-19.0.11-1.1.x86_64.rpm
gcc archive.c -larchive && ./a.out nodejs-electron-debuginfo-19.0.11-1.1.x86_64.rpm
ERR: Unrecognized archive format

Note that rpm2cpio also complains about this file (nodejs-electron-debuginfo-19.0.11-1.1.x86_64.rpm), and I can confirm that rpm2archive (from the RPM project) can extract it.

marxin avatar Aug 17 '22 19:08 marxin

@fche

marxin avatar Aug 17 '22 20:08 marxin

Why do you think this has to do with files over 4GB?

It looks like this is an RPM package containing a zstd-compressed archive. Libarchive seems to have no problems with the RPM wrapper or the zstd compression, but it does not recognize "07070X" as a valid CPIO signature. A quick Google search found a few references to "rpm-style stripped cpio files" -- this appears to be a new cpio variant invented by the RPM folks that libarchive does not currently support.

kientzle avatar Aug 21 '22 02:08 kientzle

Background: Libarchive's existing RPM support is quite simple: It recognizes and strips off the RPM wrapper and then allows the decompression filters and CPIO format handler to process the contents. This worked well for the original RPM format, where the "body" was a standard self-contained CPIO archive.
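
As a rough illustration of the "recognize the wrapper" step: the legacy RPM lead is a fixed 96-byte block that begins with the bytes ED AB EE DB. The sketch below only checks for that lead. The function name is hypothetical and this is not libarchive's actual code; a real filter would also have to skip the signature and header sections that follow the lead before handing the payload to the decompression filters.

```c
#include <stddef.h>
#include <string.h>

#define RPM_LEAD_SIZE 96  /* fixed size of the legacy RPM lead */

/* Hypothetical helper: returns 1 if buf starts with a plausible RPM lead,
 * 0 otherwise. Only the 4-byte magic and the minimum length are checked. */
int looks_like_rpm_lead(const unsigned char *buf, size_t len)
{
    static const unsigned char rpm_magic[4] = { 0xED, 0xAB, 0xEE, 0xDB };

    return buf != NULL
        && len >= RPM_LEAD_SIZE
        && memcmp(buf, rpm_magic, sizeof rpm_magic) == 0;
}
```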

In the intervening years, RPM seems to have changed its architecture so that this simple design no longer works. The "body" of this particular RPM contains file contents but none of the associated metadata (filenames, types, etc.) that is necessary to properly extract it. Instead, that data is stored in the RPM header. To properly handle this in libarchive, we would probably need a complete "RPM format" handler that combines RPM header parsing, compression identification, and breaking out the contents. This is no more complex than other formats that libarchive supports, but it's a lot more complicated than our existing RPM handling.

kientzle avatar Aug 21 '22 03:08 kientzle

In the short term, we could add some code to libarchive's standard CPIO format to identify these "stripped" RPM bodies and emit a more specific error message.
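
A minimal sketch of what such a check might look like, in C. The enum and function names are hypothetical and not part of libarchive. The "07070X" magic is the one seen in this RPM's payload; the other values are the standard ASCII cpio signatures. The binary cpio variants are deliberately ignored here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the 6-byte ASCII cpio magic. */
enum cpio_magic_kind {
    CPIO_MAGIC_UNKNOWN = 0,
    CPIO_MAGIC_NEWC,          /* "070701": SVR4 "newc" */
    CPIO_MAGIC_NEWC_CRC,      /* "070702": SVR4 with checksum */
    CPIO_MAGIC_ODC,           /* "070707": POSIX octet-oriented */
    CPIO_MAGIC_RPM_STRIPPED   /* "07070X": RPM "stripped" cpio */
};

/* Inspect the first 6 bytes of a payload and classify the magic. */
enum cpio_magic_kind classify_cpio_magic(const void *buf, size_t len)
{
    if (buf == NULL || len < 6)
        return CPIO_MAGIC_UNKNOWN;
    if (memcmp(buf, "070701", 6) == 0)
        return CPIO_MAGIC_NEWC;
    if (memcmp(buf, "070702", 6) == 0)
        return CPIO_MAGIC_NEWC_CRC;
    if (memcmp(buf, "070707", 6) == 0)
        return CPIO_MAGIC_ODC;
    if (memcmp(buf, "07070X", 6) == 0)
        return CPIO_MAGIC_RPM_STRIPPED;
    return CPIO_MAGIC_UNKNOWN;
}

/* Map the classification to a message a reader could report. */
const char *describe_cpio_magic(enum cpio_magic_kind kind)
{
    switch (kind) {
    case CPIO_MAGIC_NEWC:
    case CPIO_MAGIC_NEWC_CRC:
    case CPIO_MAGIC_ODC:
        return "Standard ASCII cpio archive";
    case CPIO_MAGIC_RPM_STRIPPED:
        return "RPM stripped cpio payload (file metadata is in the RPM "
               "header); not supported";
    default:
        return "Unrecognized archive format";
    }
}
```

With something along these lines, the reader could report the stripped-payload message for this file in place of the generic "Unrecognized archive format".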

kientzle avatar Aug 21 '22 03:08 kientzle

Thank you for the analysis and the explanation. Note that my use case is the https://sourceware.org/elfutils/Debuginfod.html daemon, which parses RPM files, and apparently we have at least 2 packages in openSUSE Tumbleweed that exceed 4GB in size after decompression.

marxin avatar Aug 22 '22 02:08 marxin

Guess this was invented more than 10 years ago to support >4GB files inside rpm.

https://github.com/rpm-software-management/rpm/commit/68c7cf80d7b763498d0077daa91f649bc209e7ae

tpgxyz avatar Sep 27 '22 17:09 tpgxyz