
Compression plugin

Open perseoGI opened this issue 7 months ago • 1 comments

Changelog: Feature: Allow externalization of compression/decompression tasks to a user defined plugin Docs: https://github.com/conan-io/docs/pull/4107

Closes https://github.com/conan-io/conan/issues/18259
Closes https://github.com/conan-io/conan/issues/5209
Closes https://github.com/conan-io/conan/issues/18185
Closes https://github.com/conan-io/conan/issues/6732

Related issues:

  • https://github.com/conan-io/conan/issues/18255

Related PRs:

  • https://github.com/conan-io/conan/pull/18276
  • https://github.com/conan-io/conan/pull/14706

For the sake of simplicity, this PR keeps the current behaviour in some compression/decompression tasks, such as:

  • Cache save/restore
  • File uploads

An extra layer is added which is in charge of determining whether the compression.py plugin is defined, falling back to the existing behaviour if it is not.
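The dispatch layer itself is not spelled out in this comment, but conceptually it could look like the sketch below. The `_load_compression_plugin` helper and the plugin path handling are assumptions for illustration; Conan's actual plugin loading lives in its internal API.

```python
import importlib.util
import os
import tarfile


def _load_compression_plugin(plugin_path):
    # Load compression.py from the extensions folder if it exists;
    # return None so callers fall back to the built-in behaviour.
    if not os.path.isfile(plugin_path):
        return None
    spec = importlib.util.spec_from_file_location("compression", plugin_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def extract(archive_path, dest_dir, conf=None, plugin_path="compression.py"):
    plugin = _load_compression_plugin(plugin_path)
    if plugin is not None and hasattr(plugin, "tar_extract"):
        # Delegate to the user-defined plugin hook
        plugin.tar_extract(archive_path, dest_dir, conf=conf)
    else:
        # Existing behaviour: plain tarfile extraction
        with tarfile.open(archive_path) as tar:
            tar.extractall(path=dest_dir)
```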

The compression.py plugin interface may follow the following structure:

def tar_extract(archive_path, dest_dir, conf=None, *args, **kwargs) -> None:
    pass


def tar_compress(archive_path, files, recursive, conf=None, ref=None, *args, **kwargs) -> None:
    pass

Plugin example with zstd:

# File: compression.py

import os
import tarfile

import zstandard

from conan.api.output import ConanOutput

# zstd compression
# Original author https://github.com/conan-io/conan/pull/14706
def tar_extract(archive_path, dest_dir, conf=None, *args, **kwargs):
    dctx = zstandard.ZstdDecompressor()
    ConanOutput().info(f"Decompressing {os.path.basename(archive_path)} with compression plugin (ZSTD)")
    with open(archive_path, "rb") as tarfile_obj:
        with dctx.stream_reader(tarfile_obj) as stream_reader:
            # The choice of bufsize=32768 comes from profiling decompression at various
            # values and finding that bufsize value consistently performs well.
            with tarfile.open(
                fileobj=stream_reader, bufsize=32768, mode="r|"
            ) as the_tar:
                the_tar.extractall(
                    path=dest_dir, filter=lambda tarinfo, _: tarinfo
                )


def tar_compress(archive_path, files, recursive, conf=None, ref=None, *args, **kwargs):
    ConanOutput(scope=str(ref or "")).info(
        f"Compressing {os.path.basename(archive_path)} with compression plugin (ZSTD)"
    )
    compresslevel = conf.get("user.zstd:compresslevel", check_type=int) if conf else None
    with open(archive_path, "wb") as tarfile_obj:
        # Only provide level if it was overridden by config.
        zstd_kwargs = {}
        if compresslevel is not None:
            zstd_kwargs["level"] = compresslevel

        cctx = zstandard.ZstdCompressor(write_checksum=True, threads=-1, **zstd_kwargs)

        # Create a zstd stream writer so tarfile writes uncompressed data to
        # the zstd stream writer, which in turn writes compressed data to the
        # output tar.zst file.
        with cctx.stream_writer(tarfile_obj) as stream_writer:
            # The choice of bufsize=32768 comes from profiling compression at various
            # values and finding that bufsize value consistently performs well.
            # The variance in compression times at bufsize<=64KB is small. It is only
            # when bufsize>=128KB that compression times start increasing.
            with tarfile.open(
                mode="w|",
                fileobj=stream_writer,
                bufsize=32768,
                format=tarfile.PAX_FORMAT,
            ) as tar:
                current_frame_bytes = 0
                for filename, abs_path in sorted(files.items()):
                    tar.add(abs_path, filename, recursive=recursive)

                    # Flush the current frame if it has reached a large enough size.
                    # There is no required size, but 128MB is a good starting point
                    # because it allows for faster random access to the file.
                    current_frame_bytes += os.path.getsize(abs_path)
                    if current_frame_bytes >= 134217728:
                        stream_writer.flush(zstandard.FLUSH_FRAME)
                        current_frame_bytes = 0
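Note that the `files` argument is a mapping of archive names to absolute paths, sorted so that the archive layout is deterministic. As a dependency-free illustration of that contract (stdlib gzip in place of zstd, so the `zstandard` package is not required; `tar_compress_gzip` is a made-up name, not part of the plugin interface):

```python
import gzip
import tarfile


def tar_compress_gzip(archive_path, files, recursive=True):
    # Same {arcname: abs_path} contract as the plugin's tar_compress,
    # but using stdlib gzip instead of zstd for portability.
    with gzip.open(archive_path, "wb") as gz:
        with tarfile.open(mode="w|", fileobj=gz, format=tarfile.PAX_FORMAT) as tar:
            # Sort entries so the resulting archive is reproducible.
            for arcname, abs_path in sorted(files.items()):
                tar.add(abs_path, arcname, recursive=recursive)
```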

perseoGI commented on May 16 '25

@AbrilRBS:

"the plugin is activated and the file to decompress was not compressed by the plugin, but by vanilla Conan, the workflow would then still be valid"

Yes, that is a great point! This can be easily addressed by passing a constant filename to the compression plugin so that it creates a tar with that name (file extension aside). Having a constant filename has two benefits:

  1. Avoiding name conflicts when the compression plugin uses the same extension Conan natively uses (.tgz)
  2. Being able to detect, while decompressing without the plugin enabled, whether the decompressed tarfile contains our "constant filename" (in which case it should be passed to the compression plugin) or not (a plain Conan tarfile)
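The detection in point 2 can be sketched with stdlib tarfile by peeking at the archive's member names without extracting anything. The sentinel name `.conan_compression` below is purely illustrative; this discussion does not fix a concrete filename:

```python
import tarfile

SENTINEL = ".conan_compression"  # illustrative; no concrete name is decided


def needs_plugin(archive_path):
    # Inspect the tar's member names without extracting: if the sentinel
    # entry is present, the payload was produced by the compression plugin
    # and must be delegated back to it (or rejected if it is not enabled).
    with tarfile.open(archive_path, "r:*") as tar:
        return SENTINEL in tar.getnames()
```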

"packages that have been compressed by the plugin to only be able to be decompressed with the plugin"

Of course, if the plugin is not enabled and Conan detects while decompressing that our "constant filename" is present, it will raise an error, as the client would not know how to decompress that file.

perseoGI commented on Jun 23 '25