Question: How to get XZ uncompressed size

Open x1unix opened this issue 3 years ago • 5 comments

Hello, as far as I know, the XZ format has an index section that contains archive metadata (most notably, the uncompressed size).

I've skimmed through the XZ implementation in this package, and it looks like sharpcompress can read the XZ index, but it's impossible to get XZBlock information without reading and decompressing the whole archive.

How can I get the XZ index information using this library without extracting the archive contents?

It would be nice to have the uncompressed stream size populated in the Length property.

x1unix avatar Apr 26 '21 03:04 x1unix

If it's in the metadata, then it's something that's just been overlooked for whatever reason. Should be a relatively quick thing to do.

adamhathcock avatar Apr 26 '21 08:04 adamhathcock

@adamhathcock as far as I understand, the uncompressed size can be calculated by reading XZIndex, but currently there is no known way to read only the archive structure without decompressing the XZ contents (XZStream returns the extracted archive contents).

XZIndex becomes available only after the whole archive has been read:

XzStream.cs

        public override int Read(byte[] buffer, int offset, int count)
        {
            int bytesRead = 0;
            if (_endOfStream)
            {
                return bytesRead;
            }

            if (!HeaderIsRead)
            {
                ReadHeader();
            }

            bytesRead = ReadBlocks(buffer, offset, count);
            if (bytesRead < count)
            {
                _endOfStream = true;
                ReadIndex();
                ReadFooter();
            }
            return bytesRead;
        }

x1unix avatar Apr 26 '21 14:04 x1unix

There is a similar issue in the related lzma-native project: https://github.com/addaleax/lzma-native/issues/15

Might be useful for implementation.

x1unix avatar Apr 26 '21 14:04 x1unix

Zip has the same issue with streamed files where you don't know the size before compression.

We should be able to implement this size for XZ when using the Archive strategy, but not the Reader strategy.

adamhathcock avatar Jun 04 '21 12:06 adamhathcock

@adamhathcock here is a simple snippet to calculate the uncompressed size of XZ contents. Hope it helps.

It works only with seekable streams. For non-seekable streams, the whole file has to be read first.

using System.Diagnostics;
using System.IO;
using System.Linq;
using SharpCompress.Compressors.Xz;

public class XzFileInfo
{
    // The stream footer is 12 bytes according to the XZ spec.
    private const int XzFooterSize = 12;

    public static ulong GetUncompressedSize(string filePath)
    {
        // Read-only access is enough and also works for read-only files.
        using var file = File.OpenRead(filePath);

        // Read the footer from the end of the file.
        file.Seek(-XzFooterSize, SeekOrigin.End);
        var footer = XZFooter.FromStream(file);
        Debug.WriteLine($"BackwardSize: {footer.BackwardSize}");

        // BackwardSize is the size of the index, which sits right before the footer.
        file.Seek(-(XzFooterSize + footer.BackwardSize), SeekOrigin.End);
        var index = XZIndex.FromStream(file, false);
        Debug.WriteLine($"Index: number of records - {index.NumberOfRecords}");

        // Sum the uncompressed size of every block.
        // The 0UL seed keeps Aggregate from throwing when the index has no records.
        var size = index.Records.Select(r => r.UncompressedSize).Aggregate(0UL, (acc, x) => acc + x);
        Debug.WriteLine($"Total size of uncompressed archive: {size} bytes");
        return size;
    }
}
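
For comparison, here is a rough sketch of the same lookup that uses only the base class library, in case it helps to see which bytes are involved. It is not sharpcompress code: the class and method names (XzIndexSketch, ReadMultibyteInteger, ReadFully) are made up for illustration. It assumes a single XZ stream with no stream padding after the footer, and it skips the CRC32 checks that a real implementation should perform.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper, not part of sharpcompress.
public static class XzIndexSketch
{
    // Footer layout: CRC32 (4) + Backward Size (4) + Stream Flags (2) + magic "YZ" (2).
    private const int FooterSize = 12;

    public static ulong GetUncompressedSize(Stream stream)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("A seekable stream is required.");
        if (stream.Length < FooterSize * 2)
            throw new InvalidDataException("Too short to be an XZ stream.");

        var footer = new byte[FooterSize];
        stream.Seek(-FooterSize, SeekOrigin.End);
        ReadFully(stream, footer);
        if (footer[10] != (byte)'Y' || footer[11] != (byte)'Z')
            throw new InvalidDataException("XZ footer magic not found.");

        // The stored little-endian value is (index size / 4) - 1.
        long indexSize = ((long)BinaryPrimitives.ReadUInt32LittleEndian(footer.AsSpan(4, 4)) + 1) * 4;
        if (indexSize + FooterSize > stream.Length)
            throw new InvalidDataException("Backward Size points outside the file.");

        // The index sits immediately before the footer and starts with a 0x00 indicator.
        stream.Seek(-(FooterSize + indexSize), SeekOrigin.End);
        if (stream.ReadByte() != 0x00)
            throw new InvalidDataException("XZ index indicator not found.");

        ulong records = ReadMultibyteInteger(stream);
        ulong total = 0;
        for (ulong i = 0; i < records; i++)
        {
            ReadMultibyteInteger(stream);          // Unpadded Size (not needed here)
            total += ReadMultibyteInteger(stream); // Uncompressed Size
        }
        return total;
    }

    // XZ multibyte integer: 7 data bits per byte, least significant group first,
    // high bit set on every byte except the last, at most 9 bytes.
    private static ulong ReadMultibyteInteger(Stream stream)
    {
        ulong value = 0;
        for (int shift = 0; shift < 63; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
        }
        throw new InvalidDataException("Multibyte integer is too long.");
    }

    private static void ReadFully(Stream stream, byte[] buffer)
    {
        int read = 0;
        while (read < buffer.Length)
        {
            int n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
    }
}
```

Something like `XzIndexSketch.GetUncompressedSize(File.OpenRead(path))` would be the call site. Files with concatenated streams or trailing padding would need extra handling that this sketch leaves out.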

x1unix avatar Jul 13 '21 00:07 x1unix