sharpcompress
Question: How to get XZ uncompressed size
Hello, as far as I know the XZ format has an index section that contains archive metadata (most notably, the uncompressed size).
I've skimmed through the XZ implementation in this package, and it looks like sharpcompress can read the XZ index, but it's impossible to get XZBlock information without reading and decompressing the whole archive contents.
How can I get XZ index information using this library without extracting archive contents?
It would be nice to populate the uncompressed stream size in the Length property.
If it's in the metadata, then it's something that's just been overlooked for whatever reason. Should be a relatively quick thing to do.
@adamhathcock as far as I understand, the uncompressed size can be calculated by reading XZIndex, but currently there is no known option to read only the archive structure without decompressing the XZ contents (as XZStream returns the extracted archive contents).
XZIndex becomes available only after the whole archive has been read:
XzStream.cs
public override int Read(byte[] buffer, int offset, int count)
{
    int bytesRead = 0;
    if (_endOfStream)
    {
        return bytesRead;
    }
    if (!HeaderIsRead)
    {
        ReadHeader();
    }
    bytesRead = ReadBlocks(buffer, offset, count);
    if (bytesRead < count)
    {
        _endOfStream = true;
        ReadIndex();
        ReadFooter();
    }
    return bytesRead;
}
There is a similar issue in the related lzma-native project: https://github.com/addaleax/lzma-native/issues/15
It might be useful for the implementation.
Zip has the same issue with streamed files, where you don't know the size before compression.
We should be able to implement this size for XZ when using the Archive strategy, but not the Reader strategy.
@adamhathcock here is a simple snippet to calculate the uncompressed size of XZ contents. Hope it helps.
It works only with seekable streams. For non-seekable streams, the whole file has to be read first.
using System.Diagnostics;
using System.IO;
using System.Linq;
using SharpCompress.Compressors.Xz;

public class XzFileInfo
{
    // The stream footer is 12 bytes according to the spec.
    private const int XzFooterSize = 12;

    public static ulong GetUncompressedSize(string filePath)
    {
        // Open read-only; nothing is written to the archive.
        using var file = File.OpenRead(filePath);

        // Read the footer from the end of the file.
        file.Seek(-XzFooterSize, SeekOrigin.End);
        var footer = XZFooter.FromStream(file);
        Debug.WriteLine($"BackwardSize: {footer.BackwardSize}");

        // The index sits directly before the footer; BackwardSize is its length.
        file.Seek(-(XzFooterSize + footer.BackwardSize), SeekOrigin.End);
        var index = XZIndex.FromStream(file, false);
        Debug.WriteLine($"Index: number of records - {index.NumberOfRecords}");

        // Sum the uncompressed size of every block. The 0UL seed keeps
        // Aggregate from throwing when the index has no records.
        var size = index.Records.Select(r => r.UncompressedSize).Aggregate(0UL, (acc, x) => acc + x);
        Debug.WriteLine($"Total size of uncompressed archive: {size} bytes");
        return size;
    }
}
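For anyone curious what the footer and index lookups in the snippet above amount to at the byte level, here is a rough sketch that parses the two structures by hand, using only the .NET base library. It is not sharpcompress code: XzIndexSketch, ReadVarint and ReadFully are hypothetical names made up for illustration. It assumes a single XZ stream with no stream padding or concatenated streams, and it does not verify any of the CRC32 fields, so treat it as an illustration of the layout rather than a robust implementation.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper, not part of sharpcompress.
public static class XzIndexSketch
{
    // Footer layout: CRC32 (4) + Backward Size (4) + Stream Flags (2) + magic "YZ" (2).
    private const int FooterSize = 12;

    // XZ variable-length integer: 7 bits per byte, least significant group first,
    // high bit set on every byte except the last, at most 9 bytes.
    private static ulong ReadVarint(Stream s)
    {
        ulong value = 0;
        for (int i = 0; i < 9; i++)
        {
            int b = s.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            value |= (ulong)(b & 0x7F) << (7 * i);
            if ((b & 0x80) == 0) return value;
        }
        throw new InvalidDataException("Variable-length integer is longer than 9 bytes.");
    }

    // Stream.Read may return fewer bytes than requested, so loop until the buffer is full.
    private static void ReadFully(Stream s, byte[] buffer)
    {
        int done = 0;
        while (done < buffer.Length)
        {
            int n = s.Read(buffer, done, buffer.Length - done);
            if (n <= 0) throw new EndOfStreamException();
            done += n;
        }
    }

    public static ulong GetUncompressedSize(Stream xz)
    {
        if (!xz.CanSeek) throw new NotSupportedException("A seekable stream is required.");

        var footer = new byte[FooterSize];
        xz.Seek(-FooterSize, SeekOrigin.End);
        ReadFully(xz, footer);
        if (footer[10] != (byte)'Y' || footer[11] != (byte)'Z')
            throw new InvalidDataException("Missing XZ footer magic bytes.");

        // Backward Size is stored little-endian; the real index size is (stored + 1) * 4.
        uint stored = BinaryPrimitives.ReadUInt32LittleEndian(footer.AsSpan(4, 4));
        long indexSize = ((long)stored + 1) * 4;

        // The index ends right where the footer begins.
        xz.Seek(-(FooterSize + indexSize), SeekOrigin.End);
        if (xz.ReadByte() != 0x00)
            throw new InvalidDataException("Missing XZ index indicator.");

        ulong records = ReadVarint(xz);
        ulong total = 0;
        for (ulong i = 0; i < records; i++)
        {
            ReadVarint(xz);           // Unpadded Size of the block (not needed here)
            total += ReadVarint(xz);  // Uncompressed Size of the block
        }
        return total;
    }
}
```

Calling XzIndexSketch.GetUncompressedSize on a FileStream should touch only the last few bytes of the file. For a non-seekable source, one option is to copy it into a MemoryStream first, which still avoids decompression but not the full read.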