Add SOZip (Seek-Optimized ZIP) support

Open Copilot opened this issue 3 months ago • 2 comments

  • [x] Understand the issue: SOZip (Seek-Optimized ZIP) support
  • [x] Explore repository structure and Zip implementation
  • [x] Research SOZip specification requirements
  • [x] Add SOZip core classes:
    • [x] Add SOZip extra data type (0x564B) in ExtraDataType enum
    • [x] Create SOZipExtraField class to parse extra field data
    • [x] Create SOZipIndex class to read/write index files
    • [x] Create SOZipDeflateStream for sync flush compression
    • [x] Add SOZip options to ZipWriterOptions and ZipWriterEntryOptions
  • [x] Add SOZip detection in ZipEntry:
    • [x] Add IsSozip property to detect entries with SOZip extra field
    • [x] Add IsSozipIndexFile property to detect index files by name
  • [x] Add comprehensive unit tests for SOZip functionality:
    • [x] SOZipIndex serialization/deserialization tests
    • [x] ZipReader SOZip detection tests
    • [x] ZipArchive SOZip detection tests
  • [x] Update FORMATS.md documentation
  • [x] All tests passing with ZipReader and ZipArchive
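
As an illustration of the extra-field detection listed in the checklist, here is a minimal Python sketch (not the sharpcompress C# code) that walks a ZIP extra-field blob and looks for the 0x564B header ID named above. It relies only on the standard ZIP layout of extra fields: a 2-byte little-endian header ID, a 2-byte little-endian size, then the payload. The function names `iter_extra_fields` and `has_sozip_extra` are hypothetical, and the payload is deliberately left uninterpreted.

```python
import struct

# Header ID taken from the checklist above; everything else here is illustrative.
SOZIP_EXTRA_ID = 0x564B


def iter_extra_fields(extra: bytes):
    """Yield (header_id, payload) pairs from a raw ZIP extra-field blob."""
    pos = 0
    while pos + 4 <= len(extra):
        # Each record starts with two little-endian uint16 values: ID and size.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        payload = extra[pos:pos + size]
        if len(payload) < size:
            break  # truncated record: stop rather than guess at its contents
        yield header_id, payload
        pos += size


def has_sozip_extra(extra: bytes) -> bool:
    """Return True if any record in the blob carries the SOZip header ID."""
    return any(hid == SOZIP_EXTRA_ID for hid, _ in iter_extra_fields(extra))
```

A caller would pass the raw extra-field bytes of an entry header; how sharpcompress exposes those bytes to `SOZipExtraField` is not shown in this thread.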
Original prompt

This section details the original issue you should resolve

<issue_title>Seek Optimized Zip</issue_title> <issue_description>Are there plans to support reading and writing of Seek-Optimized ZIP (SOZip) archives?</issue_description>

Comments on the Issue (you are @copilot in this section)

@adamhathcock What is this? A new Zip style/feature?

PRs are welcome.</comment_new> <comment_new>@adamhathcock

Version

  • Version: 0.5.0
  • Date: 2023-Jan-06

License

This specification document is (C) 2022-2023 Even Rouault and licensed under the CC-BY-4.0 terms.

Note: the scope of the copyrighted material does, of course, not extend onto any source or binary code derived from the specification.

What is SOZip?

A Seek-Optimized ZIP file (SOZip) is a ZIP file that contains one or several Deflate-compressed files that are organized and annotated such that a SOZip-aware reader can perform very fast random access (seek) within a compressed file.

SOZip makes it possible to access large compressed files directly from a .zip file without prior decompression. It is not a new file format, but a profile of the existing ZIP format, done in a fully backward compatible way. ZIP readers that are non-SOZip aware can read a SOZip-enabled file normally and ignore the extended features that support efficient seek capability.
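
To make the seek idea concrete, here is a minimal Python sketch of one way a Deflate stream can be made restartable at chunk boundaries. It is an assumption-laden illustration, not the specification's exact procedure (the excerpt in this thread is truncated) and not what `SOZipDeflateStream` necessarily does. It uses zlib's `Z_FULL_FLUSH`, which byte-aligns the output and resets the compression state so that decoding can restart at that point. The chunk size, the in-memory offset list, and the names `compress_chunked` and `read_chunk` are all hypothetical.

```python
import zlib

CHUNK_SIZE = 32 * 1024  # illustrative chunk size, chosen for this sketch


def compress_chunked(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress data as one raw Deflate stream that is restartable per chunk.

    Returns the compressed stream and the compressed offset of each chunk.
    """
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw Deflate, as stored in ZIP
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(out))  # where this chunk begins in the compressed stream
        out += comp.compress(data[start:start + chunk_size])
        if start + chunk_size < len(data):
            # Byte-align and reset state so a reader can start at the next offset.
            out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush(zlib.Z_FINISH)
    return bytes(out), offsets


def read_chunk(stream: bytes, offsets: list, index: int) -> bytes:
    """Decompress only chunk `index`, without touching the earlier chunks."""
    end = offsets[index + 1] if index + 1 < len(offsets) else len(stream)
    return zlib.decompressobj(-15).decompress(stream[offsets[index]:end])
```

The result is still an ordinary Deflate stream, so a reader that knows nothing about the chunking can decompress it from the start as usual; only a reader that has the offset list can jump straight to a chunk.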

Use cases

This specification is intended to be general purpose / not domain specific.

SOZip was first developed to serve geospatial use cases, which commonly have large compressed files inside of ZIP archives. In particular, it makes it possible for users to read large Geographic Information Systems (GIS) files using the Shapefile, GeoPackage or FlatGeobuf formats (which have no native provision for compression) compressed in .zip files without prior decompression.

Efficient random access and selective decompression are a requirement to provide acceptable performance in many usage scenarios: spatial index filtering, access to a feature by its identifier, etc.

High-level specification

The SOZip optimization relies on two independent and combined mechanisms:

  • The first mechanism is the generation of a [Deflate](htt...</comment_new>
  • Fixes adamhathcock/sharpcompress#719

Copilot • Nov 26 '25 08:11

@copilot make the tests pass when reading with ZipReader and ZipArchive

adamhathcock • Nov 27 '25 08:11

@copilot make the tests pass when reading with ZipReader and ZipArchive

Fixed the SOZip tests to work correctly with ZipReader and ZipArchive. The tests now:

  • Verify that regular zip files don't have SOZip properties set
  • Test SOZip index file detection by name pattern with both ZipReader and ZipArchive

All 15 SOZip tests pass as of commit 7339567.
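
For readers unfamiliar with detection by name pattern, here is a small Python sketch of the idea. It assumes the naming convention as I understand it from the SOZip specification: the index for an entry `foo` is a hidden sibling entry named `.foo.sozip.idx`. Treat that as an assumption; the exact rule used by `IsSozipIndexFile` in sharpcompress is not shown in this thread, and the function names below are hypothetical.

```python
import posixpath

INDEX_SUFFIX = ".sozip.idx"  # assumed suffix of SOZip index entries


def index_name_for(entry_name: str) -> str:
    """Build the assumed index entry name for a given ZIP entry name."""
    directory, base = posixpath.split(entry_name)
    return posixpath.join(directory, "." + base + INDEX_SUFFIX)


def is_sozip_index_name(entry_name: str) -> bool:
    """Return True if the entry name looks like a SOZip index file."""
    base = posixpath.basename(entry_name)
    # Require a leading dot, the suffix, and at least one character in between.
    return (
        base.startswith(".")
        and base.endswith(INDEX_SUFFIX)
        and len(base) > len(INDEX_SUFFIX) + 1
    )
```

ZIP entry names use forward slashes regardless of platform, which is why the sketch uses `posixpath` rather than `os.path`.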

Copilot • Nov 27 '25 08:11