warcio

Stream Recompressor

Open white-gecko opened this issue 1 year ago • 2 comments

Until now, the Recompressor only worked on files. With these changes:

  • Tests are added
  • Code duplicated between recompress() and load_and_write() or decompress_and_recompress(), respectively, is deduplicated.
  • File handling is separated from the operations on streams (file-like objects).
  • _load_and_write_stream() and _decompress_and_recompress_stream() work on streams
  • StreamRecompressor is introduced to work purely on streams:
    • recompress() handles properly compressed or uncompressed streams
    • decompress_recompress() handles any gzip compressed stream
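
The split between file handling and stream operations described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the code from the pull request: the names `recompress_stream` and `recompress_file` are hypothetical, the source is assumed to be seekable, and the output is written as a single gzip member, whereas warcio writes one gzip member per WARC record.

```python
import gzip
import io
import shutil

GZIP_MAGIC = b"\x1f\x8b"


def recompress_stream(source, target):
    """Copy `source` (gzip-compressed or plain) to `target` as gzip data.

    Hypothetical helper: works purely on file-like objects and assumes
    that `source` is seekable so the magic bytes can be sniffed.
    """
    start = source.tell()
    is_gzip = source.read(2) == GZIP_MAGIC
    source.seek(start)

    # GzipFile reads all members of a multi-member gzip stream in sequence.
    reader = gzip.GzipFile(fileobj=source, mode="rb") if is_gzip else source

    # Closing the writer finishes the gzip trailer but leaves `target` open.
    with gzip.GzipFile(fileobj=target, mode="wb") as writer:
        shutil.copyfileobj(reader, writer)


def recompress_file(in_path, out_path):
    """Thin file-based wrapper: only opens files, then delegates."""
    with open(in_path, "rb") as source, open(out_path, "wb") as target:
        recompress_stream(source, target)
```

Keeping all path handling in the wrapper means the stream function can be tested with `io.BytesIO` objects and reused with sockets or pipes wrapped in a seekable buffer.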

The pull request is not yet organized into clean commits. If the approach is acceptable in general, I will rebase it and rework the commits if required. If you prefer cleaner commits for review, please tell me.

white-gecko, Sep 17 '24 13:09

Disorganized commits are fine, since this will be squashed anyway and I can review the squashed result. I will get to this next week; I'm at a conference.

wumpus, Sep 18 '24 04:09

To make things more complicated ;-) but also simpler to work with streams, I have additionally introduced the RecompressorStream in my wip/recompressorStream branch: https://github.com/white-gecko/warcio/pull/1 (Of course, the names cannot stay as they are.)

The motivation is that I use lazy evaluation in my setup, so at the moment I set up the recompression I don't yet know the target stream I'm going to write to.
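
One way to picture this lazy setup is a generator that is created up front and only pulled from once a target exists. The sketch below is an assumption for illustration, not the RecompressorStream interface from the linked branch: `lazy_recompress` is a hypothetical name, it uses only the standard library, it expects a gzip-compressed source, and it emits a single gzip member rather than one member per WARC record.

```python
import gzip
import io
import zlib


def lazy_recompress(source, chunk_size=64 * 1024):
    """Lazily yield gzip-compressed chunks read from a gzip `source`.

    Hypothetical sketch: no data is read or compressed until the
    generator is iterated, so the target can be chosen later.
    """
    compressor = zlib.compressobj(wbits=31)  # 31 selects the gzip container
    with gzip.GzipFile(fileobj=source, mode="rb") as decompressed:
        while True:
            chunk = decompressed.read(chunk_size)
            if not chunk:
                break
            data = compressor.compress(chunk)
            if data:
                yield data
    yield compressor.flush()


# Set up the recompression first, without knowing the target.
source = io.BytesIO(gzip.compress(b"first\n") + gzip.compress(b"second\n"))
chunks = lazy_recompress(source)

# Later, once the target stream is known, drain the generator into it.
target = io.BytesIO()
for chunk in chunks:
    target.write(chunk)
```

The consumer decides where the bytes go, which is the property the lazy setup needs; a file-like wrapper with a `read()` method would be an alternative shape for the same idea.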

What do you think about these interfaces?

white-gecko, Oct 16 '24 13:10