stream-zip icon indicating copy to clipboard operation
stream-zip copied to clipboard

feat: add strict_timestamps option

Open frederikaalund opened this issue 4 months ago • 7 comments

I ran into an issue today with a device of ours that had a file from 1970 (or at least that was the timestamp). In turn, stream-zip gave a cryptic error. Turns out, the ZIP file format simply does not support timestamps from before 1980. This is a known issue and python's built-in zipfile module provides the strict_timestamps option as a workaround for this.

In short, this PR:

  • Adds strict_timestamps as an optional argument to stream_zip. It defaults to true (replicating the current behaviour; no breaking changes).
  • When you set strict_timestamps=False then the library coerces any modified_at timestamps into [1980; 2108). This silently avoids underflow/overflow.

The CPython zipfile implementation inspired my implementation in this PR.

Let me know what you think!

Side note: Thank you for this great library (along with stream-unzip). :+1: It's crucial for the backup/restore feature on our devices. :)

frederikaalund avatar Apr 12 '24 12:04 frederikaalund

Ah thanks for the PR!

So probably yes, should do something about this... But...

  1. I'm leaning to not having an option at all and just "make it work" so earlier/later timestamps should in all cases be clamped so the zip file is created (Unless we can think of a use case where an error is desirable? Maybe some cases do need some validation for not-too-early-or-too-late timestamps, but I suspect that shouldn't be part of stream-zip)
  2. And I'm leaning to not clamping the extended/unix timestamp if there is one, and only ever clamp the MS DOS one. I'm guessing a bit, but I think it would mean that clients that support the extended timestamps would then still in all cases use the right timestamp.
  3. And leaning to having a bit of documentation on this behaviour somewhere, i.e. for older clients or all clients when extended_timestamps=False, the dates are clamped due to a limitation of the ZIP format

And also - I think would be good to add or change an existing tests around this, checking cases above and below the clamping range for both with and without the extended timestamps somehow.

michalc avatar Apr 12 '24 20:04 michalc

@frederikaalund Let us know if you want to do the above / discuss more? No is all good of course, but in that case I would be tempted for us to make the changes at our end. I'm seeing this PR as essentially a bug report and I'm keen to get it sorted.

michalc avatar Apr 20 '24 06:04 michalc

To just think out loud, I wonder also if the extended/unix timestamps need to be clamped for times outside of the range representable by a signed 32 bit integer, e.g. after 2038...

michalc avatar Apr 20 '24 06:04 michalc

Thanks for the feedback and I apologize for me late reply. It's been a busy week :)

I lean towards (1) too and only decided to not implement it that way since it's technically a breaking change. If you're okay with this subtle change then I'll gladly implement it without the kwarg option. 👍 It is an edge case after all.

If we go down this route, I too think that it's a good idea to clamp the timestamps in general (regardless of whether we store them in the MSDOS header or the Extended timestamp header). Something like this:

  • MSDOS timestamp header: Silently clamp to [1980; 2108)
  • Extended timestamp header: Silently clamp to [1902; 2038)

As for tests and docs, of course, let me add that as well as part of this PR. :) I would ask for the same thing myself so I just shamelessly tried to get away with not doing it 😅

Let me know what you think. Thanks again for your feedback on this PR!

frederikaalund avatar Apr 21 '24 10:04 frederikaalund

If you're okay with this subtle change then I'll gladly implement it without the kwarg option. 👍 It is an edge case after all.

I think I am - it's the better behaviour I would say for my use cases, and it sounds like yours too, and it's simpler to not have an option. Avoiding a breaking change I don't think is worth it really.

Something like this: MSDOS timestamp header: Silently clamp to [1980; 2108) Extended timestamp header: Silently clamp to [1902; 2038)

Yes exactly (and I might lean to the making the range are large as possible i.e. clamping to exactly the largest/smallest second that's representable for each)

michalc avatar Apr 21 '24 11:04 michalc

Hi again, finally found some time to work on this. :)

I just pushed a rework that silently clamps the timestamps to the available precision in the MS-DOS and UNIX headers.

The timestamp limits are:

  • MSDOS timestamp header: Silently clamp to [1980; 2108)
  • Extended timestamp header: Silently clamp to [1970; 2038)

Note that I now use a lower limit of 1970 (and not 1902 as previously discussed). Some tools (unzip in particular) has problems with negative timestamps. It's technically a bug in unzip but it's a rather popular tool so why not just abide.

I also added tests and docs. Let me know if you want anything changed 👍

frederikaalund avatar May 04 '24 10:05 frederikaalund

Looks great! Having a bit of a nose around and think still - will leave comments in the code

michalc avatar May 06 '24 17:05 michalc