stream-zip
stream-zip copied to clipboard
feat: add strict_timestamps option
I ran into an issue today with a device of ours that had a file from 1970 (or at least that was the timestamp). In turn, stream-zip gave a cryptic error. Turns out, the ZIP file format simply does not support timestamps from before 1980. This is a known issue and python's built-in zipfile module provides the strict_timestamps
option as a workaround for this.
In short, this PR:
- Adds
strict_timestamps
as an optional argument tostream_zip
. It defaults to true (replicating the current behaviour; no breaking changes). - When you set
strict_timestamps=False
then the library coerces anymodified_at
timestamps into [1980; 2108). This silently avoids underflow/overflow.
The CPython zipfile implementation inspired my implementation in this PR.
Let me know what you think!
Side note: Thank you for this great library (along with stream-unzip). :+1: It's crucial for the backup/restore feature on our devices. :)
Ah thanks for the PR!
So probably yes, should do something about this... But...
- I'm leaning to not having an option at all and just "make it work" so earlier/later timestamps should in all cases be clamped so the zip file is created (Unless we can think of a use case where an error is desirable? Maybe some cases do need some validation for not-too-early-or-too-late timestamps, but I suspect that shouldn't be part of stream-zip)
- And I'm leaning to not clamping the extended/unix timestamp if there is one, and only ever clamp the MS DOS one. I'm guessing a bit, but I think it would mean that clients that support the extended timestamps would then still in all cases use the right timestamp.
- And leaning to having a bit of documentation on this behaviour somewhere, i.e. for older clients or all clients when extended_timestamps=False, the dates are clamped due to a limitation of the ZIP format
And also - I think would be good to add or change an existing tests around this, checking cases above and below the clamping range for both with and without the extended timestamps somehow.
@frederikaalund Let us know if you want to do the above / discuss more? No is all good of course, but in that case I would be tempted for us to make the changes at our end. I'm seeing this PR as essentially a bug report and I'm keen to get it sorted.
To just think out loud, I wonder also if the extended/unix timestamps need to be clamped for times outside of the range representable by a signed 32 bit integer, e.g. after 2038...
Thanks for the feedback and I apologize for me late reply. It's been a busy week :)
I lean towards (1) too and only decided to not implement it that way since it's technically a breaking change. If you're okay with this subtle change then I'll gladly implement it without the kwarg option. 👍 It is an edge case after all.
If we go down this route, I too think that it's a good idea to clamp the timestamps in general (regardless of whether we store them in the MSDOS header or the Extended timestamp header). Something like this:
- MSDOS timestamp header: Silently clamp to [1980; 2108)
- Extended timestamp header: Silently clamp to [1902; 2038)
As for tests and docs, of course, let me add that as well as part of this PR. :) I would ask for the same thing myself so I just shamelessly tried to get away with not doing it 😅
Let me know what you think. Thanks again for your feedback on this PR!
If you're okay with this subtle change then I'll gladly implement it without the kwarg option. 👍 It is an edge case after all.
I think I am - it's the better behaviour I would say for my use cases, and it sounds like yours too, and it's simpler to not have an option. Avoiding a breaking change I don't think is worth it really.
Something like this: MSDOS timestamp header: Silently clamp to [1980; 2108) Extended timestamp header: Silently clamp to [1902; 2038)
Yes exactly (and I might lean to the making the range are large as possible i.e. clamping to exactly the largest/smallest second that's representable for each)
Hi again, finally found some time to work on this. :)
I just pushed a rework that silently clamps the timestamps to the available precision in the MS-DOS and UNIX headers.
The timestamp limits are:
- MSDOS timestamp header: Silently clamp to [1980; 2108)
- Extended timestamp header: Silently clamp to [1970; 2038)
Note that I now use a lower limit of 1970 (and not 1902 as previously discussed). Some tools (unzip
in particular) has problems with negative timestamps. It's technically a bug in unzip
but it's a rather popular tool so why not just abide.
I also added tests and docs. Let me know if you want anything changed 👍
Looks great! Having a bit of a nose around and think still - will leave comments in the code