libzip
libzip copied to clipboard
`zip_file_set_dostime()` shifts the time around DST changes
Describe the Bug
zip_file_set_dostime() calls _zip_d2u_time() which calls mktime() to convert the time from the local time zone to absolute time_t.
Later _zip_dirent_write() calls _zip_u2d_time() which calls localtime_r() to convert the time back to the local time zone.
Also, there does not seem to be a way to get the time of an entry as dostime (or as any other local time representation, without conversion to absolute time). zip_stat_index() provides only time_t after conversion.
Expected Behavior
The time from zip_file_set_dostime() is used directly. And there should be a way to get the dostime of an entry.
Observed Behavior Due to the double conversion, times around DST changes are shifted.
To Reproduce I looked at the source.
libzip Version 1.9.2
Operating System Linux
Additional context
I am writing a library which would like to be able to set the time exactly. I am migrating code which uses another zip writing library, which has forced a fixed time of entries for reproducible output. The fixed time is specified in the local time zone, which led to digging how to do that with libzip, and discovering that the public API almost allows that. I can as well convert the local time to absolute time myself and use zip_file_set_mtime(), with the same issue.
One way to solve this without changing zip_stat_t would be to allow switching between using the local timezone/DST and UTC. This would affect translation between time_t and DOS timestamps the archive in both directions, and thus would allow the client to apply its own translation to time_t.
Note that the timezone of a zip file does not necessarily correspond to the timezone of the given machine, which means that “impossible” times might occur in nature, even without explicit messing with timestamps.
My library would use the UTC setting, and expose time_t either as absl::Time (absolute time, after translation using the local timezone on my side) or absl::CivilTime (broken down time, using the implicit timezone of the archive, which can preserve archive timestamps). https://abseil.io/docs/cpp/guides/time
I can't follow your argument, as we are consistently using local time zone.
Can you provide us with an example where this does not work?
Let’s say an archive was written in Europe with entries modified at 12.03.2023, 2:30:00.
Then the archive is opened in the US, some entries are modified, and the archive is written back.
I don’t expect that that times from Europe correspond to the same absolute times in the US, because the zip format stores no timezone information. But at least unmodified entries should come back undisturbed.
I suspect that unmodified entries (except for those which get bulk-copied with the beginning of the file) will have their timestamps adjusted, because in the US local time 12.03.2023, 2:30:00 does not exist. _zip_dirent_write() is not able to write the original timestamp.
I tried thinking about it some more, but I still don't understand what you describe in the last comment.
It doesn't really matter at what time the archive is written in Europe, in the file you end up with 4 bytes of data in MS-DOS format, which as no time zone nor DST information.
When you open it in a different time zone, you might get back a different time (depending on time zone offset and DST effects) but when you close the file, you most probably are still in the same time zone and DST setting so the same values should be written back. (There might be an issue if you run the program exactly at the time of a DST time change in spring/fall.)
Do you have an example zip archive and some libzip calls that show this problem?
(Please note that libzip does not parse the extra field 0x5455 that provides better time information.)
When libzip opens the file, it translates timestamps from the DOS format to struct tm, then uses mktime() to translate that to time_t.
When libzip writes the file back, it translates time_t using localtime_r() to struct tm, and then to the DOS format.
This is not reversible. The part between struct tm and time_t and back to struct tm is not reversible for one hour of timestamps each year if DST is observed in the local time.
Here is an example program which demonstrates that:
#include <stdio.h>
#include <string.h>
#include <time.h>
int main(int argc, char** argv) {
struct tm tm;
memset(&tm, 0, sizeof(tm));
tm.tm_isdst = -1;
tm.tm_year = 2023 - 1900;
tm.tm_mon = 3 - 1;
tm.tm_mday = 12;
tm.tm_hour = 2;
tm.tm_min = 30;
tm.tm_sec = 0;
printf("%04d-%02d-%02d %02d:%02d:%02d\n", tm.tm_year + 1900, tm.tm_mon + 1,
tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec);
const time_t t = mktime(&tm);
localtime_r(&t, &tm);
printf("%04d-%02d-%02d %02d:%02d:%02d\n", tm.tm_year + 1900, tm.tm_mon + 1,
tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec);
return 0;
}
When run in TZ=Europe/Zurich, it displays:
2023-03-12 02:30:00
2023-03-12 02:30:00
When run in TZ=America/New_York, it displays:
2023-03-12 02:30:00
2023-03-12 03:30:00
This means that an archive created in Europe, having that timestamp of some entry, and then edited in the US in a way which causes regenerating of archive contents (e.g. compressed differently while preserving timestamps), will have that timestamp shifted.
Using your timestamp I've added a test that documents the current behaviour - the date is reset to the default date (somewhere in 1980) because the time stamp at 2:30 in the morning is invalid in the America/New York timezone.
Ah, my example was misleading. It was mktime() which adjusted its input as a side effect. So confusing API!
Still localtime_r() writes the same struct tm as was adjusted by mktime(). It succeeds for me; I don’t know where in libzip it is treated as invalid.
We've just fixed this, please try git head
Thank you!
I think this is workable, but it would be nicer if the DOS time was returned by ZIP_SOURCE_STAT rather than a separate ZIP_SOURCE_GET_DOS_TIME.
My wrapper for zip_stat_index() fills a class wrapping zip_stat_t. It has getters returning optional absl::Time (which can be losslessly converted from time_t) and absl::CivilSecond (which in the previous version uses the local time zone to convert the time from time_t, and thus undoing libzip’s mktime() — except for corner cases around DST switches which triggered reporting this issue).
The getter for absl::CivilSecond should better directly reach for the modification time of the underlying zip entry, which conceptually corresponds to absl::CivilSecond (regarding lacking the time zone, i.e. modulo the range and resolution). I suppose I can call zip_source_get_dos_time() besides zip_stat_index() to pretend that both formats are a part of zip_stat_t.
There is a similar issue in the other direction: a source can implement ZIP_SOURCE_GET_DOS_TIME in addition to ZIP_SOURCE_STAT, but it would be nice if they were triggered by the same Stat()-like call which fills zip_stat_t, because it would be nice to pretend that both formats of the time are alternative representations of the same concept. For code which imposes a fixed time of entries being created (I have one in a planned migration to my unfinished wrapper), it makes sense to represent the time as absl::CivilSecond and convert that directly to DOS time, because conceptually it does not have a time zone attached.
I understand that changing members of zip_stat_t might be hard for compatibility reasons.
zip_stat_t is part of the API, so changing it would break source level backwards compatibility, something we're not willing to do.
We could introduce a new stat variant, probably using an opaque struct to allow future changes, but I don't think this change is worth it.
If you want to use/expose the DOS time in your wrapper class, adding it as a separate member is probably best.