Fixed bug in tm calculation
There was a problem in the calculation of the last days of a month.
Hi @Tekl7 & thank you for your contribution!
Would you be disappointed if I refactored this code to skip all the manual calculations and use the built-in C/C++ functions for date/time conversion? This would make your pull request obsolete. I personally never looked at this module in detail before and I just saw that it could probably be optimized using gmtime and just adding the offset between Excel timestamp epoch and POSIX epoch.
Hi, if I'm not wrong, there is a disadvantage in the approach with POSIX epoch and gmtime function, that the dates before 1.1.1970 used in Excel cells could not be converted to tm structure because the dates could not be expressed in POSIX epoch timestamp, hence the gmtime function could not be used.
Hi @Tekl7 - timestamps (and tm_year) can be negative relative to the epoch:
#include <ctime> // struct tm, gmtime, strftime
#include <iostream>
void printTime( time_t timestamp )
{
constexpr const size_t maxDateSize = 200;
char dateString[ maxDateSize + 1 ];
struct tm *structuredTime = gmtime( ×tamp );
size_t actualSize = strftime(dateString, maxDateSize, "%FT%T", structuredTime);
std::cout << "timestamp " << timestamp << " was converted to " << dateString << " (" << actualSize << " characters)" << std::endl;
}
int main()
{
printTime( time( NULL ) ); // now
printTime( 0 ); // begin of epoch
printTime( -365 * 86400 ); // one year before epoch
printTime( -( 365 * 1971 + 113 ) * 86400L ); // ca. year 0 CE
printTime( -( 365 * 1972 + 113 ) * 86400L ); // ca. year -1 CE
return 0;
}
Output:
timestamp 1725536483 was converted to 2024-09-05T11:41:23 (19 characters)
timestamp 0 was converted to 1970-01-01T00:00:00 (19 characters)
timestamp -31536000 was converted to 1969-01-01T00:00:00 (19 characters)
timestamp -62167219200 was converted to 0-01-01T00:00:00 (16 characters)
timestamp -62198755200 was converted to -1-01-01T00:00:00 (17 characters)
Does this answer your concern?
Researching this, I see one additional issue: Excel appears to have an epoch that wrongly assumes 1900 was a leap year, therefore epoch is off by 1 day until 1 March 1900. This may require a manual patch when converting to gmtime.
Hi, oh, wow, I did not know that the timestamp could be negative. Regarding this, I have no other concerns about using the gmtime approach.
It wasn't always like this - many systems until recently (and some older systems still do, I guess) were storing timestamps in 32 bit, and making those unsigned ints because the signed 32bit POSIX epoch would end in 2039, which isn't really good for storing dates. But luckily, time_t is a signed long (int64_t) on modern compilers.
I started looking at XLDateTime, and realized I was overthinking the leap year logic - maybe a simpler logic is still possible, in which case I'll come back to your pull request. Keeping it open for now.
Hi @Tekl7 - I forgot to get back to you when I patched this three weeks ago: https://github.com/troldal/OpenXLSX/commit/4760629364d424b84c36c4a293ddfef29908210d#diff-3cbe87590d15017521f1f375129275ec8d8b69bf0d03c2a5c6ed494e9715b042 This was actually already my second attempt at fixing #205 - which was proposing the same solution as you did with the key line being
if (days >= serial) break;
Another bug report on this matter was #138.
There were two further issues however:
- if the seconds were calculated from a fraction, and rounded up to 60, that "overflow" was not carried back into the minutes, hours, days - this is now handled at https://github.com/troldal/OpenXLSX/commit/4760629364d424b84c36c4a293ddfef29908210d#diff-3cbe87590d15017521f1f375129275ec8d8b69bf0d03c2a5c6ed494e9715b042R243 and following lines
- The loop abort checks with
days >= serialonly worked for round dates - when the serial had a time of day (fraction), those checks would calculate a wrong date (think 30 days in April, being equal to serial 30 - this aborts the loop and calculates 30 April correctly. But when the serial is 30.5 - noon on 30 April - it would not abort the loop and deduct 30 days, ending up on noon of "0 May". This is fixed by comparingdays + 1 > serial
I will close this pull request as resolved now. Please let me know here (or feel free to open an issue) if you still spot any buggy behavior (I hope I tested enough this time). Thanks for flagging this and sorry for the long response time!