OpenXLSX icon indicating copy to clipboard operation
OpenXLSX copied to clipboard

Fixed bug in tm calculation

Open Tekl7 opened this issue 1 year ago • 7 comments

There was a problem in the calculation of the last days of a month.

Tekl7 avatar Aug 23 '24 20:08 Tekl7

Hi @Tekl7 & thank you for your contribution!

Would you be disappointed if I refactored this code to skip all the manual calculations and use the built-in C/C++ functions for date/time conversion? This would make your pull request obsolete. I personally never looked at this module in detail before and I just saw that it could probably be optimized using gmtime and just adding the offset between Excel timestamp epoch and POSIX epoch.

aral-matrix avatar Aug 27 '24 09:08 aral-matrix

Hi, if I'm not wrong, there is a disadvantage in the approach with POSIX epoch and gmtime function, that the dates before 1.1.1970 used in Excel cells could not be converted to tm structure because the dates could not be expressed in POSIX epoch timestamp, hence the gmtime function could not be used.

Tekl7 avatar Sep 05 '24 11:09 Tekl7

Hi @Tekl7 - timestamps (and tm_year) can be negative relative to the epoch:

#include <ctime> // struct tm, gmtime, strftime
#include <iostream>

void printTime( time_t timestamp )
{
	constexpr const size_t maxDateSize = 200;
	char dateString[ maxDateSize + 1 ];

	struct tm *structuredTime = gmtime( &timestamp );
	size_t actualSize = strftime(dateString, maxDateSize, "%FT%T", structuredTime);
	std::cout << "timestamp " << timestamp << " was converted to " << dateString << " (" << actualSize << " characters)" << std::endl;
}
int main()
{
	printTime( time( NULL ) ); // now
	printTime( 0 );            // begin of epoch
	printTime( -365 * 86400 );         // one year before epoch
	printTime( -( 365 * 1971 + 113 ) * 86400L );  // ca. year 0 CE
	printTime( -( 365 * 1972 + 113 ) * 86400L );  // ca. year -1 CE
	
	return 0;
}

Output:

timestamp 1725536483 was converted to 2024-09-05T11:41:23 (19 characters)
timestamp 0 was converted to 1970-01-01T00:00:00 (19 characters)
timestamp -31536000 was converted to 1969-01-01T00:00:00 (19 characters)
timestamp -62167219200 was converted to 0-01-01T00:00:00 (16 characters)
timestamp -62198755200 was converted to -1-01-01T00:00:00 (17 characters)

Does this answer your concern?

aral-matrix avatar Sep 05 '24 11:09 aral-matrix

Researching this, I see one additional issue: Excel appears to have an epoch that wrongly assumes 1900 was a leap year, therefore epoch is off by 1 day until 1 March 1900. This may require a manual patch when converting to gmtime.

aral-matrix avatar Sep 05 '24 11:09 aral-matrix

Hi, oh, wow, I did not know that the timestamp could be negative. Regarding this, I have no other concerns about using the gmtime approach.

Tekl7 avatar Sep 06 '24 20:09 Tekl7

It wasn't always like this - many systems until recently (and some older systems still do, I guess) were storing timestamps in 32 bit, and making those unsigned ints because the signed 32bit POSIX epoch would end in 2039, which isn't really good for storing dates. But luckily, time_t is a signed long (int64_t) on modern compilers.

aral-matrix avatar Sep 06 '24 21:09 aral-matrix

I started looking at XLDateTime, and realized I was overthinking the leap year logic - maybe a simpler logic is still possible, in which case I'll come back to your pull request. Keeping it open for now.

aral-matrix avatar Sep 06 '24 22:09 aral-matrix

Hi @Tekl7 - I forgot to get back to you when I patched this three weeks ago: https://github.com/troldal/OpenXLSX/commit/4760629364d424b84c36c4a293ddfef29908210d#diff-3cbe87590d15017521f1f375129275ec8d8b69bf0d03c2a5c6ed494e9715b042 This was actually already my second attempt at fixing #205 - which was proposing the same solution as you did with the key line being

if (days >= serial) break;

Another bug report on this matter was #138.

There were two further issues however:

  1. if the seconds were calculated from a fraction, and rounded up to 60, that "overflow" was not carried back into the minutes, hours, days - this is now handled at https://github.com/troldal/OpenXLSX/commit/4760629364d424b84c36c4a293ddfef29908210d#diff-3cbe87590d15017521f1f375129275ec8d8b69bf0d03c2a5c6ed494e9715b042R243 and following lines
  2. The loop abort checks with days >= serial only worked for round dates - when the serial had a time of day (fraction), those checks would calculate a wrong date (think 30 days in April, being equal to serial 30 - this aborts the loop and calculates 30 April correctly. But when the serial is 30.5 - noon on 30 April - it would not abort the loop and deduct 30 days, ending up on noon of "0 May". This is fixed by comparing days + 1 > serial

I will close this pull request as resolved now. Please let me know here (or feel free to open an issue) if you still spot any buggy behavior (I hope I tested enough this time). Thanks for flagging this and sorry for the long response time!

aral-matrix avatar Feb 03 '25 11:02 aral-matrix