calamine icon indicating copy to clipboard operation
calamine copied to clipboard

Extracting dates from Excel

Open kper opened this issue 7 years ago • 7 comments
trafficstars

In my excel sheet, I've got a column with dates. Unfortunately, calamine::DateType has no variant for dates and when I handle a date as a string I get for 29.08.2012 -> 41181.

It would be nice to have built-in date conversion

kper avatar Feb 18 '18 10:02 kper

Notes:

Excel stores dates as sequential serial numbers so that they can be used in calculations. By default, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,447 days after January 1, 1900.

Source: https://support.office.com/en-us/article/datevalue-function-df8b07d4-7761-4a93-bc33-b7471bbff252

kper avatar Feb 18 '18 11:02 kper

Please refer to #95.

The issue is that the logic to decide whether the cell is a Date, an integer or a datetime is not yet implemented and unfortunately not straightforward.

tafia avatar Mar 01 '18 08:03 tafia

I'm just learning Rust, but I got a work-around that might be a starting point?

Basically I get an i64 from any cells where I know ahead of time there will be a date (in other words, I haven't solved the detection logic mentioned above). These i64s are, as @kper's comment clued me to, the number of days since 1900. I then wrote the function below, using the chrono crate's NaiveDate type, to get a workable Date type.

extern crate chrono;
use chrono::NaiveDate; 
// ...
fn from_days_since_1900(days_since_1900: i64) -> NaiveDate {
    let d1900 = NaiveDate::from_ymd(1900, 1, 1);
    d1900 + Duration::days(days_since_1900 - 2)
}

I believe that quirky - 2 at the end is necessary for accuracy, perhaps to compensate for the Excel bug mentioned above? Anyway, hope this is helpful for other people who landed here looking to better handle dates, and/or an eventual PR!

sts10 avatar Aug 18 '18 02:08 sts10

Thanks @sts10.

I believe that quirky - 2 at the end is necessary for accuracy, perhaps to compensate for the Excel bug mentioned above?

On js-xlsx side, they seem to have a different logic

I don't have a lot of time to dig much deeper. I'd be happy to add a util function anyway so people can manually use it now, and hopefully it'll get converted automatically in the future.

tafia avatar Aug 20 '18 01:08 tafia

Oh gosh, that's gonna take me a while to figure out. Luckily, I'm only doing dates (as opposed to datetimes) and they'll all be today's date, so nothing far in the future or past. So I'm hoping my little - 2 does the trick for me. But regardless, thanks for pointing to that part of their code.

sts10 avatar Aug 20 '18 15:08 sts10

I'm solving a similar issue and bumped into this thread. I took some time to understand SheetJS' implementation, here's the commented code, hopefully it'll be helpful to future readers:

// Define `basedate` as Dec 30th, 1899. In current locale timezone.
var basedate = new Date(1899, 11, 30, 0, 0, 0);

// (new Date().getTimezoneOffset() - basedate.getTimezoneOffset())
// The section of code above basically seems unnecessary to me. It'll always result in 0 because both Date objects have the same timezone. Perhaps it's for browser compatibility?
// Which leaves basedate.getTime(): This returns the unix epoch time of Dec 30th, 1899.
// Note that dnthresh is thus equal to Excel's epoch (Jan 1st 1900 minus 2 days).
// This calculation is not entirely accurate, and I *suspect* will give a different answer than Excel if there's a date function referring to some time on or before Feb 28th, 1900. But should be fine any time after.
var dnthresh = basedate.getTime() + (new Date().getTimezoneOffset() - basedate.getTimezoneOffset()) * 60000;

function datenum(v/*:Date*/, date1904/*:?boolean*/)/*:number*/ {
        // Get the unix epoch of the argument.
	var epoch = v.getTime();
       // If using Excel's 1904 date format, shift the argument's date by 1462 days backwards.
      // I think a more intuitive approach is just to shift the Excel epoch itself, add 1462 days to dnthresh.
	if(date1904) epoch -= 1462*24*60*60*1000;

      // Now, get time since Excel epoch by subtracting the Excel epoch time from the arg's date.
     // This is no different from any other epoch date calculations.
	return (epoch - dnthresh) / (24 * 60 * 60 * 1000);
}

function numdate(v/*:number*/)/*:Date*/ {
  // v is days since Jan 1st, 1900
	var out = new Date();
	out.setTime(v * 24 * 60 * 60 * 1000 + dnthresh);
	return out;
}

In short, @sts10 basically has the same implementation as SheetsJS. However, SheetsJS implementation doesn't seem completely correct.

wilfredwee avatar Mar 06 '20 08:03 wilfredwee

Thanks for the analysis! I believe the dates feature now implements a working as_dates fn. This particular issue remains open because the date detection is not automatic (it is up to the caller to convert float to date).

tafia avatar Mar 07 '20 06:03 tafia