PhpSpreadsheet
PhpSpreadsheet copied to clipboard
Incorrect date rendering
This is:
- [X] a bug report
- [ ] a feature request
- [X] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)
What is the expected behavior?
If I open "dateSample.xlsx" in my LibreOffice or in Excel, I can read exactly the following (yes, I'm French) :
date formats | standard format |
---|---|
16-mai | 16-mai |
16/05/2022 | 16/05/2022 |
lundi 16 mai 2022 | lundi 16 mai 2022 |
16/05/22 | 16/05/22 |
As you can see, both columns show the same, even if first column is a date format (value 44697) in the sheet1.xml file.
What is the current behavior?
If I open and read the file with PHPSpreadsheet with this code :
<?php
require 'vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Reader\Xlsx;
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReaderForFile("dateSample.xlsx");
$reader->setReadDataOnly(false);
$spreadsheet=$reader->load("dateSample.xlsx");
$sheetData = $spreadsheet->getActiveSheet()->toArray(null, true, true, true);
var_dump($sheetData);
I obtain :
array(5) {
[1]=>
array(2) {
["A"]=>
string(12) "date formats"
["B"]=>
string(15) "standard format"
}
[2]=>
array(2) {
["A"]=>
string(6) "16-May"
["B"]=>
string(6) "16-mai"
}
[3]=>
array(2) {
["A"]=>
string(9) "5/16/2022"
["B"]=>
string(10) "16/05/2022"
}
[4]=>
array(2) {
["A"]=>
string(20) "Monday, May 16, 2022"
["B"]=>
string(17) "lundi 16 mai 2022"
}
[5]=>
array(2) {
["A"]=>
string(8) "16/05/22"
["B"]=>
string(8) "16/05/22"
}
}
Q1) is not a bug for 3A ? why 5/16 (month/day) while 5A 16/05 (day/month) is the expected order ?
Q2) 2A and 4A are not in French, how to get them translated ?
What are the steps to reproduce?
The source code provided should be used for reproducing with the enclosed excel file.
What features do you think are causing the issue
- [X] Reader
- [ ] Writer
- [X] Styles
- [ ] Data Validations
- [ ] Formula Calulations
- [ ] Charts
- [ ] AutoFilter
- [ ] Form Elements
Does an issue affect all spreadsheet file formats? If not, which formats are affected?
The same file open and saved as ODS via libreoffice render each date as "16-May-22" ... which is different (but not really better)
Which versions of PhpSpreadsheet and PHP are affected?
I'm using phpoffice/phpspreadsheet 1.23.0 and PHP 8.1.5 (cli)
When I open your spreadsheet (my setting is English US):
This is, of course, different than your setting because of the different regional settings, but it is entirely consistent with the results you're seeing. Excel (and therefore PhpSpreadsheet) does not save the regional settings with the file.
The region itself is never stored in the Excel file; this is purely a feature of the Excel application itself (it is possible to change region within the application, or to send files to people who have different region settings for their Excel, and values will be rendered according to the locale settings in their Excel)
My current settings are English/UK
Internally, PhpSpreadsheet uses
en_US
.
The point where this can be a problem is when trying to convert a string value like 11/05/2022
... is this the 11th of May 2022? Or is it the 5th of November 2022.
One other potential issue with number format masking for dates is that some of the MS defined formats are locale-aware, others are not:
Those formats shown with a '*' in the type selector are locale-aware, and will display according to locale settings in MS Excel; those displayed without the '*' are "absolute" and should always appear in exactly that format. PhpSpreadsheet doesn't support this difference when rendering dates.
Internally, PhpSpreadsheet uses en_US
1/ is there a way to change this ?
2/ how does PhpSpreadsheet decide to convert to "5/16/2022" for A3 or "16/05/22" for A5 ? (I imagine this is somehow linked with DateFormatter Class : https://github.com/PHPOffice/PhpSpreadsheet/blob/master/src/PhpSpreadsheet/Style/NumberFormat/DateFormatter.php)
Internally, PhpSpreadsheet uses en_US
1/ is there a way to change this ?
There is no option for this. The Calculation Engine supports locale settings for formulae, allowing you to use French-language names for Excel functions, and ;
as a function argument separator; and we have recently provided some element of support for locale when reading data from csv (such as handling ,
as a decimal separator rather than .
, and recognising VRAI
as a boolean true), but this is still fairly limited.
I did explore using PHP's Intl extension for handling formatting, but it is still over-complicated and buggy, and prone to issues depending on the version of the underlying ICU version. Enforcing it as a requirement for PhpSpreadsheet when it creates more problems than it might resolve didn't seem like a good trade-off.
2/ how does PhpSpreadsheet decide to convert to "5/16/2022" for A3 or "16/05/22" for A5 ?
That depends on the exact format mask that was stored in the Excel file. The code used for A5 in this file is an explicitly defined code of dd/mm/yy;@
. Cell A3 uses format id 14, one of the MS pre-defined styles (which can be found in https://github.com/PHPOffice/PhpSpreadsheet/blob/master/src/PhpSpreadsheet/Style/NumberFormat.php#L260) and maps to m/d/yyyy
(officially this should be mm-dd-yy
according to the ECMA specification, but MS overrides that, and we map to the MS format)... note that this is also one of Microsoft's "locale aware" formats
thanks for this information
The code used for A5 in this file is an explicitly defined code of dd/mm/yy;@. Cell A3 uses format id 14
Could you explain me a little more about the links with the format ? I can read in sheet1.xml :
<c r="A3" s="3"><v>44697</v></c>
<c r="A5" s="7"><v>44697</v></c>
And in style.xml :
<numFmt numFmtId="166" formatCode="dd/mm/yy;@"/>
<xf numFmtId="14" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
<xf numFmtId="166" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
But how is the link make between sheet1 and style files ?
I understand date conversion does not use locale because PHP's DateTime.format() does not use locales ... https://www.php.net/manual/datetime.format.php
The cell entries in sheet1
has an s
attribute that is a style reference:
<row r="3" x14ac:dyDescent="0.25" spans="1:2">
<c r="A3" s="3">
<v>44697</v>
</c>
<c r="B3" t="s" s="2">
<v>1</v>
</c>
</row>
This points to an entry in the cellXfs
element of the styles:
<cellXfs count="8">
<xf numFmtId="0" borderId="0" fillId="0" fontId="0" xfId="0"/>
<xf numFmtId="16" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
<xf numFmtId="0" borderId="0" fillId="0" fontId="0" xfId="0" quotePrefix="1"/>
<xf numFmtId="14" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
<xf numFmtId="165" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
<xf numFmtId="0" borderId="0" fillId="0" fontId="0" xfId="0" applyAlignment="1">
<alignment horizontal="center"/>
</xf>
<xf numFmtId="0" borderId="0" fillId="0" fontId="0" xfId="0" quotePrefix="1" applyAlignment="1">
<alignment horizontal="center"/>
</xf>
<xf numFmtId="166" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
</cellXfs>
So for cell A3
with the s
attribute value of 3
, this points to the 4th entry in that cellXfs
list (offsets counted from 0)
<xf numFmtId="14" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
with applyNumberFormat
set to true
, and a numFmtId
of 14
, which is a built-in format code, as defined in the ECMA specification.
For cell A5
with the s
attribute value of 7
, this points to the 8th entry in that cellXfs
list:
<xf numFmtId="166" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
with applyNumberFormat
set to true
, and a numFmtId
of 166
, which is defined in the numFmts
entry in styles with the explicitly defined Id of 166
:
<numFmt formatCode="dd/mm/yy;@" numFmtId="166"/>
so that formatCode
value is the number format mask used for the value in cell A5
Thanks for that explanation
I have the same issue. The date cames as 12.01.2021 in xlsx ... and after import is recognized as (int) 44186
Until this is fixed, is there a way to import with all fields as strings, and be takes as they are in the excel ?
@AnwarQasem
Until this is fixed, is there a way to import with all fields as strings, and be takes as they are in the excel ?
Dates in MS Excel 101
MS Excel uses a serialized timestamp to represent a date value, so the date/time value is stored in the file as a number like 44186
.
e.g.
<row r="3" x14ac:dyDescent="0.25" spans="1:2">
<c r="A3" s="3">
<v>44697</v>
</c>
</row>
Combined with a number format mask, this can be rendered as a date value like ``21/12/2021`, and my description in a previous response to this thread explains how that serialized timestamp value is linked to a format mask.
By defalt, PhpSpreadsheet loads both the value (number 44186
) and the format mask (e.g. dd/mm/yy;@
) for the cell (unless you tell it to load data only from the file, when it only loads the value).
If you make a call to getValue()
for that cell, PhpSPreadsheet will return the serialized timestamp, the number 44186
; if you call getFormattedValue()
then PhpSpreadsheet will return a date string formatted according to the mask (e.g. 21/12/2021
) that it is linked to.
This is similar to using Unix timestamps in standard PHP, although MS Excel counts in days rather than seconds, and uses the float part of its timestamp to represent the time; and the base date could be either 1/1/1900 or 1/1/1904 (depending on whether the file was originally created in Windows Excel or Mac Excel) rather than the 1/1/1970 used by a unix timestamp.
If you load the file with readDataOnly
set to true
, then PhpSpreadsheet will only load the cell values, but not load style information like the number format masks, and there is no way to identify that 44186
should be formatted as a date, because it can't be distinguished from any other number.
Even then, the methods in the Shared/Date class allow you to convert that number to a unix timestamp, or to a PHP DateTime object that you can then format however you choose; but because you have explicitly told PhpSpreadsheet not to load style information, we have to assume that this is what you want.
So there is nothing to "fix" in PhpSpreadsheet, the data as it is in Excel itself (the number 44186
), the datatype is matched as an Excel datatype number, the format mask is read and associated with the cell (unless you tell PhpSpreadseet not to make this association via readDataOnly
). We provide methods to read both the raw numeric value read from Excel, and the formatted "date" value. We provide methods to convert Excel timestamp values to unix timestamp values or PHP DateTime objects so that you can manipulate or format the values using standard PHP functions if you wish.
And all this is described in our documentation.
I have the same issue. The date cames as 12.01.2021 in xlsx ... and after import is recognized as (int) 44186
As explained, this is not the same issue, because your value is "raw" has you may not have formatted it. To shorten it, the issue is that Excel uses also the locale for formatting values in some cases, while phpSpreadSheet does not.
sheet1
You should be convert Excel date format into normal date. Window Here we should use unix date format to recover original date. excel date substract with unix date format value is "255569 " and then divided with 86400 . Example: $input=44697; $unixDate = ($input- 25569) * 86400; $date=date("d-m-Y", $unixDate); output: like this ="16-05-2022"
@kirankumarmitnas The problem is not with date conversion between Excel timestamps and unix timestamps or PHP DateTime objects: PhpSpreadsheet already has methods to handle these conversions that take into account the different base calendars that can be set for a spreadsheet, and the 1900 leap year problem in MS Excel date handling (and even potentially timezones and daylight savings).
The problem is with the locale-specific date formatting masks
I do not understand how "locale-specific date formatting masks" present a problem. I have read and re-read all of the comments in this thread, and I have looked at Calculate/DateValue.php. Nothing describes the impact of the above term.
Assume the user assigns the string date '1/10/2021' ("Jan 10, 2021" ) in cell A3 in the example above. (If you ask, Excel will tell you the numeric date = 44206). Now, just to be sure there is no mistake about the format, the user formats cell A3 in the example above with the Excel date format equivalent "m/d/yyyy" by using a PHPSS method call like: $Datasheet->getStyle('A3')->getNumberFormat()->setFormatCode(Properties::FORMAT_CODE_DATE); // 'm/d/yyyy'.
There is no question about the date format that the user has selected for cell A3.
Later, when the user queries this cell with a method call like $Date = $Datasheet->getCell('A3')->getFormattedValue(); she will be astounded by the returned numeric value of 44470 with a string equivalent of "10/01/2021', which is Oct 1, 2021. How did that happen?
If the user writes the spreadsheet to a file, Excel will show that the value of cell A3 is 44206, or a date whose equivalent is "Jan 10, 2021". Excel gets it right; PHPSS gets it wrong.
I don't understand how PHPSS's Calculate/DateValue.php can't interrogate the assigned cell format to determine d, m, and yyyy. There is no need to guess if a cell format has been assigned. Blaming "locale-specific date formatting" makes no sense to me. Is there no method 'getFormatCode()'? Or is the format code from "setFormatCode()" stored in write-only-memory?
Which is where I started this comment, I suppose.
How exactly does the user set the value you mentioned? On the spreadsheet? In code?
If it's on the spreadsheet, Excel can treat the string as a date based on your regional settings (a.k.a locale-specific date formatting). On my system, which uses default date format yyyy-mm-dd, it does not recognize '1/10/2021' as a date; it treats it as a string. But, let's assume that my system accepts it. Are you saying that I can save the file, and, when I open the file in Excel, that cell's value is apparently in January, but, when PhpSpreadsheet reads it, it treats the cell as October? If so, please provide the spreadsheet.
If the problem shows up in code, please provide the code. Note that, unlike Excel, PhpSpreadsheet will make no attempt to treat a string that looks like a date as a date.
Attached is a small piece of code with a table in a spreadsheet containing a dozen "potentially confusing" string dates separated by slashes. In the table I use Excel function =DATEVALUE(
In the code, I traverse the table, and acquire the date info using both getCalculatedValue() as well as getFormattedValue() to show the values that PHPSS believes are in the =DATEVALUE() cell (column A). Then I use ordinary php date functions to do the same thing, but I use the string date in column B.
Via echo statements, I display the user's string date, the php numeric date value, the PHPSS numeric date value, and the PHPSS formatted date. Unsurprisingly, they sometimes disagree.
---- begin ECHO statements ---- detected Date format = m/d/yyyy
---------------|---php--|----phpss-------- datestr--------|-dateval-|-dateval= datestr 01/10/2021---|-44206--|-44470=10/1/2021 10/01/2021---|-44470--|-44206=1/10/2021 04/21/2021---|-44307--|-44307=4/21/2021 07/31/2021---|-44408--|-44408=7/31/2021 10/11/2021---|-44480--|-44510=11/10/2021 11/10/2021---|-44510--|-44480=10/11/2021 01/21/2022---|-44582--|-44582=1/21/2022 04/11/2022---|-44662--|-44869=11/4/2022 11/04/2022---|-44869--|-44662=4/11/2022 07/21/2022---|-44763--|-44763=7/21/2022 10/31/2022---|-44865--|-44865=10/31/2022 --------------- end ECHO ----------------------
I write the 2 numeric date values into the table in columns C & D. Those columns are unformatted, so their default format is "General", and they show numeric (date) values.
Note the discrepancy in the Excel table compared to the echo output. This is because Excel always produces the correct date in Column A from the DATEVALUE() function, whereas the echo statements show the [sometimes incorrect] date value produced by PHPSS's emulated DATEVALUE() function.
I believe the d<->m misinterpretation is caused by an uninformed decision in Class Calculation/DateTimeExcel/DateValue.php in private static function setUpArray() (see below) where that function decides whether to reverse $testVal1 and $testVal2. The guess is always correct when either value > 12, and it is always incorrect when d<13. In an average 30 day month, it's correct 12 times, incorrect 18 times, for dates delimited by slashes. I think we can do better than that.
$testVal1 = strtok($dateValue, '- ');
$testVal2 = strtok('- ');
$testVal3 = strtok('- ') ?: $dti->format('Y');
Helpers::adjustYear((string) $testVal1, (string) $testVal2, $testVal3);
$PHPDateArray = Helpers::dateParse($testVal1 . '-' . $testVal2 . '-' . $testVal3);
if (!Helpers::dateParseSucceeded($PHPDateArray)) {
$PHPDateArray = Helpers::dateParse($testVal2 . '-' . $testVal1 . '-' . $testVal3);
}
If the user has defined a cell format for the date source, then that format should be passed to PHPSS's DATEVALUE emulation function. Then there is no guesswork about which parameter is 'd' and which is 'm'. (N.B. In my example, the DATEVALUE() function argument is NOT a value, but rather a cell reference. To find the format for the argument to DATEVALUE(), PHPSS must trace the cell references back to the cell with the ultimate value. I'm glad that's you, not me.)
For user's who don't specify a format, well, there's always Monte Carlo.
You are confusing "correct" with "matches my implementation". Excel does not always return the "correct" result for DATEVALUE. On my system, when I enter =DATEVALUE("01/10/2021")
into an Excel cell, I get #VALUE!
. As mentioned above, this is due to the regional settings on my computer. If I change my regional settings to match yours so that 01/10/2021 is accepted, sobeit, but Excel DATEVALUE will still accept "2021-10-01" even with that setting. I can also change my regional settings so that Excel interprets that date as October rather than January.
Let's look at what Microsoft has to say on the subject https://support.microsoft.com/en-us/office/datevalue-function-df8b07d4-7761-4a93-bc33-b7471bbff252 "The DATEVALUE function converts a date that is stored as text to a serial number that Excel recognizes as a date. For example, the formula =DATEVALUE("1/1/2008") returns 39448, the serial number of the date 1/1/2008. Remember, though, that your computer's system date setting may cause the results of a DATEVALUE function to vary from this example"
PhpSpreadsheet will indeed try to guess what date format is in use when evaluating DATEVALUE; I'm not sure that's a good idea, but it has been doing so for a very long time. The fact that its choice when evaluating an ambiguous date does not match your system's setting does not make it wrong. If your user wants to avoid a Monte Carlo situation, perhaps a better choice is to abandon ambiguous date formats altogether.
There is no ambiguity when a user specifies a cell format "m/d/yyyy", and the cell contains the string "1/10/2021". The computer's system date setting is immaterial. There is only one correct interpretation: m=1, d=10, yyyy=2021. Am I wrong?
Yes, you are wrong. The cell format is not an input to the DATEVALUE function. Only the string is an input, and it can be ambiguous.
Experimenting with format specifications shows you are right. In my mind, then, I ask "Why format a cell?" if it makes no difference to the DATEVALUE function. It returns the same value regardless, and so sometimes it's right (meaning that I agree with its value), but sometimes it returns a value that I disagree with.
Specifically, if I enter the string date 1/10/2021 (intending Jan 10, 2021) in column B and specify its format as m/d/yyyy, then DATEVALUE returns the value 44206, which I believe is correct. If I format the cell 'd/m/yyyy' and change locale to UK (intending Oct 1, 2021) then DATEVALUE still maddeningly returns the value 44206, which I believe is NOT correct, and I can find no combination of formats to change that value.
How can a user have faith that a string date will be interpreted the way he/she intends?
My personal solution will be to eliminate Excel's DATEVALUE function, and instead calculate the DATEVALUE in my code using php date objects, and then insert the calculated value in my Excel table. That sounds like your advice.
(I would also put a stronger disclaimer on PHPSS's DATEVALUE emulation function and warn users that it will guess the wrong return 60% of the time. The current disclaimer is far too ambiguous, which is weirdly ironic.)
And, I never liked Excel's DATEVALUE function anyway!
I do not understand how "locale-specific date formatting masks" present a problem. I have read and re-read all of the comments in this thread, and I have looked at Calculate/DateValue.php. Nothing describes the impact of the above term.
Assume the user assigns the string date '1/10/2021' ("Jan 10, 2021" ) in cell A3 in the example above. (If you ask, Excel will tell you the numeric date = 44206). Now, just to be sure there is no mistake about the format, the user formats cell A3 in the example above with the Excel date format equivalent "m/d/yyyy" by using a PHPSS method call like: $Datasheet->getStyle('A3')->getNumberFormat()->setFormatCode(Properties::FORMAT_CODE_DATE); // 'm/d/yyyy'.
There is no question about the date format that the user has selected for cell A3.
Later, when the user queries this cell with a method call like $Date = $Datasheet->getCell('A3')->getFormattedValue(); she will be astounded by the returned numeric value of 44470 with a string equivalent of "10/01/2021', which is Oct 1, 2021. How did that happen?
If the user writes the spreadsheet to a file, Excel will show that the value of cell A3 is 44206, or a date whose equivalent is "Jan 10, 2021". Excel gets it right; PHPSS gets it wrong.
I don't understand how PHPSS's Calculate/DateValue.php can't interrogate the assigned cell format to determine d, m, and yyyy. There is no need to guess if a cell format has been assigned. Blaming "locale-specific date formatting" makes no sense to me. Is there no method 'getFormatCode()'? Or is the format code from "setFormatCode()" stored in write-only-memory?
Which is where I started this comment, I suppose.
User are using ms office 2019. i have date column on the sheet. user save date like 01/06/2020 or 01-06-2020.after user save excel convert into own format like 43983 and then i am reading that excel file PhpSpreadsheet . output are 43983. it not converting DD/MM/YYYY format. i also use tried above code .but not reflecting on output.
Yes, I think I understand why. I believe it is an ancient date/time formatting convention by Windows to match the computer's understanding of date/time with the user's "predicted" date/time. In Windows settings, you find "Date & Time", and under "Related Settings" you find "Date, time & regional formatting". There you can change your computer's "Country or region" to match your actual location.
For example, if someone from the UK buys a computer from a US company, it might arrive with these settings:
The UK owner can change Settings to this:
August 1, 2022 in the US looks llike 8/1/2022, but in the UK it looks like 1/8/2022. Excel interprets dates using this GLOBAL setting.
However, you have another option in the Excel app. You can specify a "locale" for your spreadsheet.
However, the user CANNOT specify different date formats for different cells. Imagine a spreadsheet where a user wishes to display a combined list of UK and US holidays where the column of UK dates is formatted d/m/yyy to make the dates look familiar to UK viewers, and the column of US dates is formatted m/d/yyyy to look familiar to US users. Doing this will NOT impact how Excel interprets the dates. The Excel app uses the GLOBAL 'locale' setting for all dates in the spreadsheet, and ignores the user's carefully chosen cell/column date format!
It seems logical, sensible and obvious for the app to allow the user's cell/column date format to override the GLOBAL date format, but that is NOT how it works.
My guess is that Microsoft is unlikely to seize this opportunity for improvement in the near future - or the distant future, either, for that matter.
Have Facing same issue But For time. for example i have one excl file that have many rows and columns and in First and Second contain date and time individually when i use this library i got date perfectly but time is look like this hh:mm:ss.44927 . Can you please suggest me how can i resolve this?
FYI https://stackoverflow.com/a/44304796 this might help
it's work .thank you