PhpSpreadsheet icon indicating copy to clipboard operation
PhpSpreadsheet copied to clipboard

Attribute 'count' missing from /xl/sharedStrings.xml and there should be change of chars only for & and < while writing text to xl/sharedStrings.xml

Open ppodgorskicrido opened this issue 2 years ago • 9 comments

This is:

  • [x] a bug report

I report this bug, because Excel (Office 365) shows that is need to repair xlsx file.

What is the expected behavior?

In xlsx file, file xl/sharedString.xml:

  • node should has attribute "count" like: count="10925"
  • &quot; should be replaced by "
  • &#039; should be replaced by '

What is the current behavior?

In xlsx file, xl/sharedStrings.xml file:

  • node has only 2 attributes: xmlns and uniqueCount
  • &quot; and &#039; is not converted to " and '

What are the steps to reproduce?

Xlsx file was created from database (UTF-8). Text was extracted from HTML. Text in databes contain " and ', but somehow it is converted to &quot; and &#039;.

<?php

require __DIR__ . '/vendor/autoload.php';

$writer = new \PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet));
$writer->save("path.xlsx");

What features do you think are causing the issue

  • [ ] Reader
  • [x] Writer
  • [ ] Styles
  • [ ] Data Validations
  • [ ] Formula Calulations
  • [ ] Charts
  • [ ] AutoFilter
  • [ ] Form Elements

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

It could affect xls file.

Which versions of PhpSpreadsheet and PHP are affected?

PhpSpreadsheet: 1.23 PHP: 8.1.6 Ubuntu: 20.04

ppodgorskicrido avatar Jun 14 '22 08:06 ppodgorskicrido

IMO htmlspecialchars should be removed: XMLWriter

According to XML guidelines & should be escaped by using &amp; (also <), but there is no need to escape " and ' in text.

ppodgorskicrido avatar Jun 14 '22 09:06 ppodgorskicrido

I do not understand the problem you are reporting. When you open the file in Excel, does the string in the cell not have the correct contents? Also, what harm is caused by the missing "count" attribute?

oleibman avatar Jun 15 '22 16:06 oleibman

The problem is that Excel won't open a xlsx file, which contains &quot; or &#039; in xl/sharedStrings.xml.

What happend in my case? My app produce xlsx file (using this library). File won't be opened by Excel, so Excel shows that file can be repaired. After the repair, changes are in xl/sharedStrings.xml as below:

- <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="7372">
+ <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="10925" uniqueCount="7372">

and a lot of changes like:

- <t xml:space="preserve">some text &quot;quotationstext&quot; more text</t>
+ <t xml:space="preserve">some text "quotationstext" more text</t>
- <t xml:space="preserve">some text qout&#039;s more text</t>
+ <t xml:space="preserve">some text qout's more text</t>

I don't know what "count" stands for and I hope you can help with that, but the problem with quotations are IMO pretty obvious.

ppodgorskicrido avatar Jun 15 '22 18:06 ppodgorskicrido

My version of Excel does not have any problem opening a file I created as follows (based largely on the test case in your PR), and the contents of A1 are exactly as expected:

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
$sheet->setTitle('sheet1');
$text = "Text contains: single quote: ' ampersand: & double quote:" . ' " ' . 'less than: < greater than: >';
$sheet->setCellValueByColumnAndRow(1, 1, $text);

$sheet->setCellValue('A1', $text);
$writer = new Xlsx($spreadsheet);
$filename = 'issue.2884.xlsx';
$writer->save($filename);
echo "saved $filename\n";

If the same code results in an unreadable file for you, please upload it so that I can look into how it differs from my result.

oleibman avatar Jun 15 '22 19:06 oleibman

Ok, you're rigth. I check both on Windows and Ubuntu and generated file is correct. I will send a file soon.

ppodgorskicrido avatar Jun 20 '22 10:06 ppodgorskicrido

I striped out all unnecessary rows from the excel. This one row is causing problems

checked-62b5702e20e72.xlsx

ppodgorskicrido avatar Jun 24 '22 08:06 ppodgorskicrido

You have a problem because there is too much data in the cell. There are limits on the maximum number of characters that a cell can contain - see https://support.microsoft.com/en-us/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3?ui=en-us&rs=en-us&ad=us#ID0EBABAAA=Excel_2016-2013 You have two cells with a huge amount of data, although I think only J2 exceeds the limit. If I go into the xml and delete some lines from that cell, Excel is able to read the file.

oleibman avatar Jun 24 '22 16:06 oleibman

Ok, thank you. I will make new PR tomorrow.

ppodgorskicrido avatar Jun 28 '22 13:06 ppodgorskicrido

See discussion in PR #2913, which is now closed. The problem that is being reported is real, but it appears to be a problem with Excel, not PhpSpreadsheet.

oleibman avatar Sep 02 '22 02:09 oleibman