PhpSpreadsheet icon indicating copy to clipboard operation
PhpSpreadsheet copied to clipboard

Apply chars limit in single cell in xlsx writer

Open ppodgorskicrido opened this issue 2 years ago • 6 comments

This is:

- [x] a bugfix
- [ ] a new feature
- [ ] refactoring
- [ ] additional unit tests

Checklist:

  • [x] Changes are covered by unit tests
    • [ ] Changes are covered by existing unit tests
    • [x] New unit tests have been added
  • [x] Code style is respected
  • [x] Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • [x] CHANGELOG.md contains a short summary of the change
  • [ ] Documentation is updated as necessary

Why this change is needed?

#2884

ppodgorskicrido avatar Jun 29 '22 08:06 ppodgorskicrido

Aside from Mark's comments, there is a bigger issue here - your test case doesn't actually exercise the code in question. That is because the string is already truncated when the value is set in the cell - the code that does this is in method checkString in Cell/DataType, which is called by method setValueExplicit in Cell/Cell, which is called by both DefaultValueBinder and StringValueBinder. Are you using a custom binder? We would need code that shows how you manged to create this spreadsheet avoiding that check in the first place to figure out the proper solution.

oleibman avatar Jun 29 '22 16:06 oleibman

I'm using pure library (v 1.23.0) without any additions. This project is based on Symfony with Doctrine ORM. Code is simple:

<?php
$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
/* Add columns names on top of excel sheet */
$rowArray = [
    // array of strings as columns headers
];
$sheet->fromArray($rowArray);

//then assing values using:
$sheet->setCellValueByColumnAndRow(1, $row, $entity->getDescription()); // function returns string|null

$writer = new XlsxWriter($spreadsheet); // alias of ...\Writer\Xlsx
try {
    $writer->save($this->targetDirectory . '/' . $exportFilePath);
} catch (Exception $e) {
    $this->logger->error('Cannot save excel to path: ' . $exportFilePath);
    $this->logger->error('Error message: ' . $e);
}
$spreadsheet->disconnectWorksheets();
unset($spreadsheet);

ppodgorskicrido avatar Jul 01 '22 11:07 ppodgorskicrido

I am not able to duplicate your result:

require __DIR__ . '/PhpSpreadsheet' . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
$text = str_repeat('Ω', 35000);
$sheet->setCellValueByColumnAndRow(1, 1, $text);
$sheet->setCellValueByColumnAndRow(1, 2, '=LEN(A1)');
echo 'Length of data in cell A1 is ',
    $sheet->getCell('A2')->getCalculatedValue(),
    "\n";
$writer = new Xlsx($spreadsheet);
$filename = 'issue.2884b.xlsx';
$writer->save($filename);
echo "saved $filename\n";

This should be approximately equivalent to your code. Note that I am using a string composed entirely of multiple byte UTF8 characters, just to make sure this isn't a bytes-vs-characters issue. When I run it, I see what I expected, which is not what you are seeing:

Length of data in cell A1 is 32767
saved issue.2884b.xlsx

And the output file opens with no problem. Can you try running this to see your results.

oleibman avatar Jul 01 '22 12:07 oleibman

I believe I may have found a way to duplicate your problem. I will keep you posted.

oleibman avatar Jul 01 '22 16:07 oleibman

I hate to say it, but this looks like an Excel bug to me. I have read in the file where you originally reported a problem, and confirmed using mb_strlen that the length of the data in J2 is 32,767 characters. It is truncated, with the last characters being "a fryzury to dzieło W". Now, when I open a new spreadsheet and paste that text into a cell, and use LEN to see the length of that cell, it reports 32,751! I think this difference might be because there are 4 ampersand characters in the text, which have been translated from & to &amp;, resulting in an extra 4*4=16 characters. I don't understand why this should be so, but it is consistent with the facts. Then, maybe there is some merit in your original suggestion that &quot; is pushing it over the limit. However, if it is, that's an Excel bug. And, in any case, there's no sensible action we can take. If we arbitrarily truncate at 32,767 after htmlspecialchars, we may wind up cutting off part of &amp; or &quot;, winding up with our xml corrupted in a different way. BTW, I tried adding ENT_NOQUOTES to the htmlspecialchars call in XMLWriter. This did, in fact, leave the quotes unchanged,but it was not sufficient to allow opening the resulting file without error.

oleibman avatar Jul 01 '22 18:07 oleibman

I think I can confirm a bug in Excel. Run the following program:

$longstring1 = str_repeat('a', 32767);
$longstring2 = str_repeat('b', 32767);
echo mb_strlen($longstring1, 'UTF-8'), "\n";
echo mb_strlen($longstring2, 'UTF-8'), "\n";
$filename = 'issue.2884.toolong1d.csv';
$stream = fopen($filename, 'wb');
fputs($stream, "$longstring1\n$longstring2\n");
fclose($stream);
echo "write complete to $filename\n";

You can easily confirm that, as expected, this file shows two 32,767-character lines when viewed in a text editor. Not so when you open it in Excel - cell A1 contains the first 32,759 a's, and cell A2 contains the last 7. One appears to have been lost altogether. Likewise, cell A3 contains the first 32,759 b's, A4 contains the last 7, and one is missing. LibreOffice (which does not have a 32K restriction) shows the data as 32,767 characters in each of A1 and A2. Note that this demo does not depend on PhpSpreadsheet, uses only ASCII characters, and does not use XML replacement for the ampersands.

oleibman avatar Jul 02 '22 00:07 oleibman

Closing this PR. The problem appears to be with Excel, not with PhpSpreadsheet, and the code in this PR won't fix it. I will leave the original issue open so that the problem remains documented.

oleibman avatar Sep 02 '22 02:09 oleibman