PHPWord
Reading table content from docx gives half or previous text
I'm using the following code to read the content of a table in a docx file, but some cells return only part of their text, or text that was there previously. Track changes is off (the default).
use PhpOffice\PhpWord\Element\Table;
use PhpOffice\PhpWord\Element\Text;
use PhpOffice\PhpWord\Element\TextRun;
use PhpOffice\PhpWord\IOFactory;

$phpWord = IOFactory::createReader('Word2007')->load($file);
$index = 0;
$rows = [];
$sections = $phpWord->getSections();
foreach ($sections[0]->getElements() as $el) {
    if (!$el instanceof Table) {
        continue;
    }
    foreach ($el->getRows() as $row) {
        $columns = [];
        foreach ($row->getCells() as $cell) {
            // Collect one string per paragraph so each cell yields exactly one column.
            $paragraphs = [];
            foreach ($cell->getElements() as $cEl) {
                if ($cEl instanceof Text) {
                    $paragraphs[] = $cEl->getText();
                } elseif ($cEl instanceof TextRun) {
                    // Word may split one visible string into several runs,
                    // so join every Text child instead of reading only the first.
                    $text = '';
                    foreach ($cEl->getElements() as $child) {
                        if ($child instanceof Text) {
                            $text .= $child->getText();
                        }
                    }
                    $paragraphs[] = $text;
                }
            }
            // Empty cells and cells without text elements become "".
            $columns[] = implode("\n", $paragraphs);
        }
        $rows[] = $columns;
        $index++;
    }
}
Is there a better way to read the table or is this an issue with the lib?
I have the same issue. In Word, the cell shows the text "5.507,63". When I unpack the docx and look at document.xml, I see that it's actually stored as two runs:
<w:r><w:rPr><w:rFonts w:ascii="Arial"/><w:b/><w:sz w:val="20"/></w:rPr><w:t>5.</w:t></w:r><w:r w:rsidR="00114EBE"><w:rPr><w:rFonts w:ascii="Arial"/><w:b/><w:sz w:val="20"/></w:rPr><w:t>507,63</w:t></w:r>
I don't know how this happened when the docx was created, but in my case it's obviously not an issue with PHPWord. Maybe Word interpreted the dot as "end of sentence" and put some "dirt" (a second run with its own w:rsidR attribute) between it and the 507,63.
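To illustrate the split, here is a minimal sketch that parses the fragment above with PHP's built-in DOM extension (not PHPWord) and compares reading only the first run against joining all of them. The enclosing w:p element and its namespace declaration are my addition so the fragment parses on its own, and the variable names are only illustrative.

```php
<?php
// WordprocessingML namespace; the wrapping <w:p> is an assumption added
// so that the fragment from document.xml is well-formed by itself.
$ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
$xml = '<w:p xmlns:w="' . $ns . '">'
    . '<w:r><w:rPr><w:rFonts w:ascii="Arial"/><w:b/><w:sz w:val="20"/></w:rPr><w:t>5.</w:t></w:r>'
    . '<w:r w:rsidR="00114EBE"><w:rPr><w:rFonts w:ascii="Arial"/><w:b/><w:sz w:val="20"/></w:rPr><w:t>507,63</w:t></w:r>'
    . '</w:p>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Collect the text of every <w:t> element in document order.
$runs = [];
foreach ($doc->getElementsByTagNameNS($ns, 't') as $t) {
    $runs[] = $t->textContent;
}

$firstRunOnly = $runs[0];          // what you get when only the first run is read
$allRuns = implode('', $runs);     // what Word actually displays

echo $firstRunOnly, "\n"; // 5.
echo $allRuns, "\n";      // 5.507,63
```

So any code that reads only the first Text element of a TextRun will return "5." for this cell; the reading loop needs to concatenate all of the runs.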
@sotirelisc Hi, do you have a sample file, please?