PHPWord
deleteBlock / replaceBlock doesn't work properly
Hi,
PHPWord is a terrific solution for my needs. Unfortunately I'm not able to use the "deleteBlock" feature, since PHPWord doesn't detect the blocks in my files. I asked a question on Stack Overflow: http://stackoverflow.com/questions/25402045/regexp-in-a-word-xml-why-does-it-not-match Basically, I worked out that the regexp itself doesn't match the block in the docx. Any help would be greatly appreciated.
Cheers
I have the same issue with this.. Has anyone worked it out?
On my side I just created different templates... no other choice. On Stack Overflow they pointed out that a regexp is not the best option for parsing XML... I guess that's accurate :)
Thanks for the reply. Unfortunately I don't think I'm going to be able to create different templates, as the client has specific requirements.
The strange bit is that if I copy the whole template from 'Sample_23_TemplateBlock.docx' and paste it at the end of my template, the DELETEME block gets removed.
But if I paste it in the middle of my template, the DELETEME block isn't removed.
So I guess I will have to look at other ways to parse the XML to remove the blocks. Any help in the right direction would be great, thanks.
I have found a fix for the cloneBlock regex/function which seems to solve this problem in my numerous cases.
/**
* Clone a block
*
* @param string $blockname
* @param integer $clones
* @param boolean $replace
* @return string|null
*/
public function cloneBlock($blockname, $clones = 1, $replace = true)
{
$xmlBlock = null;
preg_match(
'/(<w:p.*>\${' . $blockname . '}<\/w:.*?p>)(.*)(<w:p.*\${\/' . $blockname . '}<\/w:.*?p>)/is',
$this->documentXML,
$matches
);
if (isset($matches[2])) {
$xmlBlock = $matches[2];
$cloned = array();
for ($i = 1; $i <= $clones; $i++) {
$cloned[] = preg_replace('/\${(.*?)}/','${$1_'.$i.'}', $xmlBlock);
}
if ($replace) {
$this->documentXML = str_replace(
$matches[1] . $matches[2] . $matches[3],
implode('', $cloned),
$this->documentXML
);
}
}
return $xmlBlock;
}
I was still having issues with the regular expression; I just couldn't get it to match my template. Here is my version of the method, which uses SimpleXML to find the start and end tags and, for me at least, seems to be much more robust. It also incorporates chc88's _1, _2, _3, etc. suffixes for cloned variables.
/**
* Clone a block
*
* @param string $blockname
* @param integer $clones
* @param boolean $replace
* @return string|null
*/
public function cloneBlock($blockname, $clones = 1, $replace = true)
{
// Parse the XML
$xml = new \SimpleXMLElement($this->documentXML);
// Find the starting and ending tags
$startNode = false; $endNode = false;
foreach ($xml->xpath('//w:t') as $node)
{
if (strpos($node, '${'.$blockname.'}') !== false)
{
$startNode = $node;
continue;
}
if (strpos($node, '${/'.$blockname.'}') !== false)
{
$endNode = $node;
break;
}
}
// Make sure we found the tags
if ($startNode === false || $endNode === false)
{
return null;
}
// Find the parent <w:p> node for the start tag
$node = $startNode; $startNode = null;
while (is_null($startNode))
{
$node = $node->xpath('..')[0];
if ($node->getName() == 'p')
{
$startNode = $node;
}
}
// Find the parent <w:p> node for the end tag
$node = $endNode; $endNode = null;
while (is_null($endNode))
{
$node = $node->xpath('..')[0];
if ($node->getName() == 'p')
{
$endNode = $node;
}
}
/*
* NOTE: SimpleXML reduces empty tags to "self-closing" tags, so we
* need to replace the original XML with the version of the XML as
* SimpleXML sees it. The following example shows the issue we are
* facing.
*
* This is the XML that my document contained originally.
*
* ```xml
* <w:p>
* <w:pPr>
* <w:pStyle w:val="TextBody"/>
* <w:rPr></w:rPr>
* </w:pPr>
* <w:r>
* <w:rPr></w:rPr>
* <w:t>${CLONEME}</w:t>
* </w:r>
* </w:p>
* ```
*
* This is the XML that SimpleXML returns from asXml().
*
* ```xml
* <w:p>
* <w:pPr>
* <w:pStyle w:val="TextBody"/>
* <w:rPr/>
* </w:pPr>
* <w:r>
* <w:rPr/>
* <w:t>${CLONEME}</w:t>
* </w:r>
* </w:p>
* ```
*/
$this->documentXML = $xml->asXml();
// Find the xml in between the tags
$xmlBlock = null;
preg_match
(
'/'.preg_quote($startNode->asXml(), '/').'(.*?)'.preg_quote($endNode->asXml(), '/').'/is',
$this->documentXML,
$matches
);
if (isset($matches[1]))
{
$xmlBlock = $matches[1];
$cloned = array();
for ($i = 1; $i <= $clones; $i++)
{
$cloned[] = preg_replace('/\${(.*?)}/','${$1_'.$i.'}', $xmlBlock);
}
if ($replace)
{
$this->documentXML = str_replace
(
$matches[0],
implode('', $cloned),
$this->documentXML
);
}
}
return $xmlBlock;
}
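For anyone who wants to see the self-closing behaviour from the NOTE above in isolation, here is a minimal sketch. The sample paragraph and the `roundTrip` helper name are made up for illustration; only `SimpleXMLElement` and `asXML()` are real APIs.

```php
<?php
// Hypothetical helper: parse a fragment with SimpleXML and serialize it again.
function roundTrip(string $xml): string
{
    $element = new SimpleXMLElement($xml);
    return $element->asXML();
}

// Made-up sample paragraph. A standalone fragment needs the namespace
// declaration, otherwise the "w:" prefix is unbound and parsing complains.
$original = '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    . '<w:r><w:rPr></w:rPr><w:t>${CLONEME}</w:t></w:r></w:p>';

$normalized = roundTrip($original);

// Is the explicit empty pair still there after the round trip?
var_dump(strpos($normalized, '<w:rPr></w:rPr>') !== false);
// Is the self-closing form there instead?
var_dump(strpos($normalized, '<w:rPr/>') !== false);
```

This is why the method overwrites `$this->documentXML` with `$xml->asXml()` before searching: a search string built from `asXml()` output can only be found in XML that went through the same serialization.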
Hi Brad,
Thanks for your input, I'll give it a test-run as well. I was just running into some issues with my version as well.
I'll report back!
Regards
@brad-jones
I have tested your function in a series of documents. It seems to be working very well!
Maybe put it in a Pull-Request for the project?
Regards!
I have actually taken it a little further. Check out: https://github.com/phpgearbox/pdf
OMG - this thing uses regex to parse XML!?! I was searching for why my cloneBlock() suddenly stopped working after adding some formatting inside the block, and came across this issue. XML and REGEX? Makes the hairs on my neck stand up.
And looking at my document source, yes, it is the REGEX that no longer matches. MS Word puts in a few additional formatting elements, and the block tags can no longer be found using the REGEX built into cloneBlock(). I'm also going to assume this is never going to be fixed :-(
Edit: Raised my issue separately, with some analysis:
https://github.com/PHPOffice/PHPWord/issues/867
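One way extra formatting elements can hide a block tag is when Word splits the tag text across several runs, so the literal `${NAME}` never appears in the raw XML. Whether that is what happened in this particular document is an assumption. The sketch below is a quick diagnostic for that case; the helper name `isPlaceholderSplit` and both sample strings are hypothetical.

```php
<?php
// Hypothetical diagnostic: the tag is readable in the text content but is
// missing from the raw XML, which means markup sits somewhere inside the tag.
function isPlaceholderSplit(string $documentXml, string $blockname): bool
{
    $tag = '${' . $blockname . '}';
    // Crude text extraction: drop everything that looks like an XML tag.
    $textOnly = preg_replace('/<[^>]+>/', '', $documentXml);

    return strpos($documentXml, $tag) === false
        && strpos($textOnly, $tag) !== false;
}

// Made-up samples: one intact tag, one split across three runs.
$intact = '<w:p><w:r><w:t>${CLONEME}</w:t></w:r></w:p>';
$split = '<w:p><w:r><w:t>${</w:t></w:r><w:proofErr w:type="spellStart"/>'
    . '<w:r><w:t>CLONEME</w:t></w:r><w:proofErr w:type="spellEnd"/>'
    . '<w:r><w:t>}</w:t></w:r></w:p>';

var_dump(isPlaceholderSplit($intact, 'CLONEME'));
var_dump(isPlaceholderSplit($split, 'CLONEME'));
```

If the check reports a split tag, no regex that looks for the literal tag text can match, whichever variant from this thread is used.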
Duplicate of #316
I'm still having an issue where cloneBlock and replaceBlock don't find anything (a regex tester shows catastrophic backtracking).
brad-jones's solution solves my problems
@klado you have probably solved this already, but for other users who find this issue in the future, here is what I found: the preg_match used here is subject to a limit in PHP, and when the document is long it can find only the last block, not the one at the beginning of the document.
You have to increase that limit in the php.ini file. For bigger files even the limit suggested here may be too small:
pcre.backtrack_limit = 53001337
pcre.recursion_limit = 53001337
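If editing php.ini is not an option, the same settings can be raised per request with `ini_set()`. It also helps to check whether `preg_match()` returned `false`, which signals a PCRE error (for example `PREG_BACKTRACK_LIMIT_ERROR` from `preg_last_error()`) rather than "no match". A minimal sketch, where the wrapper name `matchOrFail` and the sample string are made up:

```php
<?php
// Hypothetical wrapper: do not treat a PCRE failure as "block not found".
function matchOrFail(string $pattern, string $subject): array
{
    $matches = array();
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException('preg_match failed, preg_last_error() = ' . preg_last_error());
    }
    return $matches; // empty array when the pattern simply did not match
}

// Raise the limits for this request only; the values are the ones quoted above.
ini_set('pcre.backtrack_limit', '53001337');
ini_set('pcre.recursion_limit', '53001337');

$matches = matchOrFail('/\$\{(.*?)\}/', '<w:t>${DELETEME}</w:t>');
var_dump($matches);
```

With a check like this, a document that is too large for the current limits fails loudly instead of silently leaving the block in place.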
PS: deleteBlock does not work with LibreOffice documents; the output file is corrupted.
Hi,
On version 0.16.0 I have problems with replaceBlock and deleteBlock (the template gets broken).
For now I have just changed the replaceBlock method.
public function replaceBlock($blockname, $replacement)
{
preg_match(
//removed (<\?xml.*) from beginning regex
'/(<w:p.*>\${' . $blockname . '}<\/w:.*?p>)(.*)(<w:p.*\${\/' . $blockname . '}<\/w:.*?p>)/is',
$this->tempDocumentMainPart,
$matches
);
if (isset($matches[3])) {
$this->tempDocumentMainPart = str_replace(
$matches[2] . $matches[3] . $matches[4],
$replacement,
$this->tempDocumentMainPart
);
}
//added to remove the start block sign
$this->setValue($blockname, '');
}
deleteBlock does not work with LibreOffice documents; the output file is corrupted.
Instead of deleteBlock we can use cloneBlock('block_name', 0); in my case it works as a delete and does not corrupt the output file.
The modified replaceBlock for 0.16.0 posted above is wrong. Since you removed (<\?xml.*) from the beginning of the regex, you'll get the PHP notice "Undefined offset: 4", because all occurrences of $matches[2], $matches[3], $matches[4] should now be $matches[1], $matches[2], $matches[3] respectively (and the isset check should test the last group that still exists).
If this helps anyone, I managed to get it to work by doing the following.
public function replaceBlock($blockname, $replacement)
{
$this->tempDocumentMainPart = preg_replace('/(\${' . $blockname . '})(.*)(\${\/' . $blockname . '})/is',$replacement,$this->tempDocumentMainPart);
$this->setValue($blockname, '');
}
It might have been the version of Word I was saving in, but my text was wrapped in <w:t> rather than <w:p>; the above seems to catch all eventualities.
public function replaceBlock($blockname, $replacement)
{
// get all content
$data = $this->tempDocumentMainPart;
// searching the block's opening tag
preg_match(
'/(?>(<w:p\s(?:(?!<w:p\s).)*?|<w:p>(?:(?!<w:p>).)*?)\${' . $blockname . '}.*?<\/w:p>)/is',
$data,
$start,
PREG_OFFSET_CAPTURE
);
// block not found
if (empty($start)) {
return $data;
}
$start_offset = $start[0][1];
// document content before block's opening tag
$header = substr($this->tempDocumentMainPart, 0, $start_offset);
// searching the block's closing tag
preg_match(
'/(?>(<w:p\s(?:(?!<w:p\s).)*?|<w:p>(?:(?!<w:p>).)*?)\${\/' . $blockname . '}.*?<\/w:p>)/is',
$data,
$end,
PREG_OFFSET_CAPTURE,
$start_offset
);
// block not found
if (empty($end)) {
return $data;
}
// document content after block's closing tag
$footer = substr($this->tempDocumentMainPart, $end[0][1] + strlen($end[0][0]));
// combining results with replacement string
$this->tempDocumentMainPart = $header . $replacement . $footer;
}
C4r1sts' solution solved my problem.
I use this simple modification:
public function replaceBlock($blockname, $replacement) {
$this->tempDocumentMainPart = preg_replace(
'/(\${' . $blockname . '})(.*?)(\${\/' . $blockname . '})/is',
$replacement,
$this->tempDocumentMainPart
);
}
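Two things to watch for with this short version: `$blockname` goes into the pattern unescaped, and `preg_replace()` expands sequences such as `$1` or `\1` inside the replacement string. Below is a hedged standalone sketch that works on a plain XML string and sidesteps both; the function name `replaceBlockInXml` and the sample data are made up, and it is not part of PHPWord.

```php
<?php
// Hypothetical standalone variant of the regex above, working on a plain XML string.
function replaceBlockInXml(string $xml, string $blockname, string $replacement): string
{
    // Escape the block name so characters like "." or "+" are matched literally.
    $name = preg_quote($blockname, '/');
    $pattern = '/\$\{' . $name . '\}.*?\$\{\/' . $name . '\}/s';

    // A callback returns the replacement verbatim, so "$1" or "\1" in it is not expanded.
    $result = preg_replace_callback(
        $pattern,
        function () use ($replacement) {
            return $replacement;
        },
        $xml
    );

    // preg_replace_callback() returns null on a PCRE error; keep the input in that case.
    return $result === null ? $xml : $result;
}

$xml = '<w:p><w:r><w:t>${block_name}This block content will be replaced${/block_name}</w:t></w:r></w:p>';
echo replaceBlockInXml($xml, 'block_name', 'Price: $10'), "\n";
```

Like the original, this only removes the text between the two tags. The surrounding `<w:p>` and `<w:t>` elements stay in place, which is what makes it usable for inline blocks.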
liborm85's works for me!
Any progress getting a fix into the codebase?
Thanks liborm85, your regex worked. I changed cloneBlock to:
public function cloneBlock($blockname, $clones = 1, $replace = true, $indexVariables = false, $variableReplacements = null)
{
$xmlBlock = null;
$matches = array();
preg_match(
'/(\${' . $blockname . '})(.*?)(\${\/' . $blockname . '})/is',
$this->tempDocumentMainPart,
$matches
);
if (isset($matches[3])) {
$xmlBlock = $matches[2];
if ($indexVariables) {
$cloned = $this->indexClonedVariables($clones, $xmlBlock);
} elseif ($variableReplacements !== null && is_array($variableReplacements)) {
$cloned = $this->replaceClonedVariables($variableReplacements, $xmlBlock);
} else {
$cloned = array();
for ($i = 1; $i <= $clones; $i++) {
$cloned[] = $xmlBlock;
}
}
if ($replace) {
$this->tempDocumentMainPart = str_replace(
$matches[1] . $matches[2] . $matches[3],
implode('', $cloned),
$this->tempDocumentMainPart
);
}
}
return $xmlBlock;
}
liborm85's solution works for me too, as does weetgeen's.
deleteBlock does not work with LibreOffice documents; the output file is corrupted.
Instead of deleteBlock we can use cloneBlock('block_name', 0); in my case it works as a delete and does not corrupt the output file.
@dva-re's solution works for me with the current version (0.18.3 as of today).
The solution from liborm85 is the best, because it works with inline blocks like:
${outDateBlock}Date ${outDate}${/outDateBlock}
I have downloaded the latest version of PHPWord, but this issue, open since 2014, still persists. It's weird.
The solution from liborm85 is the best, because it works with inline blocks like:
${outDateBlock}Date ${outDate}${/outDateBlock}
That did not work for me.
The text in the 'helloWorld.docx' file is:
${block_name}This block content will be replaced${/block_name}
And the PHP test file is:
<?php
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('helloWorld.docx');
$templateProcessor->replaceBlock('block_name', 'This is the replacement text.');
$templateProcessor->saveAs('helloWorld-new.docx');
Any other suggestions? Please share. Thanks.
Thanks a lot @liborm85, your solution made my day, worked like a charm and saved my life :) :)
That did not work for me.
I use this code to delete a block, not to replace one.
@liborm85's solution doesn't work perfectly for deleting a block: after the block is deleted an empty line remains, and that line should be deleted too.
To delete a block together with its empty line, use: $this->cloneBlock($blockname, 0, true, true);
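For callers who would rather not patch the class, the same call can be wrapped in a small helper. This is only a sketch: the function name `deleteBlockWithLine` is made up, and `$processor` is assumed to expose `cloneBlock()` with the five-parameter signature posted earlier in this thread.

```php
<?php
// Hypothetical helper; $processor is assumed to be a TemplateProcessor-like
// object whose cloneBlock() takes
// ($blockname, $clones, $replace, $indexVariables, $variableReplacements).
function deleteBlockWithLine($processor, string $blockname): void
{
    // Zero clones, replace in place, index variables: the call suggested above.
    $processor->cloneBlock($blockname, 0, true, true);
}
```

Usage would look like `deleteBlockWithLine($templateProcessor, 'block_name');` followed by the usual `saveAs()` call.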