PHPWord icon indicating copy to clipboard operation
PHPWord copied to clipboard

Fix TemplateProcessor::fixBrokenMacros to handle whitespaces

Open chapa opened this issue 5 years ago • 9 comments

Description

This is a proposal to fix #590 in a cleaner way than a global regex on the whole document, which might change parts of the document that must not change.

Edit: see https://github.com/PHPOffice/PHPWord/pull/1971#issuecomment-970481810

TemplateProcessor::fixBrokenMacros() is a method that strip all tags between ${, the name of the variable and } (let's call them the 3 parts). If there is a <w:t xml:space="preserve"> in those tags it means that there is a trailing space at the end of the content of the tag, because there is not supposed to be any space between the 3 parts, so the whole must be inside of it in order to keep this trailing space.

Examples:

<w:r><w:t>before ${</w:t></w:r><w:r><w:t xml:space="preserve">variable} </w:t></w:r><w:r><w:t>after</w:t></w:r>
                                                                       |
                                                              the trailing space

<w:r><w:t>before ${</w:t></w:r><w:r><w:t>variable</w:t></w:r><w:r><w:t xml:space="preserve">} </w:t></w:r><w:r><w:t>after</w:t></w:r>
                                                                                             |
                                                                                    the trailing space

I think it's safe to move the opening tag just before the variable, but one thing I'm not sure about because I don't know Open XML very well, is whether it's safe to prepend the variable by </w:t></w:r><w:r><w:t xml:space="preserve">. I think it's ok if a <w:t> is necessarily in a <w:r> and there cannot be multiple <w:t> in a <w:r> (otherwise we could just remove the </w:r><w:r> part).

Expected result of the above examples:

<w:r><w:t>before </w:t></w:r><w:r><w:t xml:space="preserve">${variable} </w:t></w:r><w:r><w:t>after</w:t></w:r>

Checklist:

  • [x] I have run composer run-script check --timeout=0 and no errors were reported
  • [x] The new code is covered by unit tests (check build/coverage for coverage report)
  • [ ] I have updated the documentation to describe the changes => it's a bug fix

chapa avatar Nov 20 '20 21:11 chapa

Coverage Status

coverage: 95.538% (+0.006%) from 95.532% when pulling 838b63ed3c4685c21ca4e94d90e9e2b87f7d9124 on chapa:develop into e76b701ef538cb749641514fcbc31a68078550fa on PHPOffice:master.

coveralls avatar Feb 18 '21 13:02 coveralls

Thank you for the fix! Could it be merged?

4rthem avatar Nov 15 '21 21:11 4rthem

Actually I found a case where my fix isn't working: when there is styling tags between the $ and the } of a single variable. Because all tags are stripped we lose styling information.

After several tests, I came to the conclusion that the best way to fix those broken macros without breaking anything else was to move all existing tags inside the variable right after it, for example:

<w:r><w:t>${</w:t></w:r><w:r><w:t>variable1</w:t></w:r><w:r><w:t>}</w:t></w:r>
<!--        └─────────1──────────┘         └─────────2──────────┘          -->

should be transformed to:

<w:r><w:t>${variable1}</w:t></w:r><w:r><w:t></w:t></w:r><w:r><w:t></w:t></w:r>
<!--                  └─────────1──────────┘└─────────2──────────┘         -->

That way we just reassemble all parts of the macro together.

But we also need to add xml:space="preserve" to all <w:t> we move, to handle this situation:

<w:r><w:t>${variable1</w:t></w:r><w:r><w:t>} ${variable2</w:t></w:r><w:r><w:t>}</w:t></w:r>

that would be transformed to:

<w:r><w:t>${variable1}</w:t></w:r><w:r><w:t xml:space="preserve"> ${variable2}</w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r>
<!--                                        └──────────────────┘ │                                  └──────────────────┘          -->
<!--                                            we need this     └─ to preserve this space        (this one does not hurt)        -->

I updated the fix to handle this

chapa avatar Nov 16 '21 17:11 chapa

Will there be any movement on this PR in the near future? I'm experiencing isses with missing whitespaces all over my documents.

will2877 avatar Jan 12 '22 14:01 will2877

Gently ping @troosan

@will2877 until this is merged you can use cweagans/composer-patches with the patch at https://github.com/PHPOffice/PHPWord/commit/f955a818014be369a50371da2445dda1c6aaeaf0.patch

chapa avatar Jan 21 '22 11:01 chapa

I rebased the branch to be up to date with master, this needed some adaptations to handle custom macro chars that wasn't overridable at the time the PR was made.

@Progi1984 can you take a look ?

chapa avatar Dec 01 '23 10:12 chapa