ezpublish-legacy
ezpublish-legacy copied to clipboard
EZP-24292: Extract template operator(s) do not work correctly with utf8 strings
Hello,
We write today to share a PR for a problem which was brought up in the share.ez.no forums: http://share.ez.no/forums/developer/extract-returns-garbage-when-result-contains-special-chars
It seems that we found several template operators which are not utf8 string friendly. This PR started with trying to solve the problems in this file as it applies to the extract template operator.
Once our improvements worked well for that use case, we noticed that other operators powered by this file's code were also negatively affected by this classes limitations and we went ahead and cleaned up the rest of the use cases we could find that needed help.
Since these we require mbstring php extension by default we think this should be no problem, that said we went one step further and made it more dynamic if mbstring functions are not available, https://doc.ez.no/display/EZP/Requirements+5.4
We based the dynamic function usage testing for mbstring php extension functions before using them and providing a non-utf8 compatible fall back off the existing (inconsistant) implementation found in lib/eztemplate/classes/eztemplatestringoperator.php
Minor cs improvements were made as well.
This pull request replaces usage of strlen, strpos, substr and strrpos with mb string replacements in the context of the following template operators.
- extract
- extract_left
- extract_right
- begins_with
- ends_with
- insert
- remove
- contains
Worth noting that the reverse operator remains utf8 unfriendly since php by default does not yet contain a native mb_strrev function we would need to implement our own, which I have ignored for now.
Issue ticket: https://jira.ez.no/browse/EZP-24292
Please let us know how this finds you!
Cheers, Brookins Consulting
Please add this pull request.
ping @bdunogier
Ping @yannickroger I think you've worked on related problems, have you not ?
ext-mbstring is a requirement, so actually it could be swapped instead of having to feature detect this. But not a big deal of course.
Hello @andrerom
Our choice to provide for detection before use of mb-string functions was not an accident or entirely our decision.
The existing related code (default template operators which use mb-string functions) already provided / used this convention (although admittedly it did so inconsistently at best, perhaps because as we know today mb-string is a requirement).
In an effort to retain the best BC and code standards we ensured our changes mimicked this existing code convention.
We think these choices are the right choices at this time considering this PR is for eZ Publish Legacy.
What do you think? Do you think this PR is acceptable to merge?
Cheers, Brookins Consulting
Hi, just a small nitpick:
The mbstring function detection is done at compile time, and for non-constant elements the function call code - not value - is stored in the template. So, AFAICT there is a possibility for fatal errors when using templates compiled on a system with the mbstring extension in a system without it.
Given the above, a nitpick would be to perform the detection at run-time too (or as @andrerom noted, consider mbstring an actual requirement?)
@joaoinacio as @brookinsconsulting mentions, it is already the case several places in legacy, so better that it stays consistent.
@andrerom
Thank you for your evaluation and affirmation :)
We too think that consistency is important (especially in legacy).
Please let us know if there is anything more we can do to improve these improvements.
Cheers, Brookins Consulting