processwire-issues icon indicating copy to clipboard operation
processwire-issues copied to clipboard

Swap strip_tags and htmlspecialchars in ProcessPageList so valid titles don't get cut off

Open schwarzdesign opened this issue 5 years ago • 8 comments

Short description of the issue

Titles that contain the exact string <= are cut off in the page tree. This is caused by strip_tags in the method that renders the page title.

As an example, take the following page title:

Mini-Jobber (gross <= 450 €/month)

Expected behavior

The title should be displayed completely in the page tree:

Mini-Jobber (gross <= 450 €/month)

Actual behavior

The title gets cut off, because strip_tags removes anything after <=:

Mini-Jobber (gross

Screenshots/Links that demonstrate the issue

With a space between the characters it works fine:

with-space

Without the space, the title gets cut off:

without-space

Suggestion for a possible fix

The issue is caused by this line in the source code. The titles get processed by both strip_tags and htmlspecialchars. Simply switching the order of those two functions would solve the problem:

php > $str = 'Mini-Jobber (gross <= 450 €/month)';
php > var_dump(htmlspecialchars(strip_tags($str), ENT_QUOTES, "UTF-8", false));
string(19) "Mini-Jobber (gross "
php > var_dump(strip_tags(htmlspecialchars($str, ENT_QUOTES, "UTF-8", false)));
string(39) "Mini-Jobber (gross &lt;= 450 €/month)"

Another possibility would be to drop strip_tags entirely, from a security standpoint it's not really required if you use htmlspecialchars anyway.

Steps to reproduce the issue

  1. On a standard ProcessWire installation, create a page with the exact characters <= somewhere inside it's title.
  2. The title in the page tree gets cut off at this point.

Setup/Environment

  • ProcessWire: 3.0.149
  • PHP: 7.4.3
  • Webserver: Apache
  • MySQL: 10.1.44-MariaDB-0ubuntu0.18.04.1

schwarzdesign avatar Mar 06 '20 14:03 schwarzdesign

Small update: As pointed out by RobinS, the issue will only appear if the default HTML Entity Encoder textformatter on the title field is turned off, since it would encode the <= before strip_tags is applied.

schwarzdesign avatar Jul 14 '20 07:07 schwarzdesign

@schwarzdesign Swapping the order of htmlspecialchars and strip_tags means that strip_tags does nothing at all, so at that stage could just be removed. The strip_tags is not necessary for security, as the htmlspecialchars is all that is necessary. But the strip_tags is there for looks, so that encoded HTML tags don't show in the page list, like if it was added from markdown formatting in the title or some other formatting. If I recall correctly, this was done (a long time ago) to resolve a previous issue report about encoded HTML tags showing in the page list. I'm not sure what's the right way to go here as it's subjective. I could have it allow <= as a specific case?

ryancramerdesign avatar Aug 13 '21 16:08 ryancramerdesign

@ryancramerdesign I'd argue that strip_tags could be removed completely if it's not required from a security standpoint. As long as HTML is properly encoded, why prevent it showing up in the page tree? The assumption that I never want to see HTML tags in the page tree isn't necessarily true (aside from edge-cases like this one). What if I want to build a system where the client selects an HTML element from a list and represent each element using a page? Then I would have page titles like <strong>, <em> etc and they wouldn't show up at all.

I'd say it's pretty rare for HTML to end up in titles, so if I have a situation where it does, I probably want it to show up in the page tree as well. And if I have a situation where I need HTML in the title but want to strip_tags in the page tree, I can solve that using hooks.

So since it's not required for security, I think the core should make as few assumptions as possible, so I'm all for dropping strip_tags entirely.

schwarzdesign avatar Aug 17 '21 08:08 schwarzdesign

Looks like $sanitizer->markupToText() removes <= in $value = trim(strip_tags($value));

matjazpotocnik avatar Jun 15 '23 09:06 matjazpotocnik

@matjazpotocnik I can't duplicate the issue here, are you able to? Maybe it's already been fixed? Page title of Mini-Jobber (gross <= 450 €/month) shows exactly as that in the page list and in lister.

ryancramerdesign avatar Jul 21 '23 14:07 ryancramerdesign

@ryancramerdesign Yes, I can duplicate:

image

image

Page title "Mini-Jobber (gross <= 450 €/month)" is shown as "Mini-Jobber (gross":

image

matjazpotocnik avatar Jul 21 '23 16:07 matjazpotocnik

@matjazpotocnik Strange, I can't duplicate it here on the current dev branch.

Screen Shot 2023-07-21 at 1 34 27 PM Screen Shot 2023-07-21 at 1 35 08 PM

ryancramerdesign avatar Jul 21 '23 17:07 ryancramerdesign

@ryancramerdesign please disable (remove from the list of textformatters) HTML Entity Encoder on the title field. If entity encoder is on, then < will be replaced with &lt; and all is good.

matjazpotocnik avatar Jul 21 '23 18:07 matjazpotocnik