php-markdown

One too many conversions of html special characters & and < when inside em text

Open Sameh-R-Labib opened this issue 4 years ago • 2 comments

Your code has a bug: when an ampersand or a less-than sign appears in text marked as em (single asterisks in Markdown), the resulting HTML contains those special characters encoded twice into their HTML entities. I've attached three screenshots: 1) the Markdown, 2) the resulting HTML, 3) the PHP code that calls your function.

[Screenshots: markdown, static, The code]
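For readers without the screenshots, the symptom can be reproduced in isolation. This is only a minimal sketch: `encode_amps_and_angles` is a hypothetical stand-in that escapes every `&` and `<` unconditionally, and it is an assumption that this mirrors what happens when em text goes through the encoder twice in php-markdown.

```php
<?php
// Hypothetical stand-in, not php-markdown's code: escapes every '&' and '<',
// even when the '&' already begins an entity.
function encode_amps_and_angles(string $text): string {
    $text = str_replace('&', '&amp;', $text);
    return str_replace('<', '&lt;', $text);
}

// One pass gives the expected entities.
$once = encode_amps_and_angles('a & b < c');

// A second pass over the same text re-encodes the '&' of each entity.
$twice = encode_amps_and_angles($once);

echo $once, "\n";   // a &amp; b &lt; c
echo $twice, "\n";  // a &amp;amp; b &amp;lt; c
```

The second line of output is the "one too many conversions" from the title.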

Sameh-R-Labib avatar Apr 10 '21 00:04 Sameh-R-Labib

I experience the same issue when enabling Markdown Extra in Grav CMS with inline code. Any idea how to solve this?

lorddoumer avatar Dec 21 '21 08:12 lorddoumer

This is related to the no_entities mode. I suppose using the hashing system to make the generated &amp; invisible to subsequent passes would fix the issue. For instance by adding two hashPart calls in the encodeAmpsAndAngles function:

	protected function encodeAmpsAndAngles($text) {
		if ($this->no_entities) {
			$text = str_replace('&', $this->hashPart('&amp;', ':'), $text);
		} else {
			// Ampersand-encoding based entirely on Nat Irons's Amputator
			// MT plugin: <http://bumppo.net/projects/amputator/>
			$text = preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/',
								'&amp;', $text);
		}
		// Encode remaining <'s
		$text = str_replace('<', $this->hashPart('&lt;', ':'), $text);

		return $text;
	}

This is not a terribly efficient way of doing it (calling hashPart every time encodeAmpsAndAngles is called), but it should work.
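To illustrate why hashing hides the entities from later passes, here is a small self-contained sketch. `TinyHasher`, its placeholder format, and its `unhash` method are all hypothetical and only imitate the idea; they are not php-markdown's actual `hashPart` implementation.

```php
<?php
// Hypothetical miniature of the hashing idea (illustrative names only).
final class TinyHasher {
    /** @var array<string,string> placeholder => original text */
    public array $hashes = [];
    private int $counter = 0;

    // Store $text and return an opaque placeholder containing no '&' or '<'.
    public function hashPart(string $text): string {
        $key = "\x1A" . (++$this->counter) . "\x1A";
        $this->hashes[$key] = $text;
        return $key;
    }

    // Replace special characters with placeholders instead of entities.
    public function encode(string $text): string {
        $text = str_replace('&', $this->hashPart('&amp;'), $text);
        return str_replace('<', $this->hashPart('&lt;'), $text);
    }

    // Final step: swap every placeholder back for the text it stands for.
    public function unhash(string $text): string {
        return strtr($text, $this->hashes);
    }
}

$h = new TinyHasher();
$once  = $h->encode('a & b < c');
$twice = $h->encode($once);      // nothing left to encode on the second pass
$html  = $h->unhash($twice);

echo $html, "\n";                // a &amp; b &lt; c
echo count($h->hashes), "\n";    // 4
```

The second pass leaves the text untouched because the placeholders contain neither `&` nor `<`. The table still grows by two entries on every call, even when nothing is replaced, which is the inefficiency mentioned above.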

It's a bit sad there's nothing in the test suite for the no_entities mode.

michelf avatar Dec 21 '21 12:12 michelf