php-htmldiff

Warning: DOMDocument::loadHTML(): Unexpected end tag : u in Entity

Open tkoop opened this issue 1 year ago • 2 comments

We sometimes get this "Unexpected end tag" warning, and the file below reproduces it. The reproduction is very sensitive to whitespace, so make sure every space is copied correctly. The warning seems to go away when we use the "keep new lines" config option and remove all the spaces.

<html>

<p>This code fails.  To get it working, remove one space before the "ol" tag on line 31, which is just under "...Something here..." in $newHtml</p>

<?php

$oldHtml = '<ol>
        <li><u>Publication:</u>
          <ol>
            <li>This sentence.</li>
          </ol>
        </li>
        <li><u>Something here</u>:
          <ol>
            <li>Another item</li>
          </ol>
        </li>
      </ol>
      <ol>
        <li><u>Mars</u>:</li>
        <li>Saturn</li>
      </ol>';

      $newHtml = '<ol>
    <li><u>Publication:</u>
     <ol>
     <li>This sentence.</li>
     </ol>
     </li>
     <li><u>Something here</u>:
      <ol>
      <li>Another item</li>
      </ol>
      </li>
      <li><u>Mars</u>:
      <ol>
      <li>Saturn</li>
      </ol>
      </li>
    </ol>';

error_reporting(E_ALL);
ini_set('display_errors', '1');

require __DIR__ . '/../vendor/autoload.php';

use Caxy\HtmlDiff\HtmlDiff;
use Caxy\HtmlDiff\HtmlDiffConfig;

$config = new HtmlDiffConfig();
$config->setKeepNewLines(true);

$htmlDiff = HtmlDiff::create($oldHtml, $newHtml, $config);
$content = $htmlDiff->build();

echo "Diff is " . $content;

?>

</html>

tkoop, Apr 04 '23 20:04

We are having the same problem: DOMDocument::loadHTML(): Tag mark invalid in Entity. I found that this happens because the Caxy\HtmlDiff\ListDiffLines::listByLines() method uses DOMDocument::loadHTML(), and as far as I know libxml 2.6+ does not handle HTML5 tags correctly. This is a very widespread issue.

I think the libxml errors could be suppressed there by calling libxml_use_internal_errors(true); before loadHTML() and libxml_use_internal_errors(false); once loadHTML() is done.
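
A minimal sketch of that suppression idea, outside the library. The helper name loadHtmlQuietly() is hypothetical and not part of php-htmldiff, and this is not how ListDiffLines::listByLines() is currently written. It only shows the libxml calls involved: errors are collected internally instead of being raised as PHP warnings, and the previous setting is restored afterwards.

```php
<?php

// Hypothetical helper (an assumption for illustration, not php-htmldiff code):
// parse HTML while collecting libxml errors instead of emitting PHP warnings.
function loadHtmlQuietly(string $html): array
{
    // Switch to internal error handling; the call returns the previous setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Keep the collected errors for inspection, then empty libxml's buffer.
    $errors = libxml_get_errors();
    libxml_clear_errors();

    // Restore whatever the caller had configured before.
    libxml_use_internal_errors($previous);

    return [$dom, $errors];
}

// Usage: an HTML5 tag that some libxml versions report as invalid.
[$dom, $errors] = loadHtmlQuietly('<p>Some <mark>highlighted</mark> text</p>');

foreach ($errors as $error) {
    // Each entry is a LibXMLError object; print its message, if any.
    echo trim($error->message), PHP_EOL;
}
```

Restoring the previous value, rather than always passing false, avoids changing error handling for calling code that had already enabled internal errors. Whether any error is reported for a given tag depends on the installed libxml version.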

I will try to investigate this issue further and write a PR, but it seems that no one is working on this repo, so there is almost no chance that my corrections will be accepted.

MykhailoSukovitsyn, Nov 05 '23 13:11

@MykhailoSukovitsyn If you get a PR open for this, we will review and merge it.

jschroed91, Nov 05 '23 23:11