Chinese characters inside script tag breaking

Open gijo-varghese opened this issue 3 years ago • 3 comments

<?php

require_once 'vendor/autoload.php';

$html = <<<EOD
<!DOCTYPE html>
<html>
    <head>
        <script>
            const str = "訂閱最新指南";
        </script>
    </head>
    <body>
    </body>
</html>
EOD;

$doc = new \DiDom\Document();
$doc->loadHTML($html);
echo $doc->html();

This results in the following output, with the characters inside the script tag converted to numeric entities:

<!DOCTYPE html>
<html>
    <head>
        <script>
            const str = "&#35330;&#38321;&#26368;&#26032;&#25351;&#21335;";
        </script>
    </head>
    <body>
    </body>
</html>

gijo-varghese, Mar 06 '21 17:03

The same happens with this one, where the non-ASCII character is inside a style tag:

<style>
  body {
    background-image: url('https://example.com/örebro.jpg');
  }
</style>

gijo-varghese, Oct 22 '21 03:10

It seems the HTML is encoded when it is loaded:
https://github.com/Imangazaliev/DiDOM/blob/346db1ea94a0f6ead225c2358af770bf33659cf7/src/DiDom/Document.php#L291-L293
https://github.com/Imangazaliev/DiDOM/blob/346db1ea94a0f6ead225c2358af770bf33659cf7/src/DiDom/Encoder.php#L13-L24

But it is not decoded again when html() is called:
https://github.com/Imangazaliev/DiDOM/blob/346db1ea94a0f6ead225c2358af770bf33659cf7/src/DiDom/Document.php#L595-L598
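
A minimal sketch of the encode/decode asymmetry described above, using only PHP's mbstring functions rather than DiDOM itself. The helper names `encodeNonAscii` and `decodeNonAscii` and the conversion map are assumptions for illustration; they are not DiDOM's API and not necessarily what the linked code or any fix does.

```php
<?php

// Hypothetical helpers for illustration only; these are not DiDOM functions.
// Requires the mbstring extension.

// Conversion map: every code point from U+0080 upward, with no offset.
const NON_ASCII_MAP = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Replace non-ASCII characters with decimal numeric entities.
function encodeNonAscii(string $text): string
{
    return mb_encode_numericentity($text, NON_ASCII_MAP, 'UTF-8');
}

// Turn numeric entities in the same code point range back into characters.
function decodeNonAscii(string $text): string
{
    return mb_decode_numericentity($text, NON_ASCII_MAP, 'UTF-8');
}

$script  = 'const str = "訂閱最新指南";';
$encoded = encodeNonAscii($script);

echo $encoded, PHP_EOL;                 // the entity form seen in the issue output
echo decodeNonAscii($encoded), PHP_EOL; // the original string again
```

Because a script or style element holds raw text, entities placed there before parsing are presumably kept literally on output, which would explain why only a matching decode step restores the original characters.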

GwendolenLynch, Jan 21 '22 17:01

@GwendolenLynch you're right! I've opened a PR https://github.com/Imangazaliev/DiDOM/pull/180

gijo-varghese, Jan 30 '22 16:01