DiDOM
Chinese characters inside script tag breaking
<?php
require_once 'vendor/autoload.php';
$html = <<<EOD
<!DOCTYPE html>
<html>
<head>
<script>
const str = "訂閱最新指南";
</script>
</head>
<body>
</body>
</html>
EOD;
$doc = new \DiDom\Document();
$doc->loadHTML($html);
echo $doc->html();
Will result in:
<!DOCTYPE html>
<html>
<head>
<script>
const str = "&#35330;&#38321;&#26368;&#26032;&#25351;&#21335;";
</script>
</head>
<body>
</body>
</html>
And also this one:
<style>
body {
background-image: url('https://example.com/örebro.jpg');
}
</style>
It seems that when the HTML is load()ed, it is encoded:
https://github.com/Imangazaliev/DiDOM/blob/346db1ea94a0f6ead225c2358af770bf33659cf7/src/DiDom/Document.php#L291-L293
https://github.com/Imangazaliev/DiDOM/blob/346db1ea94a0f6ead225c2358af770bf33659cf7/src/DiDom/Encoder.php#L13-L24
But it is not decoded again on calls to html():
https://github.com/Imangazaliev/DiDOM/blob/346db1ea94a0f6ead225c2358af770bf33659cf7/src/DiDom/Document.php#L595-L598
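A minimal sketch of the suspected mechanism, using plain DOMDocument rather than DiDOM itself. The helper encodeNonAscii() is hypothetical and only stands in for whatever the Encoder does; it assumes the mbstring extension is available for mb_ord(). The idea is that entities in ordinary text nodes are decoded by the parser, while the contents of script (and style) are treated as raw text, so the entities there stay literal unless something decodes them afterwards.

```php
<?php
// Hypothetical stand-in for the library's encoding step (not DiDOM's actual code):
// replace every non-ASCII character with a decimal numeric entity.
function encodeNonAscii(string $html): string
{
    return preg_replace_callback('/[^\x00-\x7F]/u', static function (array $m): string {
        return '&#' . mb_ord($m[0], 'UTF-8') . ';';
    }, $html);
}

$html = '<!DOCTYPE html><html><head><script>const str = "訂閱";</script></head>'
      . '<body><p>訂閱</p></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML(encodeNonAscii($html));

$script    = $dom->getElementsByTagName('script')->item(0)->textContent;
$paragraph = $dom->getElementsByTagName('p')->item(0)->textContent;

echo $paragraph, PHP_EOL; // ordinary text node: entities are decoded by the parser
echo $script, PHP_EOL;    // script content: the entities are still there as literal text

// One possible way to undo the encoding for the raw-text content on output.
echo html_entity_decode($script, ENT_QUOTES | ENT_HTML5, 'UTF-8'), PHP_EOL;
```

This only illustrates the asymmetry described above; whether decoding should happen per node or on the serialized string as a whole is a design choice for the library.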
@GwendolenLynch you're right! I've opened a PR: https://github.com/Imangazaliev/DiDOM/pull/180