Problem with Document class and html() function

Open alonjc opened this issue 4 weeks ago • 1 comments

Version

1.0.3

What did you expect to happen?

When parsing link and script elements, I was expecting to get back <?xml encoding="UTF-8"><link rel="stylesheet"

What actually happens?

Getting back <!--?xml encoding="UTF-8"--><link rel="stylesheet"

Steps to reproduce

Using Acorn prettify with clean-html5-markup.
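
A minimal sketch that may reproduce this without the package. It assumes the Document class prepends an XML encoding hint to the fragment, loads it with DOMDocument, and serializes with saveHTML(); the helper name rawSaveHtml and the loader flags are hypothetical and not taken from the package source.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for what the Document class is assumed to do internally.
function rawSaveHtml(string $fragment): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings out of the output
    $document = new DOMDocument();
    $document->loadHTML(
        '<?xml encoding="UTF-8">' . $fragment,
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return (string) $document->saveHTML();
}

$raw = rawSaveHtml("<link rel='stylesheet' href='style.css' media='all' />");

echo 'libxml2 ', LIBXML_DOTTED_VERSION, PHP_EOL;
echo 'raw:     ', trim($raw), PHP_EOL;
// Same fixed offset that html() uses; the result depends on how the prefix was serialized.
echo 'trimmed: ', trim(substr($raw, 23)), PHP_EOL;
```

Running this against different libxml2 builds and comparing the "raw" line should show whether the prefix comes back as a processing instruction or as a comment.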

System info

The problem happens with libxml2 2.15.1 on Linux.

Log output

[24-Nov-2025 21:51:24 UTC] ========================================
[24-Nov-2025 21:51:24 UTC] 🔍 DEBUG STYLE TAG #1 - Handle: formidable
[24-Nov-2025 21:51:24 UTC] ========================================
[24-Nov-2025 21:51:24 UTC] 
[24-Nov-2025 21:51:24 UTC] 📥 ORIGINAL HTML:
[24-Nov-2025 21:51:24 UTC] <link rel='stylesheet' id='formidable-css' href='https://stage.sarig.com/app/plugins/formidable/css/formidableforms.css?ver=11241526' media='all' />

[24-Nov-2025 21:51:24 UTC] Length: 149
[24-Nov-2025 21:51:24 UTC] 
[24-Nov-2025 21:51:24 UTC] 🔧 STEP 1: Creating Document instance...
[24-Nov-2025 21:51:24 UTC] ✅ Document created successfully
[24-Nov-2025 21:51:24 UTC] 
[24-Nov-2025 21:51:24 UTC] 🔧 STEP 2: Calling saveHTML()...
[24-Nov-2025 21:51:24 UTC] 📤 RAW saveHTML() output:
[24-Nov-2025 21:51:24 UTC] <!--?xml encoding="UTF-8"--><link rel="stylesheet" id="formidable-css" href="https://stage.sarig.com/app/plugins/formidable/css/formidableforms.css?ver=11241526" media="all">

[24-Nov-2025 21:51:24 UTC] Length: 175
[24-Nov-2025 21:51:24 UTC] 
[24-Nov-2025 21:51:24 UTC] 🔬 FIRST 30 CHARACTERS (byte by byte):
[24-Nov-2025 21:51:24 UTC]   [ 0] ASCII  60 = "<"
[24-Nov-2025 21:51:24 UTC]   [ 1] ASCII  33 = "!"
[24-Nov-2025 21:51:24 UTC]   [ 2] ASCII  45 = "-"
[24-Nov-2025 21:51:24 UTC]   [ 3] ASCII  45 = "-"
[24-Nov-2025 21:51:24 UTC]   [ 4] ASCII  63 = "?"
[24-Nov-2025 21:51:24 UTC]   [ 5] ASCII 120 = "x"
[24-Nov-2025 21:51:24 UTC]   [ 6] ASCII 109 = "m"
[24-Nov-2025 21:51:24 UTC]   [ 7] ASCII 108 = "l"
[24-Nov-2025 21:51:24 UTC]   [ 8] ASCII  32 = " "
[24-Nov-2025 21:51:24 UTC]   [ 9] ASCII 101 = "e"
[24-Nov-2025 21:51:24 UTC]   [10] ASCII 110 = "n"
[24-Nov-2025 21:51:24 UTC]   [11] ASCII  99 = "c"
[24-Nov-2025 21:51:24 UTC]   [12] ASCII 111 = "o"
[24-Nov-2025 21:51:24 UTC]   [13] ASCII 100 = "d"
[24-Nov-2025 21:51:24 UTC]   [14] ASCII 105 = "i"
[24-Nov-2025 21:51:24 UTC]   [15] ASCII 110 = "n"
[24-Nov-2025 21:51:24 UTC]   [16] ASCII 103 = "g"
[24-Nov-2025 21:51:24 UTC]   [17] ASCII  61 = "="
[24-Nov-2025 21:51:24 UTC]   [18] ASCII  34 = """
[24-Nov-2025 21:51:24 UTC]   [19] ASCII  85 = "U"
[24-Nov-2025 21:51:24 UTC]   [20] ASCII  84 = "T"
[24-Nov-2025 21:51:24 UTC]   [21] ASCII  70 = "F"
[24-Nov-2025 21:51:24 UTC]   [22] ASCII  45 = "-"
[24-Nov-2025 21:51:24 UTC]   [23] ASCII  56 = "8"
[24-Nov-2025 21:51:24 UTC]   [24] ASCII  34 = """
[24-Nov-2025 21:51:24 UTC]   [25] ASCII  45 = "-"
[24-Nov-2025 21:51:24 UTC]   [26] ASCII  45 = "-"
[24-Nov-2025 21:51:24 UTC]   [27] ASCII  62 = ">"
[24-Nov-2025 21:51:24 UTC]   [28] ASCII  60 = "<"
[24-Nov-2025 21:51:24 UTC]   [29] ASCII 108 = "l"
[24-Nov-2025 21:51:24 UTC] 
[24-Nov-2025 21:51:24 UTC] 🔧 STEP 3: Applying substr(saveHTML(), 23)...
[24-Nov-2025 21:51:24 UTC] ✂️  REMOVED (first 23 chars): '<!--?xml encoding="UTF-'
[24-Nov-2025 21:51:24 UTC] ✂️  REMAINING: 8"--><link rel="stylesheet" id="formidable-css" href="https://stage.sarig.com/app/plugins/formidable/css/formidableforms.css?ver=11241526" media="all">

[24-Nov-2025 21:51:24 UTC] 
[24-Nov-2025 21:51:24 UTC] 🔧 STEP 4: Calling Document::html() (substr + trim)...
[24-Nov-2025 21:51:24 UTC] 📤 FINAL OUTPUT:
[24-Nov-2025 21:51:24 UTC] 8"--><link rel="stylesheet" id="formidable-css" href="https://stage.sarig.com/app/plugins/formidable/css/formidableforms.css?ver=11241526" media="all">
[24-Nov-2025 21:51:24 UTC] Length: 151
[24-Nov-2025 21:51:24 UTC] 
[24-Nov-2025 21:51:24 UTC] 🔄 COMPARISON:
[24-Nov-2025 21:51:24 UTC]   Original length: 149
[24-Nov-2025 21:51:24 UTC]   Final length: 151
[24-Nov-2025 21:51:24 UTC]   Difference: -2 characters
[24-Nov-2025 21:51:24 UTC]   ⚠️  HTML WAS MODIFIED!
[24-Nov-2025 21:51:24 UTC]   🔴 FOUND "8\"-->" IN OUTPUT!
[24-Nov-2025 21:51:24 UTC]   Context: 8"--><link rel="stylesheet" id="formidable-css" href="https://stage.sa
[24-Nov-2025 21:51:24 UTC] ========================================
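
Step 3 of the log can be checked by hand with plain string functions. This is only an illustrative sketch: the variable names are made up, and the shortened link tag stands in for the real one in the log.

```php
<?php
declare(strict_types=1);

$piPrefix      = '<?xml encoding="UTF-8">';      // prefix form quoted under "What did you expect to happen?"
$commentPrefix = '<!--?xml encoding="UTF-8"-->'; // prefix form shown in the raw saveHTML() output
$link          = '<link rel="stylesheet">';      // shortened stand-in for the logged tag

$piLength      = strlen($piPrefix);      // 23
$commentLength = strlen($commentPrefix); // 28

// A fixed offset of 23 removes the first form exactly.
$fromPi = substr($piPrefix . $link, 23);           // <link rel="stylesheet">

// The same offset leaves the last 5 characters of the comment form behind.
$fromComment = substr($commentPrefix . $link, 23); // 8"--><link rel="stylesheet">

echo $piLength, ' ', $commentLength, PHP_EOL;
echo $fromPi, PHP_EOL;
echo $fromComment, PHP_EOL;
```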

Please confirm this isn't a support request.

Yes

alonjc, Nov 25 '25 05:11

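The problem is in Document::html():

public function html(): string { return trim(substr($this->document->saveHTML(), 23)); }

The hard-coded offset of 23 is wrong for this case. It matches the length of <?xml encoding="UTF-8">, but libxml2 changed its output and now emits that prefix as a comment, <!--?xml encoding="UTF-8"-->, which is 28 characters long. It would be better to use a regex to remove the XML encoding prefix.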
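
A rough sketch of the regex idea, not the actual patch. The function name stripXmlEncodingPrefix is hypothetical, and the pattern assumes the prefix only ever appears at the very start of the saveHTML() output, either as a processing instruction or as a comment.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: remove a leading XML encoding hint from saveHTML()
 * output, whether it was serialized as a processing instruction or as a comment.
 */
function stripXmlEncodingPrefix(string $html): string
{
    // Optional "!--" covers the comment form; [^>]* consumes up to the first ">".
    $stripped = preg_replace('/^\s*<(?:!--)?\?xml\b[^>]*>/i', '', $html, 1);

    // preg_replace() returns null on error; fall back to the untouched input.
    return trim($stripped ?? $html);
}

echo stripXmlEncodingPrefix('<?xml encoding="UTF-8"><link rel="stylesheet">'), PHP_EOL;
echo stripXmlEncodingPrefix('<!--?xml encoding="UTF-8"--><link rel="stylesheet">'), PHP_EOL;
```

Under that assumption, html() could return stripXmlEncodingPrefix($this->document->saveHTML()) instead of relying on a fixed offset, so markup without the prefix would pass through unchanged.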

alonjc, Nov 25 '25 06:11

Hey, thanks for the report! Care to do a PR?

Log1x, Nov 26 '25 23:11

PR submitted, please review @Log1x

dustingrofthenumber, Dec 05 '25 01:12