php-dom-wrapper

How can I use this library to convert a tag-balanced HTML fragment into a node list idiomatically, reliably and 1:1?

Open rulatir opened this issue 1 year ago • 0 comments

What is the idiomatic way to use this library to convert a tag-balanced HTML fragment in a string into a node list, in a reliable 1:1 manner that doesn't require checking for multiple corner cases?

$nodeList = what_goes_here("Some text <span>a tag</span> some more text");

// $nodeList should now contain the exact structure [ TEXT, <span> [ TEXT ] </span>, TEXT ]
// as starkly opposed to [ <p> [ TEXT, <span> [ TEXT ] </span>, TEXT ] </p> ]
// which is what I obtain from ->create("Some text <span>a tag</span> some more text")

EDIT: The issue seems to be that there is no way to specify LIBXML_HTML_NOIMPLIED as a global policy. Even if you set the option after creating the document and before loading contents, various manipulation functions create other document objects internally for processing, and they don't propagate LIBXML_HTML_NOIMPLIED to them. It looks like they couldn't do so even in principle, because there is no Document::getLibxmlOptions().
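
For comparison, here is a minimal sketch of the result I am after, written against plain ext-dom rather than this library. The helper name fragment_to_nodes is hypothetical and is not part of php-dom-wrapper. It assumes that wrapping the fragment in a temporary <div> and parsing with LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD is acceptable, and it does nothing about character encoding, so non-ASCII input would probably need an explicit charset hint.

```php
<?php
// Hypothetical helper, NOT part of php-dom-wrapper: parses a tag-balanced
// HTML fragment with plain ext-dom and returns its top-level nodes.
function fragment_to_nodes(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    // The temporary <div> gives the parser a single root element, so with
    // LIBXML_HTML_NOIMPLIED it has no reason to invent a <p> around the
    // leading text. LIBXML_HTML_NODEFDTD suppresses the default doctype.
    $doc->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper's children are the fragment's own top-level nodes.
    $wrapper = $doc->documentElement;
    return $wrapper === null ? [] : iterator_to_array($wrapper->childNodes);
}

$nodeList = fragment_to_nodes("Some text <span>a tag</span> some more text");
```

The returned nodes still belong to the throwaway DOMDocument, so they would have to be imported (for example with DOMDocument::importNode) before being inserted into another document.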

rulatir, Mar 20 '24 16:03