hpq
hpq copied to clipboard
Use markup root node as base for parse
This pull request seeks to change the behavior of passing a markup string, setting as the base for sourcing the root node of the markup string passed. This improves consistency with the element signature, and provide easier parsing against the root node in environments where the :scope pseudo-class selector is not supported:
const markup = '<div></div>';
const element = document.createElement( 'div' );
// Before:
parse( markup, prop( 'nodeName' ) ); // "BODY"
parse( element, prop( 'nodeName' ) ); // "DIV"
parse( markup, prop( ':scope > div', 'nodeName' ) ); // "DIV"
// After:
parse( markup, prop( 'nodeName' ) ); // "DIV"
parse( element, prop( 'nodeName' ) ); // "DIV"
This may be considered a breaking change, since previously it was possible to pass a markup string containing multiple elements without a parent wrapper. With these changes, the same markup would only perform parsing on the first element. While we could support both single and multiple-element markup strings by testing doc.body.childElementCount (doc.body.children.length), I don't know that this is a common use case and would require supporting multiple input formats. I am content with releasing a new major version describing this as a breaking change.
cc @youknowriad
What if we have blocks returning an array of elements? Does this mean we won't be able to parse attributes from these nodes (except the first one?)
What if we have blocks returning an array of elements? Does this mean we won't be able to parse attributes from these nodes (except the first one?)
Actually, you're right, in Gutenberg this could impact a block if its content is composed of multiple top-level elements, like:
<!-- wp:core/text -->
<p>Paragraph one</p>
<p>Paragraph two</p>
<!-- /wp:core/text -->
We could apply a wrapper when parsing, something like '<div>' + rawContent + '</div>'. But this returns us to the original issue, where parsing on the root node of rawContent is difficult in the case that it is a singular element.
Considering hpq alone, the changes here seem to be a good win for consistency, at least between passing markup and an element.
One other option is to convert the markup to an element before passing to hpq, basically recreating the behavior here:
https://github.com/aduth/hpq/blob/b79c5623140a5240ee56b0fd6f0da81f40c67868/src/index.js#L19-L24
...where we'd then have full control over how that element is created: Either use the first child if only one child, or keep it as body.
I still think this could lead to confusion though, as in the case of the multi-paragraph block and single-paragraph block, where each would have different results for attr( 'class' ) (sourced from body or the paragraph respectively).
I understand the feeling of confusion but in the same time let's look at this:
const a = parse( '<p id="something"></p><p id="something2"></p>', attr( 'id' ) );
const b = parse( '<p id="something"></p>', attr( 'id' ) );
If we think about what these should return, I see some logic in having undefined for the first and something for the second one.
I see some logic in having
undefinedfor the first
I could see this, but it's more unexpected in that attr only returns undefined here because the body wrapper we create doesn't have an ID. If you were to do something like prop( 'nodeName' ) instead, the problem is more evident (returns "BODY", not undefined).
Maybe if we had some way to prevent (or ensure undefined return) of singular element querying against the body wrapper, it wouldn't be so bad...
Maybe if we had some way to prevent (or ensure undefined return) of singular element querying against the body wrapper, it wouldn't be so bad...
Trying to think about the implications, I guess querying (using query) will be a bit odd.
const a = parse( '<p id="something"></p><p id="something2"></p>', query( 'p' ) );
const b = parse( '<p id="something"></p>', query( 'p' ) );
I guess I'd expect both to return the paragraphs, but the second one won't return anything.
The other option for consistency is to always ensure a wrapper, even in the element signature.
const markup = '<div></div>';
const element = document.createElement( 'div' );
// Before:
parse( markup, prop( 'nodeName' ) ); // "BODY"
parse( element, prop( 'nodeName' ) ); // "DIV"
parse( markup, prop( ':scope > div', 'nodeName' ) ); // "DIV"
// After:
parse( markup, prop( 'nodeName' ) ); // "BODY"
parse( element, prop( 'nodeName' ) ); // "BODY"
To your last point, this could make more sense if you consider parsing to need at least some selector scope, and to have undetermined characteristics otherwise.
I think it could be reasonably possible to force an undefined return from attempts to parse the root wrapper:
diff --git a/src/index.js b/src/index.js
index 2aed4b9..42a0c2c 100644
--- a/src/index.js
+++ b/src/index.js
@@ -20,7 +20,8 @@ export function parse( source, matchers ) {
if ( 'string' === typeof source ) {
const doc = document.implementation.createHTMLDocument( '' );
doc.body.innerHTML = source;
- source = doc.body.firstChild;
+ source = doc.body;
+ source._hpqRoot = true;
}
// Return singular value
@@ -59,6 +60,8 @@ export function prop( selector, name ) {
let match = node;
if ( selector ) {
match = node.querySelector( selector );
+ } else if ( node._hpqRoot ) {
+ return;
}
if ( match ) {
At which point it becomes:
const markup = '<div></div>';
const element = document.createElement( 'div' );
// Before:
parse( markup, prop( 'nodeName' ) ); // "BODY"
parse( element, prop( 'nodeName' ) ); // "DIV"
parse( markup, prop( ':scope > div', 'nodeName' ) ); // "DIV"
// After:
parse( markup, prop( 'nodeName' ) ); // undefined
parse( element, prop( 'nodeName' ) ); // undefined
Obviously this doesn't solve awkwardness with the (lack of) a :scope pseudo-class selector.
I'm hesitant. It's one of these problems where no matter the option you choose, you need to understand how it works (at least a bit) to really know how to use it.
You choose :)
I still think the original proposal here is most sensible and makes for easy parsing of a root node, but acknowledge it causes some difficulty for multi-element blocks in Gutenberg. I don't think we currently have any of those, but the parser supports them. We'd have to consider whether requirements should allow for those sorts of blocks, otherwise eliminate support for it from the parser.
I'll think on this a bit more.
Another thing I don't particularly like about forced selectors is it results in a choice between redundancy and inconsistency with regards to the query matcher. The behavior of query is supposed to mimic parse in a sub-context, but then how is one to get class names for each paragraph with the markup <div><p class="one"></p><p class="two"></p></div>?
- It must be either inconsistent with root matching and allow omission of the selector:
parse( markup, query( 'p', attr( 'class' ) );
- Or it must be consistent with root matching, but redundant in the
'p'selector:parse( markup, query( 'p', attr( 'p', 'class' ) )
@aduth Can't we mimic the :scope behaviour somehow to avoid the browser compatibility issue? Would this solve all the issues here?
@aduth Can't we mimic the
:scopebehaviour somehow to avoid the browser compatibility issue? Would this solve all the issues here?
Not entirely, since it still leaves inconsistency with query submatching and the parse signature where developer passes an HTMLElement object. Also not something I'm very keen to polyfill 😬
According to MDN, :scope is deprecated?
https://developer.mozilla.org/en-US/docs/Web/CSS/:scope
Not sure if I had missed this previously, or if the recommendation has since changed.