Document fragment now do not handle 2 siblings inputs
Bug Report
with the latest version if I have two sibling inputs (in the erb file):
<div>
<div id="label-container">
<label>
<input type="file" accept="image/*" />
<input type="hidden" value="" />
</label>
</div>
<div id="after-label"></div>
</div>
Describe the bug
after parsing with Nokogiri.parse instead of Nokogiri::HTML5 it drops the second input and "move" the closing label tag outside the original div.
To Reproduce
you can add this test in document_fragment_test
test "should properly parse two siblings input" do
raw_html = "<div><div id=\"label-container\"><label><input type=\"file\" accept=\"image/*\"><input type=\"hidden\" value=\"\"></label></div><div id=\"after-label\"></div></div>"
fragment = StimulusReflex::HTML::DocumentFragment.new(raw_html)
assert_equal 2, fragment.document_element.at_css('input').children.size
assert_equal "<div><div id=\"label-container\"><label><input type=\"file\" accept=\"image/*\"><input type=\"hidden\" value=\"\"></label></div><div id=\"after-label\"></div></div>", fragment.to_html.squish
assert_equal "<div><div id=\"label-container\"><label><input type=\"file\" accept=\"image/*\"><input type=\"hidden\" value=\"\"></label></div><div id=\"after-label\"></div></div>", fragment.outer_html.squish
end
Expected behavior
parse correctly the input tags
Versions
StimulusReflex
- Gem: [3.5.0.rc2]
- Node package: [N/A]
External tools
- Ruby: [3.1.2]
Thanks for reporting this! We're already looking at our options...
Hi, First of all : thank you very much for this awesome gem!
I can confirm that this bug has larger scope than just 2 siblings inputs, it strips any html after the input.
For instance this HTML will not be preserved:
<div>
<label>X</label>
<div>
<input type="text">
<span></span>
</div>
</div>
raw_html = "<div> <label>X</label> <div> <input type=\"text\"> <span></span> </div> </div>"
output_html = StimulusReflex::HTML::DocumentFragment.new(raw_html).to_html.squish
# returns "<div> <label>X</label> <div> <input type=\"text\"> </div></div>" => the span has been removed
This is an important issue for us because we need to be able to add icons after the input, and because of this bug, the icons are arbitrarily removed after each reflex.
This bug is really hard to confirm and to understand when you see it for the first time, I think we can assume that more Stimulus Reflex users have encountered it but could not find the cause.
Just wanted to highlight that this is a really nasty issue to debug and one that is probably affecting a lot of people. For us it manifested as "we upgraded stimulus reflex from 3.4.1 and now seemingly random parts of our html don't make it back to the page". It's not at all obvious that this issue and associated PR are relevant, unless you've already found the root cause.
It looks like #674 is currently on hold, waiting for Nokogiri to make some upstream fixes. @marcoroth would it be possible to merge that PR as an interim fix, so that Stimulus Reflex itself doesn't appear broken in mysterious ways in the meantime?
In case it's useful for anyone to know, #674 doesn't handle the following case:
# having a `->` in your input breaks the fix:
str = '<div class="foo"><input data-action="input->autocomplete#search"><span>some text</span></div>'
StimulusReflex::HTML::DocumentFragment.new(str).inner_html
#=> "<input data-action=\"input- />autocomplete#search\">" # span is missing
I'm sure this is just summarising what others already know, but I now understand that the difficult thing about this that there currently isn't a parser in Nokogiri that can handle the following:
-
<input><input>(can't be parsed byNokogiri::XML) -
<tr><td></td></tr>(can't be parsed byNokogiri::HTML5::Document) -
<head><title>hello</title></head>(can't be parsed correctly byNokogiri::HTML5::DocumentFragment)
It seems like the possibilities are:
- Wait for Nokogiri to fix
Nokogiri::HTML5::DocumentFragment, so it can parse 3 - Use a fix like #674 that's a bit fragile, but could be adapted to cover the common cases
- Make a (different) tradeoff between 1, 2, 3 above
- Switch from Nokogiri to a library that can handle all the cases
I don't know what the best path forwards is here, but tend to think that any of 2/3/4 are preferable to 1, assuming that the Nokogiri issue isn't going to be fixed imminently. I've put up #692 as a possibility for option 3. Maybe #674 is still the best thing to go for.
Thanks for summarizing this @tomclose! You are right! I think the best way to solve this is to do 4., at least for the time being.
There's an approach where we render out the whole page (as we do now), but instead of parsing and grabbing out the elements out of the parsed HTML via Nokogiri, we could send down the whole page and let the browser parse and grab out the elements we need.
That way, the browser is handling the valid/invalid HTML in both cases: 1) on first page load through the regular Rails request and 2) on re-render after a morph.
Ideally we could fix most of these things in Nokogiri itself. After talking to @flavorjones at RubyConf last year, he mentioned that the bahviour of the HTML5 part of Nokogiri itself might not be right/consistent in all cases. So if we could collect and report all of the weird findings, we might be able to get them fixed in upstream Nokogiri.
I'll see if I can implement a workaround, which might not be the most efficient on wire-size, but at least works the right and expected way so this issue can be unblocked.
:wave: Nokogiri maintainer here, this is the first I've heard of these problems other than https://github.com/sparklemotion/nokogiri/issues/3023
I just want to reiterate what I told @marcoroth in person, which is that there is no spec for fragment parsing in XML or HTML4, and so we adopted some conventions in Nokogiri that aren't serving us well with HTML5.
And the HTML5 fragment behavior is well-defined but requires a "context node" be provided because the HTML5 parsing rules are context-dependent -- for example, to parse <tr><td> you would need to provide a "context node" of <table> (I hope that makes sense, I can explain more if necessary or you can peek at the HTML5 spec's "in table" insertion mode section).
So I guess what I'm saying is, it seems unlikely that another library is going to handle your use cases unless and until we can figure out how to accomplish what stimulus-reflex is doing in an HTML5-compatible way. And I'd love to figure out what that looks like in collaboration with y'all.
I'm hoping to get to https://github.com/sparklemotion/nokogiri/issues/3023 in the next week or two, I'm going to be spending more time than previously on open source this month. It might be worth having another real-time conversation about this once I've swapped my mental context back in.
he mentioned that the bahviour of the HTML5 part of Nokogiri itself might not be right/consistent in all cases
This isn't quite right -- the HTML5 parser we're using passes all the current HTML5 tests. The parser is correct and follows the spec. And if it doesn't follow the spec, it's a bug.
I think what I meant to communicate is that the HTML5 behavior is going to be different from the HTML4 behavior in some cases because the HTML5 spec is different and better-specified, and the gumbo parser is spec-compliant.
@acarpe just wanted to tag you and mention that https://github.com/stimulusreflex/stimulus_reflex/pull/696 should be solving this issue! If you get a chance, I'd love to hear how it works for you! Thank you!