avhtml
Closing comments handling
I found a corner case in which the end of a comment is not handled correctly, e.g.:
<!--x>Comment<!-->
The second <!--> marks the end of the comment (its trailing --> closes it), but the parser mistakenly treats everything after it as part of the comment as well.
The following snippet demonstrates the problem (output is empty instead of "HELLO WORLD"):
html::dom page;
page.append_partial_html("<!--x>Comment<!--><html><head><title>HELLO WORLD</title></head><body></body></html>");
std::cout << page["title"].to_plain_text() << std::endl;
According to the HTML5 specification, the comment should be parsed as follows (each line shows the input consumed and the resulting state):
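Start: Data state
"<!" -> Markup declaration open state (reached via the Tag open state)
"--" -> Comment start state
"x" -> Comment state
">Comment<!" -> Comment state (append the current input character to the comment token's data)
"-" -> Comment end dash state
"-" -> Comment end state
">" -> Data state (the comment token is emitted)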
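To make the expected behaviour concrete, here is a minimal standalone sketch of the comment states listed above. It is not avhtml code: the names State, CommentScan and scan_comment are made up for illustration, parse errors are not reported, and the newer "comment less-than sign" states are left out because they only affect error reporting, not the resulting comment data.

```cpp
#include <cstddef>
#include <string>

// Hypothetical state names mirroring the HTML5 tokenizer's comment states.
enum class State { CommentStart, CommentStartDash, Comment,
                   CommentEndDash, CommentEnd, CommentEndBang };

struct CommentScan {
    bool closed = false;   // true once the closing '>' has been consumed
    std::string data;      // the comment token's data
    std::size_t next = 0;  // index of the first character after the comment
};

// Scans a comment that begins with "<!--" at the start of `input`.
CommentScan scan_comment(const std::string& input) {
    CommentScan r;
    if (input.compare(0, 4, "<!--") != 0) return r;  // not a comment
    State state = State::CommentStart;
    std::size_t i = 4;
    while (i < input.size()) {
        const char c = input[i];
        switch (state) {
        case State::CommentStart:
            if (c == '-') { state = State::CommentStartDash; break; }
            if (c == '>') { r.closed = true; r.next = i + 1; return r; }
            state = State::Comment;
            continue;  // reconsume c in the comment state
        case State::CommentStartDash:
            if (c == '-') { state = State::CommentEnd; break; }
            if (c == '>') { r.closed = true; r.next = i + 1; return r; }
            r.data += '-';
            state = State::Comment;
            continue;  // reconsume c in the comment state
        case State::Comment:
            if (c == '-') state = State::CommentEndDash;
            else r.data += c;  // '<' and '!' are ordinary comment data here
            break;
        case State::CommentEndDash:
            if (c == '-') { state = State::CommentEnd; break; }
            r.data += '-';
            state = State::Comment;
            continue;  // reconsume c in the comment state
        case State::CommentEnd:
            if (c == '>') { r.closed = true; r.next = i + 1; return r; }
            if (c == '!') { state = State::CommentEndBang; break; }
            if (c == '-') { r.data += '-'; break; }
            r.data += "--";
            state = State::Comment;
            continue;  // reconsume c in the comment state
        case State::CommentEndBang:
            if (c == '>') { r.closed = true; r.next = i + 1; return r; }
            r.data += "--!";
            if (c == '-') { state = State::CommentEndDash; break; }
            state = State::Comment;
            continue;  // reconsume c in the comment state
        }
        ++i;
    }
    r.next = input.size();  // unterminated comment: input exhausted
    return r;
}
```

For the input from this report, scan_comment yields the comment data "x>Comment<!" and a next index that points at <html>, so the title element stays outside the comment.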
The current implementation is in the comment state (state = 12) while >Comment is being parsed, but it switches to state = 10 as soon as it encounters the <! characters:
case '<':
{
    c = getc();
    if (c == '!') {
        // Problem: "<!" inside a comment leaves the comment state (12).
        pre_state = state;
        state = 10;
    } else {
        content += '<';
        content += c;
    }
}
break;