avhtml

Closing comments handling

Open const-volatile opened this issue 7 years ago • 0 comments

I found a corner case in which the end of a comment is not handled correctly, e.g. `<!--x>Comment<!-->`. The comment should end at the `-->` of the second `<!-->`, but the parser treats everything after it as part of the comment as well.

The following snippet demonstrates the problem (the output is empty instead of "HELLO WORLD"):

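```cpp
#include <iostream>
// plus the avhtml header that declares html::dom (its name is not given in this issue)
html::dom page;
page.append_partial_html("<!--x>Comment<!--><html><head><title>HELLO WORLD</title></head><body></body></html>");
std::cout << page["title"].to_plain_text() << std::endl;  // expected "HELLO WORLD", prints an empty line
```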

According to the HTML5 specification, the comment should be parsed as follows:

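- Data state: `<` → tag open state, then `!` → markup declaration open state
- Markup declaration open state: `--` → comment start state
- Comment start state: `x` is appended to the comment token's data → comment state
- Comment state: `>Comment<!`, each character is appended to the comment token's data
- Comment state: `-` → comment end dash state
- Comment end dash state: `-` → comment end state
- Comment end state: `>` → the comment token is emitted → data state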
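As a rough illustration of the state sequence above, here is a minimal C++ sketch of the comment-related tokenizer states. It is not avhtml's code: `Token`, `State` and `tokenize` are hypothetical names, and the sketch simplifies the specification. Tags are kept as plain text, the tag open and markup declaration open states are collapsed into a single `<!--` check, and the comment end bang state, parse errors and EOF rules are omitted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One token from this simplified tokenizer: a comment or a run of ordinary text.
struct Token {
    bool is_comment;
    std::string data;
};

// Tokenizer states involved in comment parsing (markup declaration open is
// folded into the "<!--" check in the data state).
enum class State { Data, CommentStart, CommentStartDash, Comment, CommentEndDash, CommentEnd };

// Simplified sketch: tags stay plain text, "<!" without "--" is plain text,
// the comment end bang state and parse errors are not modeled, and an
// unterminated comment is emitted as-is at end of input.
std::vector<Token> tokenize(const std::string& in) {
    std::vector<Token> out;
    std::string text, comment;
    State state = State::Data;
    auto flush_text = [&] {
        if (!text.empty()) { out.push_back({false, text}); text.clear(); }
    };
    auto emit_comment = [&] {
        out.push_back({true, comment});
        comment.clear();
        state = State::Data;
    };
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        switch (state) {
        case State::Data:
            if (in.compare(i, 4, "<!--") == 0) {  // "<!" followed by "--" starts a comment
                flush_text();
                state = State::CommentStart;
                i += 3;                            // skip the rest of "<!--"
            } else {
                text += c;
            }
            break;
        case State::CommentStart:
            if (c == '-') state = State::CommentStartDash;
            else if (c == '>') emit_comment();     // "<!-->": empty comment
            else { comment += c; state = State::Comment; }
            break;
        case State::CommentStartDash:
            if (c == '-') state = State::CommentEnd;
            else if (c == '>') emit_comment();     // "<!--->": empty comment
            else { comment += '-'; comment += c; state = State::Comment; }
            break;
        case State::Comment:
            if (c == '-') state = State::CommentEndDash;
            else comment += c;                     // '<' and '!' are ordinary data here
            break;
        case State::CommentEndDash:
            if (c == '-') state = State::CommentEnd;
            else { comment += '-'; comment += c; state = State::Comment; }
            break;
        case State::CommentEnd:
            if (c == '>') emit_comment();          // "-->" closes the comment
            else if (c == '-') comment += '-';
            else { comment += "--"; comment += c; state = State::Comment; }
            break;
        }
    }
    if (state != State::Data) out.push_back({true, comment});  // unterminated comment
    flush_text();
    return out;
}
```

Fed the input from the snippet, `tokenize` returns two tokens: a comment with data `x>Comment<!`, followed by `<html><head><title>HELLO WORLD</title></head><body></body></html>` as text. The relevant detail is the comment state: only `-` changes the state there, so `<!` is simply appended to the comment data.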

The current implementation is in the comment state (state = 12) while `>Comment` is parsed, but when it encounters the `<!` characters it switches to state 10 instead of appending them to the comment data, as the specification requires.

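```cpp
case '<':
  {
    c = getc();
    if (c == '!') {
      // "<!" inside a comment: the parser leaves the comment state (12) for state 10
      pre_state = state;
      state = 10;
    } else {
      // any other character: keep '<' and that character as comment content
      content += '<';
      content += c;
    }
  }
  break;
```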
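One possible direction for a fix is sketched below as a hypothetical standalone helper, not a patch to avhtml's actual parser loop (whose surrounding code is not shown here): once `<!--` has been consumed, look only for the comment's terminator and treat `<!` as ordinary comment data. `scan_comment` and its parameters are assumptions for illustration, and the `--!>` form is not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of avhtml). `start` is the index just past an
// opening "<!--". Stores the comment data in `data` and returns the index just
// past the end of the comment. Inside a comment, "<!" is plain data; only "-->"
// (or the abrupt "<!-->" / "<!--->" forms) closes it.
std::size_t scan_comment(const std::string& in, std::size_t start, std::string& data) {
    // "<!-->" and "<!--->" close immediately with empty data.
    if (in.compare(start, 1, ">") == 0) { data.clear(); return start + 1; }
    if (in.compare(start, 2, "->") == 0) { data.clear(); return start + 2; }
    const std::size_t end = in.find("-->", start);
    if (end == std::string::npos) {  // unterminated: the comment runs to end of input
        data = in.substr(start);
        return in.size();
    }
    data = in.substr(start, end - start);
    return end + 3;                  // skip the closing "-->"
}
```

On the issue's input, `scan_comment(s, 4, d)` stores `x>Comment<!` in `d` and returns the index at which `<html>` begins. Adapting the parser along these lines would mean the comment state no longer branches on `<!` at all and relies solely on `-->` detection.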

const-volatile, Aug 01 '17 06:08