cl-html5-parser

Comments before `html` or doctype.

Open Ferada opened this issue 6 years ago • 3 comments

I've seen some documents in the wild that have a <!-- ... --> comment block before the first html node (but after <!doctype>). I'm not sure whether that's "valid", but it's annoying that it causes an error. Since it's only a comment, the error could offer a restart that ignores any nodes before the actual start of the document. Alternatively, those nodes could be stored and handled once the document has been created.

Ferada · Aug 12 '19 23:08

Comments before the html element are valid. Can you provide an example that causes the error? The following works fine for me.

(html5-parser:parse-html5 "<!doctype html><!-- comment --><html>hello</html>")

bakketun · Aug 15 '19 03:08

Ah, sorry, I should've tried it without CXML. It works as you said. It only fails when :dom :cxml is added, that is, when the result is mapped to the CXML DOM:

There is no applicable method for the generic function:
  #<STANDARD-GENERIC-FUNCTION DOM:CREATE-COMMENT #x30200308948F>
when called with arguments:
  (NIL " hello, world! ")
   [Condition of type NO-APPLICABLE-METHOD-EXISTS]

...

Backtrace:
...
  2: ((:INTERNAL HTML5-PARSER::WALK (HTML5-PARSER:TRANSFORM-HTML5-DOM ((EQL :CXML) T))) #<COMMENT-NODE NIL #x30200343D07D> NIL NIL)
  3: (MAP NIL #<COMPILED-LEXICAL-CLOSURE (:INTERNAL HTML5-PARSER::WALK (HTML5-PARSER:TRANSFORM-HTML5-DOM (# T))) #x3020034491DF> (#<DOCUMENT-TYPE html #x30200343D16D> #<COMMENT-NODE NIL #x30200343D07D> ..)..
  4: (#<STANDARD-METHOD HTML5-PARSER:TRANSFORM-HTML5-DOM ((EQL :CXML) T)> :CXML #<DOCUMENT nodes: 5 #x30200344752D>)
      Locals:
        TO-TYPE = :CXML
        NODE = #<DOCUMENT nodes: 5 #x30200344752D>
        DOCUMENT-TYPE = #<RUNE-DOM::DOCUMENT-TYPE #x3020034490AD>
        DOCUMENT = NIL
        DOCUMENT-FRAGMENT = NIL
        #:WALK = #<COMPILED-LEXICAL-CLOSURE (:INTERNAL HTML5-PARSER::WALK (HTML5-PARSER:TRANSFORM-HTML5-DOM (# T))) #x3020034491DF>
  5: (NIL #<Unknown Arguments>)
  6: (HTML5-PARSER::PARSE-HTML5-FROM-SOURCE "<!doctype html><!-- hello, world! --><html></html>" :CONTAINER NIL :ENCODING NIL :STRICTP NIL :DOM :CXML)

Ferada · Aug 15 '19 11:08

Delaying the addition of the comment node in transform-html5-dom until the document exists fixes this.
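
To illustrate the deferral idea, here is a minimal, library-independent sketch in plain Common Lisp. It is not the code from cl-html5-parser or CXML: the names toy-document, append-toy-child and transform-toy-nodes are hypothetical, nodes are modelled as simple lists, and the assumption is that the target document only comes into existence when the first element is seen. The actual change in transform-html5-dom may look quite different.

```lisp
;; Toy model only: nodes are lists such as (:doctype "html"),
;; (:comment " text ") or (:element "html"). All names are hypothetical.

(defstruct toy-document
  doctype
  (children '()))

(defun append-toy-child (document child)
  "Append CHILD to DOCUMENT's children, keeping source order."
  (setf (toy-document-children document)
        (append (toy-document-children document) (list child)))
  document)

(defun transform-toy-nodes (nodes)
  "Walk top-level NODES. Comments seen before the document exists are
queued and attached as soon as the document has been created."
  (let ((doctype nil)
        (document nil)
        (pending '()))
    (dolist (node nodes document)
      (destructuring-bind (kind value) node
        (ecase kind
          (:doctype (setf doctype value))
          (:comment
           (if document
               (append-toy-child document node)
               ;; No document yet: remember the comment instead of failing.
               (push node pending)))
          (:element
           (unless document
             (setf document (make-toy-document :doctype doctype))
             ;; Flush the queued comments first, in source order.
             (dolist (comment (reverse pending))
               (append-toy-child document comment))
             (setf pending '()))
           (append-toy-child document node)))))))

;; Same shape as the failing input: doctype, comment, then the html element.
(transform-toy-nodes '((:doctype "html")
                       (:comment " hello, world! ")
                       (:element "html")))
```

In this sketch the early comment ends up as the first child of the document, ahead of the html element, rather than being passed to a constructor while the document is still NIL.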

Ferada · Aug 15 '19 11:08