Comments before `html` or doctype.
I've seen some documents in the wild that have a `<!-- ... -->` comment block before the first html node (but after `<!doctype>`). I'm not sure whether that's "valid", but it's annoying that it causes an error. Since it's only a comment, I could see the error offering a restart that ignores any nodes before the actual start of the document. Alternatively, those nodes could be stored and handled once the document has been created?
Comments before the html element are valid. Can you provide an example that causes the error? The following works fine for me:
(html5-parser:parse-html5 "<!doctype html><!-- comment --><html>hello</html>")
Ah, sorry, I should've tried it without CXML. It works like you said. It only fails when :dom :cxml is added, i.e. when mapping to the CXML DOM:
There is no applicable method for the generic function:
#<STANDARD-GENERIC-FUNCTION DOM:CREATE-COMMENT #x30200308948F>
when called with arguments:
(NIL " hello, world! ")
[Condition of type NO-APPLICABLE-METHOD-EXISTS]
...
Backtrace:
...
2: ((:INTERNAL HTML5-PARSER::WALK (HTML5-PARSER:TRANSFORM-HTML5-DOM ((EQL :CXML) T))) #<COMMENT-NODE NIL #x30200343D07D> NIL NIL)
3: (MAP NIL #<COMPILED-LEXICAL-CLOSURE (:INTERNAL HTML5-PARSER::WALK (HTML5-PARSER:TRANSFORM-HTML5-DOM (# T))) #x3020034491DF> (#<DOCUMENT-TYPE html #x30200343D16D> #<COMMENT-NODE NIL #x30200343D07D> ..)..
4: (#<STANDARD-METHOD HTML5-PARSER:TRANSFORM-HTML5-DOM ((EQL :CXML) T)> :CXML #<DOCUMENT nodes: 5 #x30200344752D>)
Locals:
TO-TYPE = :CXML
NODE = #<DOCUMENT nodes: 5 #x30200344752D>
DOCUMENT-TYPE = #<RUNE-DOM::DOCUMENT-TYPE #x3020034490AD>
DOCUMENT = NIL
DOCUMENT-FRAGMENT = NIL
#:WALK = #<COMPILED-LEXICAL-CLOSURE (:INTERNAL HTML5-PARSER::WALK (HTML5-PARSER:TRANSFORM-HTML5-DOM (# T))) #x3020034491DF>
5: (NIL #<Unknown Arguments>)
6: (HTML5-PARSER::PARSE-HTML5-FROM-SOURCE "<!doctype html><!-- hello, world! --><html></html>" :CONTAINER NIL :ENCODING NIL :STRICTP NIL :DOM :CXML)
In transform-html5-dom, delaying adding the comment node until the document exists fixes this.
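For anyone reading along, here is a rough sketch of the deferral idea as a toy model. It does not use the html5-parser or CXML APIs, and it is not the actual patch: `toy-document`, `add-child` and `transform-top-level` are made-up names, and nodes are plain `(type value)` lists. It assumes, as the backtrace suggests (DOCUMENT = NIL while the comment is being walked), that the target document is only created once the first element is reached.

```lisp
;;; Toy model only: none of these names come from cl-html5-parser or CXML.
(defstruct toy-document
  doctype
  (children '()))

(defun add-child (document child)
  "Append CHILD to DOCUMENT's child list and return CHILD."
  (setf (toy-document-children document)
        (append (toy-document-children document) (list child)))
  child)

(defun transform-top-level (nodes)
  "Convert NODES, a list of (TYPE VALUE) lists, into a TOY-DOCUMENT.
The document is created when the first :element is seen, so comments
that arrive earlier are queued and added once it exists.
Returns NIL if NODES contains no :element."
  (let ((doctype nil)
        (document nil)
        (pending-comments '()))
    (dolist (node nodes)
      (destructuring-bind (type value) node
        (ecase type
          (:document-type
           (setf doctype value))
          (:comment
           (if document
               (add-child document (list :comment value))
               ;; No document yet: remember the comment for later.
               (push value pending-comments)))
          (:element
           (unless document
             (setf document (make-toy-document :doctype doctype))
             ;; Flush queued comments in their original order so they
             ;; end up before the root element.
             (dolist (text (nreverse pending-comments))
               (add-child document (list :comment text)))
             (setf pending-comments '()))
           (add-child document (list :element value))))))
    document))
```

Calling `(transform-top-level '((:document-type "html") (:comment " hello, world! ") (:element "html")))` yields a document whose children are the comment followed by the element, instead of failing on the comment because no document exists yet.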