
x/net/html: Unable to parse conditional comments

Open programuotojasgf opened this issue 4 years ago • 3 comments

What version of Go are you using (go version)?

1.16.4

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/_/.cache/go-build"
GOENV="/home/_/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/_/go/pkg/mod"
GONOPROXY="bitbucket.org/_"
GONOSUMDB="bitbucket.org/_"
GOOS="linux"
GOPATH="/home/_/go"
GOPRIVATE="bitbucket.org/_"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.16"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.16/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.4"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1444542212=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I tried to parse a conditional comment tag, <!--[if mso]>. See https://en.wikipedia.org/wiki/Conditional_comment for more details. Playground reproduction: https://play.golang.org/p/TSUYq9g4i4r

package main

import (
	"bytes"
	"fmt"
	"io"
	"golang.org/x/net/html"
)

func main() {
	Render([]byte("<!--[if mso]> HTML <![endif]-->"))
	fmt.Println()
	Render([]byte("<![if mso]> HTML <![endif]>"))
}

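// Render tokenizes b and prints each token followed by its type.
func Render(b []byte) {
	tokenizer := html.NewTokenizer(bytes.NewReader(b))
	for {
		if tokenizer.Next() == html.ErrorToken {
			// io.EOF means the input was fully consumed. Any other error is
			// reported, and we stop either way to avoid looping forever.
			if err := tokenizer.Err(); err != io.EOF {
				fmt.Println("tokenizer error:", err)
			}
			return
		}

		token := tokenizer.Token()
		fmt.Print(token)
		fmt.Println(token.Type)
	}
}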

What did you expect to see?

I expect the parser to recognize not only standard comments (<!-- Comment content -->), but also downlevel-hidden (<!--[if expression]> HTML <![endif]-->) and downlevel-revealed (<![if expression]> HTML <![endif]>) conditional comments.

<!--[if mso]>Comment 
 HTML Text
<![endif]-->Comment

<![if mso]>Comment 
 HTML Text
<![endif]>Comment

What did you see instead?

<!--[if mso]> HTML <![endif]-->Comment

<!--[if mso]-->Comment
 HTML Text
<!--[endif]-->Comment

Note how '<!--[if mso]>' is not even recognized as a tag of its own, and how all comments are forced into the dash format, regardless of whether they had dashes in the input.
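As a possible stopgap, the comment text itself can be inspected after tokenizing. The sketch below is stdlib-only and assumes the comment token's Data field holds the text between the delimiters, as the output above suggests (for example "[if mso]" or "[if mso]> HTML <![endif]"). The helper name classifyComment and its return labels are hypothetical and not part of x/net/html. Because the tokenizer emits both <![if mso]> and <!--[if mso]--> as the same comment, the "open" and "close" labels cannot tell the two source forms apart.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyComment is a hypothetical helper, not part of x/net/html.
// It takes the text of a comment token (assumed to be the content
// between the comment delimiters) and guesses which kind of
// conditional comment, if any, it represents.
func classifyComment(data string) string {
	s := strings.TrimSpace(data)
	switch {
	// Whole downlevel-hidden block swallowed into one comment:
	// "[if mso]> HTML <![endif]"
	case strings.HasPrefix(s, "[if ") && strings.HasSuffix(s, "<![endif]"):
		return "downlevel-hidden"
	// Opening marker on its own: "[if mso]"
	case strings.HasPrefix(s, "[if ") && strings.HasSuffix(s, "]"):
		return "conditional-open"
	// Closing marker on its own: "[endif]"
	case s == "[endif]":
		return "conditional-close"
	default:
		return "plain"
	}
}

func main() {
	samples := []string{
		"[if mso]> HTML <![endif]",
		"[if mso]",
		"[endif]",
		" Comment content ",
	}
	for _, data := range samples {
		fmt.Printf("%q => %s\n", data, classifyComment(data))
	}
}
```

In a tokenizer loop, this would be called with token.Data whenever token.Type is html.CommentToken. It only classifies what the tokenizer already produced and does not recover the original delimiters.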

programuotojasgf avatar May 19 '21 10:05 programuotojasgf

Can you please include a short program (or a link on playground) that reproduces this?

Issue #37771 may be connected.

CC @bradfitz, @ianlancetaylor via owners.

dmitshur avatar May 21 '21 17:05 dmitshur

I've included a demo in the original post by editing it.

programuotojasgf avatar May 22 '21 06:05 programuotojasgf

The support of conditional comments will be needed for a project I'm working on. How can I help with this investigation?

pior avatar Sep 20 '22 18:09 pior

https://play.golang.org/p/TSUYq9g4i4r demonstrates this issue well as posted above.

This bug was introduced in https://cs.opensource.google/go/x/net/+/06994584191ebed30077b5176cefe09703557528.

bensie avatar Nov 19 '22 05:11 bensie