
How to handle escaping

Open vbisbest opened this issue 3 years ago • 3 comments

I have some HTML embedded in JSON format, e.g.

text: "This is a how you make a paragraph <p>new paragraph</p>"

I need to escape the HTML because the HTML page actually renders the tags. I tried escaping my JSON content myself, e.g.: text: "This is a how you make a paragraph &lt;p&gt;new paragraph&lt;/p&gt;"

However, ToHTML then encodes the already-encoded text and I get double-encoded output: &amp;lt;p&amp;gt;new paragraph
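The double encoding can be reproduced with the standard library alone. The following is a minimal sketch, not the package's actual code path: `escapeOnce` and `escapeTwice` are hypothetical helper names, and Go's `html.EscapeString` stands in for whatever escaping the renderer performs internally.

```go
package main

import (
	"fmt"
	"html"
)

// escapeOnce mimics a caller pre-escaping HTML before handing it to a renderer.
func escapeOnce(s string) string {
	return html.EscapeString(s)
}

// escapeTwice mimics a renderer escaping text that was already escaped:
// the ampersands introduced by the first pass are escaped again.
func escapeTwice(s string) string {
	return html.EscapeString(escapeOnce(s))
}

func main() {
	in := "<p>new paragraph</p>"
	fmt.Println(escapeOnce(in))  // single pass: &lt;p&gt;...
	fmt.Println(escapeTwice(in)) // second pass turns each & into &amp;
}
```

The second pass is what a browser ends up displaying as literal `&lt;p&gt;` text instead of `<p>`.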

Thoughts on how to handle this? Thank you.

vbisbest avatar Oct 08 '21 13:10 vbisbest

I had a similar issue. I noticed HTML wasn't being escaped, so I ran html.EscapeString over the input before converting it to HTML, and the tags came out double escaped. The package does its own escaping, but given that some implementations of markdown support a handful of HTML tags, it doesn't escape the tags themselves. So when you escape them yourself, you insert ampersands, which then get escaped by the package. For me this was a problem. I managed to find the relevant code, but I didn't want to fork the repo to disable it. Instead I found a way to escape the HTML using a render hook:

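	// Import paths assume the gomarkdown/markdown package.
	import (
		"io"

		"github.com/gomarkdown/markdown/ast"
		"github.com/gomarkdown/markdown/html"
	)

	// escapeHTMLHook escapes raw HTML nodes instead of passing them through.
	// The bool result tells the renderer whether the hook handled the node.
	func escapeHTMLHook(w io.Writer, node ast.Node, entering bool) (ast.WalkStatus, bool) {
		switch node.(type) {
		case *ast.HTMLSpan: // inline HTML: write the escaped literal
			// Assumes the package's own html.EscapeHTML helper.
			html.EscapeHTML(w, node.AsLeaf().Literal)
			return ast.GoToNext, true
		case *ast.HTMLBlock: // block-level HTML: escape and pad with newlines
			io.WriteString(w, "\n")
			html.EscapeHTML(w, node.AsLeaf().Literal)
			io.WriteString(w, "\n")
			return ast.GoToNext, true
		}
		// Every other node falls through to the default renderer.
		return ast.GoToNext, false
	}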

To test this out:

	....
	extensions := parser.CommonExtensions

	p := parser.NewWithExtensions(extensions)

	opts := html.RendererOptions{
		Flags:          html.CommonFlags,
		RenderNodeHook: escapeHTMLHook,
	}

	r := html.NewRenderer(opts)

	// md is the markdown input as a []byte.
	markup := markdown.ToHTML(md, p, r)
	....

This seems to handle single-line HTML OK, but for tags that span multiple lines you lose your line breaks, and no markdown between the open and close tags is rendered.

I have been looking into adding a ParserHook to override the leftAngle behaviour so it's completely blind to HTML (apart from the escaping) but I cannot make this work nicely yet.

chrisesimpson avatar Nov 17 '21 22:11 chrisesimpson

I gave up on trying to use the parser and renderer extensions to make this work as everything between the open and close tags was being rendered as text. So if you had something like:

<span>This is a **span** element</span>

It would not make the word "span" bold. I wanted it instead to render just the escaped opening and closing tags separately, so that the content between them, whether block or inline, could still be processed by the package.

So instead I added a new parser option to disallow any tags and a new renderer option to escape them.

	....
	p.Opts.Flags = parser.DisallowHtmlTags

	opts := html.RendererOptions{
		Flags: html.EscapeHTMLTags,
	....

I'm not sure what the proper protocol is for submitting a pull request to this project, but I will look into that. Meanwhile, if you want me to share the changes with you, I'm happy to.

chrisesimpson avatar Nov 18 '21 19:11 chrisesimpson

I also ran into a similar problem, but this package was not to blame.

This package gives me <p>This is a post</p>, which is fine, but that string then got escaped by html/template and turned into &lt;p&gt;This is a post&lt;/p&gt;. So I replaced html/template with text/template, and nothing is escaped anymore.
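For anyone comparing the two template packages, here is a minimal standard-library sketch of the difference. The `page` template and the helper names `renderEscaped`, `renderTrusted`, and `renderText` are made up for illustration. It also shows an alternative I believe fits this case: keep html/template and wrap the already-rendered markup in `template.HTML`, which marks a value as trusted so it is not escaped again. Note that text/template escapes nothing at all, so it is only safe when every value in the template is trusted.

```go
package main

import (
	"bytes"
	"fmt"
	htmltemplate "html/template"
	texttemplate "text/template"
)

// page is a hypothetical one-line template used only for this illustration.
const page = `<div>{{.}}</div>`

// renderEscaped passes the value to html/template as a plain string,
// so the template engine escapes the tags.
func renderEscaped(s string) string {
	var buf bytes.Buffer
	t := htmltemplate.Must(htmltemplate.New("page").Parse(page))
	if err := t.Execute(&buf, s); err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

// renderTrusted wraps the value in template.HTML, which tells
// html/template the markup is trusted and must not be escaped.
func renderTrusted(s string) string {
	var buf bytes.Buffer
	t := htmltemplate.Must(htmltemplate.New("page").Parse(page))
	if err := t.Execute(&buf, htmltemplate.HTML(s)); err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

// renderText uses text/template, which performs no escaping at all.
func renderText(s string) string {
	var buf bytes.Buffer
	t := texttemplate.Must(texttemplate.New("page").Parse(page))
	if err := t.Execute(&buf, s); err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

func main() {
	rendered := "<p>This is a post</p>" // e.g. output of the markdown renderer
	fmt.Println(renderEscaped(rendered))
	fmt.Println(renderTrusted(rendered))
	fmt.Println(renderText(rendered))
}
```

Wrapping only the renderer's output in `template.HTML` keeps escaping active for every other value in the template, which switching the whole template to text/template does not.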

OpenWaygate avatar May 03 '23 14:05 OpenWaygate