
"Expected valid tag name" when using special characters like <

Open rhythm-section opened this issue 4 years ago • 16 comments

Hello @pngwn, thank you for this great project!

I am having an issue when using the < character inside a Markdown file, even when replacing it with &lt;. It seems the &lt; gets replaced by < again, resulting in the same error from the Svelte compiler ("Expected valid tag name"). I even tried escaping the character with \, without success.

Is there any way to escape special characters so the svelte compiler does not throw an error?

My current "solution" is to wrap < inside an inline code block but I do not want to show that part as code. This is the Markdown file I am talking about: https://github.com/nymea/nymea-plugins/blob/rework-readmes/awattar/README.md This is the version with the "fixed" inline code blocks. When removing those, the error gets thrown.

I am not sure if this is a bug or if I am missing something here. I just found this code section in MDsveX:

// in code nodes replace the character with the html entities
// maybe I'll need more of these

const entites = [
	[/</g, '&lt;'],
	[/>/g, '&gt;'],
	[/{/g, '&#123;'],
	[/}/g, '&#125;'],
];

So I guess the < character should only be replaced inside code blocks as mentioned in the comment, but using &lt; leads to the same error because somewhere during the preprocess it gets replaced with < again.
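To make the suspected round trip concrete, here is a minimal sketch. The `entites` table is the one quoted above (spelling kept as quoted); `escapeCodeText` and `decodeEntities` are hypothetical helpers written for illustration only, and `decodeEntities` is merely a stand-in for whatever decoding the markdown parser performs, not its actual implementation.

```javascript
// Replacement table as quoted above (identifier spelling kept as in the quote).
const entites = [
	[/</g, '&lt;'],
	[/>/g, '&gt;'],
	[/{/g, '&#123;'],
	[/}/g, '&#125;'],
];

// Hypothetical helper: apply every replacement to the text of a code node.
function escapeCodeText(text) {
	return entites.reduce((out, [pattern, entity]) => out.replace(pattern, entity), text);
}

// Hypothetical stand-in for the entity decoding done during markdown parsing.
function decodeEntities(text) {
	return text.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}

// Inside code nodes the special characters end up as entities.
const escaped = escapeCodeText('a < b {c}');

// Outside code nodes a hand-written entity is decoded again,
// so the Svelte compiler still sees a raw "<".
const decoded = decodeEntities('5 &lt; 10');

console.log(escaped);
console.log(decoded);
```

If the pipeline behaves like this sketch, a hand-written &lt; in plain text never reaches the compiler as an entity, which would match the error described above.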

rhythm-section avatar Jun 04 '20 09:06 rhythm-section

Interesting, I would expect using a raw < to break but mdsvex only explicitly replaces the above characters in fenced code (either inline or block). Using &lt; etc. should work. The markdown parser may be converting these entities behind the scenes. I'll take a look at this.

pngwn avatar Jun 04 '20 11:06 pngwn

They are getting decoded by the markdown parser; this problem is a little more complex than I thought. I have a potential solution but I'm going to see if there is a simpler way of solving the problem.

In other news my investigations have uncovered another bug, the following does not work either:

 - 1 {"<"} 2

When smartypants is enabled (which it is by default), the quotes get converted to fancy quotes. (#83)

pngwn avatar Jun 16 '20 21:06 pngwn

I can feel another custom node type coming on (for entities). Modifying the parser seems to be the "best" approach and should be less work than trying to selectively undo the entity decoding in the transform phase.

pngwn avatar Jun 16 '20 21:06 pngwn

Thank you for the investigation! The custom node type sounds good to me.

rhythm-section avatar Jun 22 '20 05:06 rhythm-section

Try to use {@html ...} as a workaround

TheComputerM avatar Aug 08 '20 05:08 TheComputerM

I'm running into this with Katex as well: #113

cesutherland avatar Aug 08 '20 19:08 cesutherland

This will be partially addressed by the work discussed in #116. I can make > and } be legal characters in the document without issue (as my html syntax will be very strict and I can escape plain text variants of those characters), however < and { will never be legal plain text characters as they mark the start of various states that the parser will enter. To a degree this issue is irresolvable because some characters just conflict with html and svelte syntax in a way that cannot be correctly analysed. There are a few ways to support some cases but I'll have to look into those at a later date.

pngwn avatar Aug 14 '20 20:08 pngwn

#113 has some other information and a nice test case.

pngwn avatar Aug 14 '20 20:08 pngwn

This will be partially addressed by the work discussed in #116. I can make > and } be legal characters in the document without issue (as my html syntax will be very strict and I can escape plain text variants of those characters), however < and { will never be legal plain text characters as they mark the start of various states that the parser will enter. To a degree this issue is irresolvable because some characters just conflict with html and svelte syntax in a way that cannot be correctly analysed. There are a few ways to support some cases but I'll have to look into those at a later date.

I wonder if mdsvex should interpret < and { characters followed by a space literally. Well-formatted HTML/svelte generally doesn't put a space there, and handling this in a special way would allow a number of obvious cases to work, such as this one: https://github.com/pngwn/MDsveX/issues/113#issue-675576096 (tl;dr: writing foo < bar to make some didactic point)
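A rough sketch of that heuristic, limited to <, might look like the following. `escapeLooseLessThan` is a hypothetical name, not part of mdsvex, and the sketch assumes it runs on plain text after the markdown parser has finished decoding entities, since an earlier pass would be undone.

```javascript
// Hypothetical helper: treat "<" as literal text when it is followed by
// whitespace or by the end of the input, and leave everything else alone.
function escapeLooseLessThan(text) {
	return text.replace(/<(?=\s|$)/g, '&lt;');
}

// A comparison written with surrounding spaces is escaped.
console.log(escapeLooseLessThan('foo < bar'));

// Markup without a space after "<" is left untouched.
console.log(escapeLooseLessThan('<div>foo</div>'));
```

The lookahead keeps the following whitespace in place, so only the < itself is rewritten.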

wlach avatar Apr 04 '21 11:04 wlach

Yeah, I am considering this for < for this specific reason. It seems a reasonable tradeoff because otherwise writing very basic syntax will be very difficult. This is especially notable for mdsvex because users are typically developers of some description and less-than and greater-than symbols will appear more often than in a typical document.

For curly braces, I'm less certain. It is quite common to have leading and trailing spaces for text expressions (example). I think block syntax requires there to be no space before the # in the current implementation but I can't quite recall as there is no spec.

Curly braces are just generally problematic. They are quite commonly used in custom markdown syntax for additional metadata/attributes but they pose a bit of a problem because of their importance to svelte. I'll take a look at some popular use-cases and see if I can figure out a way to disambiguate them when I start work on yet another parser for mdsvex.

I have a new parser (the svelte-parse) that observes this rule and has a well defined AST, although not a parsing spec. However, this will need to be rewritten, probably twice (don't ask). I will be starting the first rewrite soon; the second will have no user impact and will be purely internal, more of a long term goal. When I do that, it will also have a parsing spec.

pngwn avatar Apr 04 '21 12:04 pngwn

Is there a workaround for now?

I tried the following in the playground and all of them fail:

5 &lt; 10
5 < 10
5 {"<"} 10
5 {<} 10
5 {@html <} 10
5 {@html &lt;} 10

wighawag avatar May 23 '21 17:05 wighawag

It's awful but double escaping seems to work:

5 &amp;lt; 10

It doesn't work in the playground though. For some reason &amp; makes the playground hit another bug and error with "Document is not defined".
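One plausible reading of why double escaping helps is that the text goes through exactly one decoding pass before it reaches the Svelte compiler. The sketch below only illustrates that reading; `decodeOnce` is a hypothetical stand-in for that pass, not mdsvex code.

```javascript
// Hypothetical single decoding pass. "&amp;" is decoded last so that an "&"
// produced by this pass is not combined with following text in the same pass.
function decodeOnce(text) {
	return text
		.replace(/&lt;/g, '<')
		.replace(/&gt;/g, '>')
		.replace(/&amp;/g, '&');
}

// Single escaping: the pass turns the entity back into a raw "<".
console.log(decodeOnce('5 &lt; 10'));

// Double escaping: the pass leaves a valid entity behind.
console.log(decodeOnce('5 &amp;lt; 10'));
```

Under this assumption the double-escaped form arrives at the compiler as &lt;, which it can render as a literal <.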

josephg avatar May 25 '21 07:05 josephg

I wonder if mdsvex should interpret < and { characters followed by a space literally, generally well-formatted HTML/svelte doesn't do this and handling this in a special way would allow a number of obvious cases to work, such as this one: #113 (comment) (tl;dr: writing foo < bar to make some didactic point)

The commonmark specification has a list of rules for what constitutes legal tags. Anything that isn't a valid tag is escaped. This example shows a < followed by a space is not considered a valid tag name. Eg, < a> encodes to &lt; a&gt;. (As it does in this comment.)
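As a simplified illustration of that rule: CommonMark defines a tag name as an ASCII letter followed by letters, digits, or hyphens, so a < followed by a space cannot start a tag. The check below covers only the start of a tag and ignores the rest of the raw HTML grammar (attributes, the closing >, comments, and so on); `looksLikeTagStart` is a hypothetical name used for this sketch.

```javascript
// Simplified check: does the text at `index` begin like an open or closing tag?
// Only the "<", an optional "/", and the tag name are examined.
function looksLikeTagStart(text, index = 0) {
	return /^<\/?[A-Za-z][A-Za-z0-9-]*/.test(text.slice(index));
}

console.log(looksLikeTagStart('<a>'));    // a letter follows "<"
console.log(looksLikeTagStart('</div>')); // closing tag form
console.log(looksLikeTagStart('< a>'));   // a space follows "<"
console.log(looksLikeTagStart('5 < 10', 2));
```

A < that fails a check like this could then be escaped as text instead of being handed to the tag parser.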

Commonmark has a test suite of JSON content. We should get that test suite passing in mdsvex.

josephg avatar May 26 '21 01:05 josephg

When smartypants is enabled (which it is by default), the quotes get converted to fancy quotes. (#83)

Yes, this is compounded in plugins as well. For example, I started using remark-directive and the quotes get transformed into smart quotes before the content gets to the directive parsing part, which messes up parameter passing.

I guess it would be cool to have more control over it, maybe when this conversion occurs in the mdsvex pipeline? maybe choose the types of elements it operates on?

Madd0g avatar Oct 26 '21 19:10 Madd0g

Mdsvex will never be commonmark compliant. Even less so in 1.0. That said, for 1.0, I'll be porting/modifying the commonmark test cases across and restricting html syntax to solve this issue.

In the current implementation there isn't anything that can be done about it.

I'm working on the 1.0 parser now, which will bring this under my control.

pngwn avatar Oct 26 '21 22:10 pngwn