reversemarkdown-net

I want to parse according to CommonMark specification

Open pengqian089 opened this issue 2 years ago • 4 comments

My server side parses Markdown strictly according to the CommonMark specification, so I would like the HTML-to-Markdown conversion to have an option to produce output that follows the CommonMark specification.

he<strong>ll</strong>o

output:

he**ll**o

expected output:

he **ll** o

pengqian089, May 09 '22 19:05

Currently there is no explicit setting to adhere to the CommonMark spec; I will have a look at it.

mysticmind, May 10 '22 11:05

With regard to your example of he<strong>ll</strong>o being converted to he**ll**o: he**ll**o is correct, and he **ll** o is incorrect even under CommonMark. Am I missing something here?

mysticmind, May 10 '22 11:05

> With regard to your example of he<strong>ll</strong>o being converted to he**ll**o: he**ll**o is correct, and he **ll** o is incorrect even under CommonMark. Am I missing something here?

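I write in Chinese, where emphasized text often begins or ends with a Unicode punctuation character. In that case CommonMark only recognizes ** as an emphasis delimiter when the other side of it is whitespace or punctuation, so I need a space emitted around the emphasis.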

CommonMark spec, left-flanking delimiter run:

A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

C# code:

public void HtmlToMarkdown()
{
	var html = "<p><strong>4月19日,特斯拉中国方面发布消息称,</strong>在上海市各级政府部署协调下,4月17日和4月18日,特斯拉8000名员工陆续返厂。其中,工厂电池、电机车间于4月19日早晨恢复生产。“特斯拉会在接下来的3、4天内进行产能逐步爬坡,到整体单班满产。”特斯拉超级工厂生产制造高级总监宋钢表示。</p>";
	var config = new ReverseMarkdown.Config
	{
		UnknownTags = ReverseMarkdown.Config.UnknownTagsOption.PassThrough,
		GithubFlavored = true,
		DefaultCodeBlockLanguage = ""
	};
	var converter = new ReverseMarkdown.Converter(config);
	var markdown = converter.Convert(html);

	Console.WriteLine(markdown);
}

output:

**4月19日,特斯拉中国方面发布消息称,**在上海市各级政府部署协调下,4月17日和4月18日,特斯拉8000名员工陆续返厂。其中,工厂电池、电机车间于4月19日早晨恢复生产。“特斯拉会在接下来的3、4天内进行产能逐步爬坡,到整体单班满产。”特斯拉超级工厂生产制造高级总监宋钢表示。
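The flanking rule quoted above can be checked against this output with a few lines of C#. This is only a minimal sketch: `FlankingSketch`, `IsLeftFlanking`, and `IsRightFlanking` are hypothetical names and not part of ReverseMarkdown, the right-flanking test is assumed to be the mirror image of the left-flanking definition, and the whitespace and punctuation checks approximate the spec's character classes with the built-in `char` methods.

```csharp
using System;

// Hypothetical helper, not part of ReverseMarkdown or any CommonMark library.
public static class FlankingSketch
{
    // Approximates "Unicode whitespace"; char.IsWhiteSpace is slightly broader
    // than the spec's set. null stands for the beginning or end of the line.
    static bool IsSpace(char? c) => c is null || char.IsWhiteSpace(c.Value);

    // Approximates "Unicode punctuation": the Unicode punctuation categories plus
    // the ASCII symbols that CommonMark counts as punctuation ($ + < = > ^ ` | ~).
    static bool IsPunct(char? c) =>
        c is char ch && (char.IsPunctuation(ch) || (ch < 128 && char.IsSymbol(ch)));

    // (1) not followed by whitespace, and either (2a) not followed by punctuation,
    // or (2b) followed by punctuation and preceded by whitespace or punctuation.
    public static bool IsLeftFlanking(char? before, char? after) =>
        !IsSpace(after) && (!IsPunct(after) || IsSpace(before) || IsPunct(before));

    // Assumed mirror image of the rule above, with "before" and "after" swapped.
    public static bool IsRightFlanking(char? before, char? after) =>
        !IsSpace(before) && (!IsPunct(before) || IsSpace(after) || IsPunct(after));

    public static void Main()
    {
        // Closing ** in "he**ll**o": preceded by 'l', followed by 'o'.
        Console.WriteLine(IsRightFlanking('l', 'o'));   // True
        // Closing ** in the output above: preceded by ',', followed by '在'.
        Console.WriteLine(IsRightFlanking(',', '在'));  // False
        Console.WriteLine(IsLeftFlanking(',', '在'));   // True
    }
}
```

Under these assumptions the second ** in the Chinese output is left-flanking but not right-flanking, so a CommonMark parser cannot use it to close the strong emphasis. The he**ll**o case passes the check because its delimiters sit between letters rather than next to punctuation.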

pengqian089, May 10 '22 13:05

This is going to be a huge set of changes to deal with; I will look at how best to address it.

mysticmind, May 13 '22 07:05