cjkSymbolsAndPunctuation doesn't cover some of the most commonly used Chinese punctuation

Open movsb opened this issue 2 months ago • 2 comments

(Unrelated to CommonMark or the environment; this appears in every version.)

I'm using the Simple mode of EastAsianLineBreaks. For the following Chinese sentence, the full-width comma isn't treated as a full-width rune, which causes unexpected output.

	md := goldmark.New(goldmark.WithExtensions(extension.NewCJK(extension.WithEastAsianLineBreaks(extension.EastAsianLineBreaksSimple))))

	md.Convert([]byte("第一行。\n第二行。"), os.Stdout)
	md.Convert([]byte("第一行,\n第二行。"), os.Stdout)

Output:

<p>第一行。第二行。</p>
<p>第一行,
第二行。</p>

The second test is expected to produce <p>第一行,第二行。</p>

I inspected the cjkSymbolsAndPunctuation table, and it turns out that the commonly used full-width comma (U+FF0C) isn't in it.

I also tested some other commonly used Chinese punctuation characters; they are not in that table either.

	chinesePunctuation := `!;?,。`
	for _, r := range chinesePunctuation {
		fmt.Println(string(r), "\t", util.IsEastAsianWideRune(r))
	}

Output:

!       false
;       false
?       false
,       false
。       true

It's hard to say exactly which Unicode ranges these punctuation marks belong to, but at least the ones listed above should be included.

Update:

The Chinese comma is covered by the following table instead:

var halfwidthAndFullwidthForms = &unicode.RangeTable{
	R16: []unicode.Range16{
		{0xFF00, 0xFFEF, 1},
	},
}

movsb avatar Nov 13 '25 17:11 movsb

Thanks for reporting this. I am not familiar with Chinese, but adding

var halfwidthAndFullwidthForms = &unicode.RangeTable{
	R16: []unicode.Range16{
		{0xFF00, 0xFFEF, 1},
	},
}

looks good to me, judging from the code alone. Could you create a PR (with tests)?

yuin avatar Nov 17 '25 17:11 yuin

PR submitted.

The current patch fixes only some use cases; some standard Chinese punctuation usage remains unfixed. For example, there shouldn't be a space between Chinese punctuation and English letters, and this fix does not cover that case.

movsb avatar Nov 19 '25 15:11 movsb