cjkSymbolsAndPunctuation doesn't cover some of the most commonly used Chinese punctuation
(This is unrelated to CommonMark or the environment, and it appears in every version.)
I'm using the Simple mode of EastAsianLineBreaks. In the following Chinese sentence, the comma isn't treated as a full-width rune, which causes unexpected output.
md := goldmark.New(goldmark.WithExtensions(extension.NewCJK(extension.WithEastAsianLineBreaks(extension.EastAsianLineBreaksSimple))))
// First line ends with an ideographic full stop (U+3002).
md.Convert([]byte("第一行。\n第二行。"), os.Stdout)
// First line ends with a fullwidth comma (U+FF0C).
md.Convert([]byte("第一行,\n第二行。"), os.Stdout)
Output:
<p>第一行。第二行。</p>
<p>第一行,
第二行。</p>
The second call is expected to produce <p>第一行,第二行。</p>.
I inspected cjkSymbolsAndPunctuation, and it turns out that the commonly used fullwidth comma (U+FF0C) isn't in that table.
I also tested some other commonly used Chinese punctuation marks, and they aren't in that table either.
chinesePunctuation := `!;?,。`
for _, r := range chinesePunctuation {
	fmt.Println(string(r), "\t", util.IsEastAsianWideRune(r))
}
Output:
! false
; false
? false
, false
。 true
It's hard for me to say exactly which Unicode range these punctuation marks fall in, but at least the ones listed above should be included.
Update:
The Chinese comma falls in the following range (Halfwidth and Fullwidth Forms) instead:
var halfwidthAndFullwidthForms = &unicode.RangeTable{
	R16: []unicode.Range16{
		{0xFF00, 0xFFEF, 1},
	},
}
Thanks for reporting this. I'm not familiar with Chinese, but adding
var halfwidthAndFullwidthForms = &unicode.RangeTable{
	R16: []unicode.Range16{
		{0xFF00, 0xFFEF, 1},
	},
}
looks good to me, judging from the code alone. Could you create a PR (with tests)?
PR submitted.
The current patch fixes only some use cases and leaves some standard Chinese punctuation usage unaddressed. For example, there shouldn't be a space between Chinese punctuation and English letters, and that case isn't covered by this fix.