obsidian-linter
FR: Remove Space Prior to English Punctuation and Full Width Characters
When following the Chinese Copywriting Guidelines (中文文案排版指北), it is easy to type extra spaces. Could Linter remove the spaces before and after Chinese punctuation, as in the example below?
Before:
english , english。
Expected:
english,english。
Thanks!
Edit:
An English translation (via Google Translate) was added for the maintainers. Its example came out with halfwidth punctuation:
Before:
english , english.
Expected:
english, english.
Currently there is no rule that removes spaces before or after punctuation. There is a rule that collapses multiple spaces into one, as well as a rule that adds a space between Chinese and English characters: remove-multiple-spaces and space-between-chinese-and-english-or-numbers.
Would you like to request the addition of a rule to remove spaces around English punctuation (I am not familiar with punctuation in other languages, so contributions or an explanation of how punctuation works in other languages would be very much appreciated)?
I also posted this reply in Chinese via Google Translate, with links to [remove-multiple-spaces](https://github.com/platers/obsidian-linter/blob/master/docs/rules.md#remove-multiple-spaces) and [space-between-chinese-and-english-or-numbers](https://github.com/platers/obsidian-linter/blob/master/docs/rules.md#space-between-chinese-and-english-or-numbers). Sorry if the Chinese read poorly, since I am using Google Translate to answer and translate your question.
Hi Peter, sorry, I had assumed you could speak Chinese.
Yes, I would like to request the addition of a rule to remove spaces around (before AND after) Chinese punctuation. Chinese punctuation marks (technically called fullwidth forms) are wider than English punctuation marks (technically called halfwidth forms). For example, 。,()“”:; are the Chinese versions (fullwidth forms) of .,()"":;.
It is best practice to add a space between Chinese and English or number characters (for example, space-between-chinese-and-english-or-numbers correctly turns 中文English中文 into 中文 English 中文), but Chinese punctuation should not be surrounded by any spaces (for example, space-between-chinese-and-english-or-numbers correctly does not turn English,English (correct) into English , English (incorrect)). space-between-chinese-and-english-or-numbers correctly does not add spaces around Chinese punctuation, but it would be nice to add a feature that removes spaces around Chinese punctuation, turning English , English (incorrect) into English,English (correct).
Could you please help to add a rule to remove spaces around (before AND after) Chinese punctuation? You can refer to space-between-chinese-and-english-or-numbers to see how "Chinese punctuation" translates to coding language.
Btw, it would also be nice to add a rule to remove the space before an English punctuation mark, so that Linter can turn English , English into English, English.
Thank you so much!
No problem. Thank you for letting us know that there is a difference, as I was unaware. It seems that adding the fullwidth forms to a regex that removes the whitespace around them should be simple. As for doing that for English or halfwidth forms, I would have to think a little more about it, since it is not as simple as just removing whitespace: it depends on the punctuation mark in question.
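As a rough illustration of the difference described above, here is a minimal sketch, not the code from the linter itself. The function names and the small character subset are hypothetical and chosen only for this example. The fullwidth case strips spaces and tabs on both sides of a mark. The halfwidth case is deliberately more conservative: it only strips the space before a mark, and only when the mark is followed by whitespace or the end of the text.

```typescript
// Hypothetical helpers for illustration; not taken from the obsidian-linter source.

// A small illustrative subset of fullwidth / CJK punctuation: , 。 : ; ! ? ( ) 、
const FULLWIDTH_PUNCTUATION = "\uff0c\u3002\uff1a\uff1b\uff01\uff1f\uff08\uff09\u3001";

// Remove spaces and tabs on either side of any character in the subset above.
// Newlines are left alone so paragraph structure is not affected.
function removeSpaceAroundFullwidth(text: string): string {
  const pattern = new RegExp(`[ \\t]*([${FULLWIDTH_PUNCTUATION}])[ \\t]*`, "g");
  return text.replace(pattern, "$1");
}

// Halfwidth punctuation needs more care: only remove the space BEFORE the mark,
// and only when a word character precedes it and whitespace (or the end of the
// text) follows it. This leaves strings such as " .gitignore" untouched.
function removeSpaceBeforeHalfwidth(text: string): string {
  return text.replace(/(\w)[ \t]+([,.;:!?])(?=\s|$)/g, "$1$2");
}

console.log(removeSpaceAroundFullwidth("english \uff0c english\u3002"));
console.log(removeSpaceBeforeHalfwidth("English , English"));
```

Even the conservative halfwidth pattern is only an assumption about what is safe. Markdown constructs such as tables, code spans, and emoji shortcodes would still need to be excluded before applying either function to a real note.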
@user30535, I have created a PR for the issue with fullwidth punctuation: https://github.com/platers/obsidian-linter/pull/260. It covers the characters listed in your comment above. Does the example here look right to you?
It was just brought to my attention that “ and ” (albeit commonly used in Chinese) are actually halfwidth punctuation, and there is no fullwidth version of them. Please remove them from the rule. Sorry for the mistake!
In addition, here is a complete list of fullwidth punctuation/characters:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.:;!?"'`^~ ̄_&@#%+-*=<>()[]{}⦅⦆|¦/\¬$£¢₩¥。、「」『』〔〕【】—…–《》〈〉
\uff10\uff11\uff12\uff13\uff14\uff15\uff16\uff17\uff18\uff19\uff21\uff22\uff23\uff24\uff25\uff26\uff27\uff28\uff29\uff2a\uff2b\uff2c\uff2d\uff2e\uff2f\uff30\uff31\uff32\uff33\uff34\uff35\uff36\uff37\uff38\uff39\uff3a\uff41\uff42\uff43\uff44\uff45\uff46\uff47\uff48\uff49\uff4a\uff4b\uff4c\uff4d\uff4e\uff4f\uff50\uff51\uff52\uff53\uff54\uff55\uff56\uff57\uff58\uff59\uff5a\uff0c\uff0e\uff1a\uff1b\uff01\uff1f\uff02\uff07\uff40\uff3e\uff5e\uffe3\uff3f\uff06\uff20\uff03\uff05\uff0b\uff0d\uff0a\uff1d\uff1c\uff1e\uff08\uff09\uff3b\uff3d\uff5b\uff5d\uff5f\uff60\uff5c\uffe4\uff0f\uff3c\uffe2\uff04\uffe1\uffe0\uffe6\uffe5\u3002\u3001\u300c\u300d\u300e\u300f\u3014\u3015\u3010\u3011\u2014\u2026\u2013\u300a\u300b\u3008\u3009
Could you please add them all to the rule? Thank you so much!
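For reference, the escape list above appears to cover two contiguous ranges, U+FF01 to U+FF60 and U+FFE0 to U+FFE6, plus the seventeen extra marks at the end. A minimal sketch of a range-based character class follows. The constant and function names are hypothetical and this is not the pattern used in the linter.

```typescript
// Hypothetical names for illustration; not taken from the obsidian-linter source.

// Fullwidth ASCII variants (U+FF01..U+FF60) and fullwidth symbol variants (U+FFE0..U+FFE6).
const FULLWIDTH_RANGES = "\\uff01-\\uff60\\uffe0-\\uffe6";

// The extra marks at the end of the list: 。、「」『』〔〕【】—…–《》〈〉
const EXTRA_MARKS =
  "\\u3002\\u3001\\u300c\\u300d\\u300e\\u300f\\u3014\\u3015\\u3010\\u3011" +
  "\\u2014\\u2026\\u2013\\u300a\\u300b\\u3008\\u3009";

// No "g" flag, so RegExp.test() keeps no state between calls.
const LISTED_CHARACTER = new RegExp(`[${FULLWIDTH_RANGES}${EXTRA_MARKS}]`);

// True when the first character of `ch` is in the list posted above.
function isListedCharacter(ch: string): boolean {
  return LISTED_CHARACTER.test(ch.charAt(0));
}

console.log(isListedCharacter("\uff0c")); // fullwidth comma
console.log(isListedCharacter(","));      // ASCII comma
```

Note that the ranges include fullwidth digits and letters, so a rule built on this class would also remove spaces around those characters, not only around punctuation.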
I can definitely see about adding them and removing those two quotes.
Good to know that! Thanks a lot!
I have gone ahead and merged the changes with the Unicode characters mentioned above. If there are unexpected changes to spacing or any missing characters, feel free to let me know. The changes should be on master and slated for the next release.
Sorry, I closed this in relation to the Chinese character PR, but realized there was still the mention of the English punctuation.
@user30535, can you please explain where you got your list from?
According to the links below, your list includes some characters that are not considered Fullwidth according to the Unicode spec, such as \u2013 (–).
https://en.wikipedia.org/wiki/Halfwidth_and_Fullwidth_Forms_(Unicode_block)
http://www.alanwood.net/unicode/halfwidth_and_fullwidth_forms.html
The characters 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.:;!?"'`^~ ̄_&@#%+-*=<>()[]{}⦅⦆|¦/\¬$£¢₩¥ were copied from http://xahlee.info/comp/unicode_full-width_chars.html
I considered that list incomplete, so I manually added 。、「」『』〔〕【】—…–《》〈〉 to it. I agree that —…– may not be interpreted as fullwidth, but the others are standard Chinese punctuation marks that should always be interpreted as fullwidth characters.
I think the name Fullwidth in the linter rule is misleading. The characters you are referring to are CJK Symbols and Punctuation. I think we should consider renaming the rule to clarify the intent.
Could you explain how these are not Fullwidth Characters? I don't believe I fully follow the discussion (especially since I am just an English speaker and rarely deal with Fullwidth characters).
@pjkaufman here is the definition of Fullwidth Unicode symbols https://en.m.wikipedia.org/wiki/Halfwidth_and_Fullwidth_Forms_(Unicode_block)
The list @user30535 provided contains symbols that do not belong to that block. That is why I suggested aligning the rule name more closely with the Unicode spec.
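To make the distinction concrete, here is a small sketch that reports which Unicode block a character falls in. The block ranges are the ones published in the Unicode charts (Halfwidth and Fullwidth Forms: U+FF00 to U+FFEF, CJK Symbols and Punctuation: U+3000 to U+303F, General Punctuation: U+2000 to U+206F). The type and function names are hypothetical.

```typescript
// Hypothetical helper for illustration; only three blocks relevant to this thread are covered.
type Block =
  | "Halfwidth and Fullwidth Forms"
  | "CJK Symbols and Punctuation"
  | "General Punctuation"
  | "Other";

// Classify the first code point of `ch` by Unicode block.
function unicodeBlock(ch: string): Block {
  const cp = ch.codePointAt(0);
  if (cp === undefined) return "Other";
  if (cp >= 0xff00 && cp <= 0xffef) return "Halfwidth and Fullwidth Forms";
  if (cp >= 0x3000 && cp <= 0x303f) return "CJK Symbols and Punctuation";
  if (cp >= 0x2000 && cp <= 0x206f) return "General Punctuation";
  return "Other";
}

// Print the block of a few characters discussed in this thread.
for (const ch of ["\uff0c", "\u3002", "\u300c", "\u2013", "\u201c"]) {
  console.log(`U+${ch.charCodeAt(0).toString(16).toUpperCase()} -> ${unicodeBlock(ch)}`);
}
```

Under this classification the fullwidth comma is in Halfwidth and Fullwidth Forms, while 。 and 「 are in CJK Symbols and Punctuation, and – and “ are in General Punctuation. Block membership alone does not say whether a character is displayed at full width, since the first block also contains halfwidth forms.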
Sorry about the delay. The rule for removing the space before and/or after certain characters has now been added to master and should go out in the next release. I have turned it on for my own vault and ironed out a few kinks, so hopefully several of the edge cases have been covered. Please let us know if there are any issues in the next release.