dragonmapper
hanzi.to_pinyin delimiter is ignored
Summary
The delimiter parameter to to_pinyin() has no effect.
Example:
hanzi.to_pinyin("我猕猴桃过敏。", delimiter='.')
# ACTUAL OUTPUT:
# 'wǒmíhóutáoguòmǐn。'
# EXPECTED OUTPUT:
# 'wǒ.míhóutáo.guòmǐn。'
The default delimiter, a single space ' ', is not applied either:
hanzi.to_pinyin("我猕猴桃过敏。")
# ACTUAL OUTPUT:
# 'wǒmíhóutáoguòmǐn。'
# EXPECTED OUTPUT:
# 'wǒ míhóutáo guòmǐn。'
Hello @glowinthedark! The delimiter argument of hanzi.to_pinyin() refers to the Chinese character source string. It's used to partition the string by words rather than characters, allowing for a more accurate Pinyin reading.
delimiter is the character used to indicate word boundaries in s. This is used to differentiate between words and characters so that a more accurate reading can be returned.
Being able to format the output of the function makes sense though 👍
@tsroten: Understood, thanks for clarifying. But then the name delimiter is really misleading, because in other libraries delimiter or separator signifies the string used as a delimiter in the generated output, not a hint about the format of the input string. The semantics would better fit a name like input_delimiter or source_delimiter than delimiter.
How, then, can I generate something like 'wǒ míhóutáo guòmǐn。'?
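One possible workaround, sketched here under assumptions rather than taken from the library: segment the text into words first, convert each word on its own, and join the readings with whatever output separator is wanted. The helper pinyin_by_word and the stand-in converter demo_convert (with its tiny hard-coded table) are hypothetical names used only for illustration; they are not part of dragonmapper, and the word segmentation is assumed to come from elsewhere.

```python
from typing import Callable, Iterable


def pinyin_by_word(words: Iterable[str],
                   convert: Callable[[str], str],
                   sep: str = " ") -> str:
    """Convert each pre-segmented word separately and join the readings with sep.

    Hypothetical helper: `convert` is any function mapping a Chinese word
    to its Pinyin reading.
    """
    return sep.join(convert(word) for word in words)


# Stand-in converter for illustration only: a tiny hard-coded table,
# NOT dragonmapper's data. Unknown words are returned unchanged.
_DEMO_READINGS = {"我": "wǒ", "猕猴桃": "míhóutáo", "过敏": "guòmǐn"}


def demo_convert(word: str) -> str:
    return _DEMO_READINGS.get(word, word)


# The segmentation is supplied by hand here; a word segmenter could produce it.
words = ["我", "猕猴桃", "过敏"]
result = pinyin_by_word(words, demo_convert) + "。"
print(result)  # wǒ míhóutáo guòmǐn。
```

With dragonmapper installed, hanzi.to_pinyin could presumably be passed as convert in place of demo_convert. Keeping the output separator in a wrapper like this leaves the library's delimiter argument to its documented role of marking word boundaries in the input.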