dragonmapper

hanzi.to_pinyin delimiter is ignored

Open · glowinthedark opened this issue 4 years ago · 2 comments

Summary

The delimiter parameter of to_pinyin() has no effect on the output.

Example:

from dragonmapper import hanzi

hanzi.to_pinyin("我猕猴桃过敏。", delimiter='.')
# ACTUAL OUTPUT:
#     'wǒmíhóutáoguòmǐn。'

# EXPECTED OUTPUT:
#     'wǒ.míhóutáo.guòmǐn。'

The default delimiter, a single space ' ', is not applied either:

hanzi.to_pinyin("我猕猴桃过敏。")
# ACTUAL OUTPUT:
#     'wǒmíhóutáoguòmǐn。'

# EXPECTED OUTPUT:
#     'wǒ míhóutáo guòmǐn。'
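Both expected outputs put the delimiter at word boundaries (wǒ / míhóutáo / guòmǐn) rather than between every syllable, so producing them requires splitting the sentence into words first. A minimal sketch of one common technique, greedy forward maximum matching, assuming a hypothetical three-word toy list in place of a real dictionary (this is not dragonmapper's code):

```python
from typing import List

# Hypothetical toy word list; a real segmenter would load a full dictionary.
TOY_WORD_LIST = {"我", "猕猴桃", "过敏"}
MAX_WORD_LEN = max(len(w) for w in TOY_WORD_LIST)


def segment(text: str) -> List[str]:
    """Greedy forward maximum matching: at each position take the longest
    word from TOY_WORD_LIST that starts there, else a single character."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches;
        # a single character always matches as a fallback.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in TOY_WORD_LIST:
                tokens.append(candidate)
                i += length
                break
    return tokens


print(segment("我猕猴桃过敏。"))  # ['我', '猕猴桃', '过敏', '。']
```

Greedy matching is simple but can choose the wrong split when candidate words overlap; a full dictionary (for example, CC-CEDICT) and a smarter segmenter would be needed in practice.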

glowinthedark · Aug 01 '20 12:08

Hello @glowinthedark! The delimiter argument of hanzi.to_pinyin() applies to the Chinese source string, not to the output. It marks word boundaries in the input so that the string is looked up word by word rather than character by character, which yields a more accurate Pinyin reading.

delimiter is the character used to indicate word boundaries in s. This is used to differentiate between words and characters so that a more accurate reading can be returned.
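To make this concrete, here is a toy model of an input delimiter. It is a sketch under stated assumptions: the tiny word and character tables and the function to_pinyin_sketch are hypothetical stand-ins, not dragonmapper's internals. The delimiter only tells the lookup where words start and end, and the polyphonic character 行 shows why that matters:

```python
# Hypothetical toy tables standing in for a real dictionary; these are NOT
# dragonmapper's internals.
TOY_WORDS = {"银行": "yínháng"}  # whole-word readings
TOY_CHARS = {"我": "wǒ", "去": "qù", "银": "yín", "行": "xíng"}  # default per-character readings


def to_pinyin_sketch(s: str, delimiter: str = " ") -> str:
    """Toy model of an *input* delimiter that marks word boundaries in s."""
    readings = []
    for chunk in s.split(delimiter):
        if chunk in TOY_WORDS:
            # The chunk is a known word, so use its word-level reading.
            readings.append(TOY_WORDS[chunk])
        else:
            # Otherwise fall back to the default reading of each character.
            readings.append("".join(TOY_CHARS.get(ch, ch) for ch in chunk))
    # This sketch drops the delimiter; it only affects how s is looked up.
    return "".join(readings)


print(to_pinyin_sketch("我去银行"))    # wǒqùyínxíng (no word boundaries)
print(to_pinyin_sketch("我 去 银行"))  # wǒqùyínháng (银行 marked as a word)
```

Without boundaries, 行 gets its default reading and 银行 ("bank") comes out as yínxíng; marking it as a word yields the correct yínháng.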

Being able to format the output of the function makes sense, though 👍

tsroten · Aug 01 '20 13:08

@tsroten: Understood, thanks for clarifying. The name delimiter is misleading, though: in other libraries, delimiter or separator usually means the string inserted between items in the generated output, not a hint about the format of the input string. A name like input_delimiter or source_delimiter would describe these semantics better.

How, then, can I generate something like 'wǒ míhóutáo guòmǐn。'?

glowinthedark · Aug 01 '20 14:08
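Until the library offers an output separator, one workaround is to segment the text yourself and format the readings in your own code. A minimal sketch, assuming the words are already segmented and that convert is any per-word Pinyin function; join_pinyin, PUNCTUATION and TOY_READINGS are hypothetical names for illustration only:

```python
from typing import Callable, Iterable, List

# Hypothetical set of full-width punctuation marks that attach to the
# preceding reading instead of being separated from it.
PUNCTUATION = set("。，！？；：、")


def join_pinyin(words: Iterable[str], convert: Callable[[str], str],
                separator: str = " ") -> str:
    """Convert pre-segmented words one at a time and join the readings.

    join_pinyin is a hypothetical helper, not part of dragonmapper; `convert`
    is any function that maps one word to its Pinyin reading.
    """
    parts: List[str] = []
    for word in words:
        if word in PUNCTUATION and parts:
            parts[-1] += word  # glue punctuation onto the previous reading
        elif word in PUNCTUATION:
            parts.append(word)  # leading punctuation is kept as-is
        else:
            parts.append(convert(word))
    return separator.join(parts)


# Toy per-word readings standing in for a real converter (an assumption).
TOY_READINGS = {"我": "wǒ", "猕猴桃": "míhóutáo", "过敏": "guòmǐn"}

words = ["我", "猕猴桃", "过敏", "。"]
print(join_pinyin(words, TOY_READINGS.__getitem__))       # wǒ míhóutáo guòmǐn。
print(join_pinyin(words, TOY_READINGS.__getitem__, "."))  # wǒ.míhóutáo.guòmǐn。
```

Passing hanzi.to_pinyin as convert is an untested assumption here; it relies on a single word being looked up as a whole, as the docstring quoted in this thread suggests.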