
`intword` uses English units regardless of the active localization

Open TellowKrinkle opened this issue 1 year ago • 2 comments

Not all languages group large numbers as A × 10³ + B × 10⁶, etc. For example, Japanese has no word for 10⁶. Instead, it has a word for 10⁴, and 10⁶ is written as 100 of those. Unsurprisingly, 2e7 is not written as 20 × 100 × 10⁴, but as 2000 × 10⁴. Translations should be able to specify which powers of 10 have special names.
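A minimal sketch of what "translations specify which powers of 10 have names" could look like, assuming each locale supplies a table of (power, unit name) pairs. `UNITS`, `SEPARATOR` and `intword_sketch` are hypothetical names for illustration, not part of humanize's API.

```python
# Hypothetical per-locale tables of (power of ten, unit name).
# Not humanize's real data format; illustration only.
UNITS = {
    "en": [(10**6, "million"), (10**9, "billion"), (10**12, "trillion")],
    "ja": [(10**4, "万"), (10**8, "億"), (10**12, "兆")],
}
# Japanese writes the unit directly after the number, with no space.
SEPARATOR = {"en": " ", "ja": ""}


def intword_sketch(value: int, locale: str = "en") -> str:
    """Scale value by the largest named power of ten that does not exceed it."""
    # Try the biggest unit first, so 234909023 becomes 2.3億 rather than 23490.9万.
    for power, name in reversed(UNITS[locale]):
        if value >= power:
            return f"{value / power:.1f}{SEPARATOR[locale]}{name}"
    # Below the smallest named unit: leave the number as-is.
    return str(value)
```

With the Japanese table, 2e7 comes out as `2000.0万` (2000 × 10⁴) instead of being forced through a 10⁶ unit.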

What did you do?

```pycon
>>> import humanize
>>> humanize.i18n.activate("ja_JP")
<gettext.GNUTranslations object at 0x102fcca00>
>>> humanize.intword(234909023)
'234.9 百万'
>>> humanize.intword(2349090)
'2.3 百万'
```

What did you expect to happen?

```pycon
>>> humanize.intword(234909023)
'2.3億'
>>> humanize.intword(2349090)
'234.9万'
```

What actually happened?

```pycon
>>> humanize.intword(234909023)
'234.9 百万'
>>> humanize.intword(2349090)
'2.3 百万'
```

(This is the equivalent of entering 23490902 in English and getting "234.9 hundred thousand".)

What versions are you using?

  • OS: Fedora 39
  • Python: 3.12
  • Humanize: 4.9.0

TellowKrinkle, Mar 12 '24 20:03

Thanks for the report. I'm not sure how well suited this library is to being adapted for this, but I would review a PR if you'd like to look into it.

hugovk, Mar 12 '24 21:03

Another point: in French, values below 2 million (or below 2 of any other unit) should not be pluralized. Examples: 1 million, 1.1 million, 1.7 milliard.
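The French rule could be handled by choosing between singular and plural unit names based on the number that is actually displayed. A hypothetical sketch, assuming a French table of singular/plural names and a threshold of 2; `FR_UNITS` and `intword_fr_sketch` are illustrative only, and the decimal point follows the examples above rather than the French decimal comma.

```python
# Hypothetical French table of (power of ten, singular, plural).
FR_UNITS = [(10**6, "million", "millions"), (10**9, "milliard", "milliards")]


def intword_fr_sketch(value: int, plural_threshold: float = 2) -> str:
    """Format value with French unit names, singular below plural_threshold."""
    for power, singular, plural in reversed(FR_UNITS):
        if value >= power:
            text = f"{value / power:.1f}"
            # Decide on the displayed number, so 1960000 (shown as "2.0")
            # takes the plural form.
            word = singular if float(text) < plural_threshold else plural
            return f"{text} {word}"
    return str(value)
```

For example, `intword_fr_sketch(1_700_000_000)` gives `"1.7 milliard"`, while `intword_fr_sketch(2_300_000)` gives `"2.3 millions"`.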

I considered kludging something together with a translation string*, but that would not be a generic solution that also fixes this bug. So I think both the string values and the numbers defining the brackets would need to be part of the translation files.

* (for example, `plural_threshold = int(pgettext("intword plural threshold", "1"))`)
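Following the footnote's idea, both the bracket powers and the plural threshold could be read from the catalog with `pgettext`, which `gettext.NullTranslations` and `GNUTranslations` provide since Python 3.8. In this sketch the `"intword powers"` context and its `"6 9 12"` default are hypothetical, and `FakeCatalog` stands in for a compiled `.mo` file so the example runs without one. How the threshold is then applied is left to the formatting code.

```python
import gettext


class FakeCatalog(gettext.NullTranslations):
    """Stand-in for a compiled .mo catalog; its entries are hypothetical."""

    def __init__(self, entries: dict[tuple[str, str], str]) -> None:
        super().__init__()
        self._entries = entries

    def pgettext(self, context: str, message: str) -> str:
        # Untranslated messages fall back to the msgid, as with real catalogs.
        return self._entries.get((context, message), message)


def intword_settings(t: gettext.NullTranslations) -> tuple[list[int], int]:
    """Read the named exponents and the plural threshold from a catalog."""
    # Hypothetical message: space-separated exponents that have unit names.
    powers = [int(p) for p in t.pgettext("intword powers", "6 9 12").split()]
    # Message taken from the footnote above.
    threshold = int(t.pgettext("intword plural threshold", "1"))
    return powers, threshold


english = gettext.NullTranslations()
japanese = FakeCatalog({("intword powers", "6 9 12"): "4 8 12"})
french = FakeCatalog({("intword plural threshold", "1"): "2"})
```

In a real catalog these would be `msgctxt`/`msgid`/`msgstr` entries in each locale's `.po` file.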

merwok, Apr 01 '25 21:04