Localize Pluralize/Singularize (WAS: Localizable InflectorExtensions)

Open kblok opened this issue 11 years ago • 14 comments

I'd like to write a Spanish implementation of InflectorExtensions. I don't know whether there is ongoing work on this topic (issue #132 is closely related).

I think we should have a culture-specific provider responsible for filling the rules list, and then simply (?) write regex rules for each language.

What do you think @MehdiK ?

kblok avatar Apr 12 '14 16:04 kblok

I think by inflector methods you only mean Pluralize and Singularize here, right?

This is a great idea. We can extract the localizable logic out of the class, implement a default pluralizer/singularizer/inflector class with the current logic (excluding the rules) and provide hooks for injecting the rules etc; kinda like how NumberToWordsConverter is implemented.

What does the localisation committee think? /cc @harouny, @JonasJensen, @mexx, @mnowacki, @hazzik, @thunsaker, @henriksen, @ekblom, @akamud, @ignorkulman, @Borzoo, @onovotny

MehdiK avatar Apr 12 '14 18:04 MehdiK

Yes, I'm talking about the Pluralize and Singularize feature. This is what I have in mind:

  • InflectorExtensions should contain only the extension-method logic
  • An IInflector which could have the same API as the extension methods
  • As you say, we could have a DefaultInflector with the current behavior but with no explicit rules
  • New classes EnglishInflector and SpanishInflector which could inherit from DefaultInflector and would be responsible for filling the Plurals, Singulars and Uncountable lists

I think the Spanish language has rules similar to English regarding singularity and plurality (it has uncountable, singular-only, plural-only and irregular words), so the behavior could be the same. Another language could choose between inheriting from DefaultInflector and just implementing the IInflector interface.
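The class split proposed above can be sketched roughly as follows. This is a minimal illustration in Python rather than Humanizer's C#, and every name, rule and word list in it is an assumption made for the example, not the library's actual code. The separate IInflector interface is left implicit because Python relies on duck typing.

```python
import re


class DefaultInflector:
    """Rule engine with no rules of its own; subclasses fill the lists."""

    def __init__(self):
        self._plurals = []          # (compiled regex, replacement) pairs
        self._singulars = []
        self._uncountables = set()

    def add_plural(self, pattern, replacement):
        self._plurals.append((re.compile(pattern, re.IGNORECASE), replacement))

    def add_singular(self, pattern, replacement):
        self._singulars.append((re.compile(pattern, re.IGNORECASE), replacement))

    def add_irregular(self, singular, plural):
        self.add_plural(f"^{re.escape(singular)}$", plural)
        self.add_singular(f"^{re.escape(plural)}$", singular)

    def add_uncountable(self, word):
        self._uncountables.add(word.lower())

    def pluralize(self, word):
        return self._apply(self._plurals, word)

    def singularize(self, word):
        return self._apply(self._singulars, word)

    def _apply(self, rules, word):
        if word.lower() in self._uncountables:
            return word
        # Rules added later are more specific, so they are tried first.
        for regex, replacement in reversed(rules):
            if regex.search(word):
                return regex.sub(replacement, word)
        return word


class EnglishInflector(DefaultInflector):
    """A handful of illustrative English rules (far from complete)."""

    def __init__(self):
        super().__init__()
        self.add_plural(r"$", "s")
        self.add_plural(r"(s|x|ch|sh)$", r"\1es")
        self.add_plural(r"([^aeiou])y$", r"\1ies")
        self.add_singular(r"s$", "")
        self.add_singular(r"(s|x|ch|sh)es$", r"\1")
        self.add_singular(r"([^aeiou])ies$", r"\1y")
        self.add_irregular("person", "people")
        self.add_uncountable("information")


class SpanishInflector(DefaultInflector):
    """A handful of illustrative Spanish rules (accent shifts are ignored)."""

    def __init__(self):
        super().__init__()
        self.add_plural(r"$", "s")                          # casa -> casas
        self.add_plural(r"([^aeiouáéíóú])$", r"\1es")       # papel -> papeles
        self.add_plural(r"z$", "ces")                       # luz -> luces
        self.add_singular(r"s$", "")
        self.add_singular(r"([^aeiouáéíóú])es$", r"\1")
        self.add_singular(r"ces$", "z")
        self.add_uncountable("crisis")                      # invariant in the plural
```

With this shape, `EnglishInflector().pluralize("city")` and `SpanishInflector().pluralize("luz")` go through the same engine and differ only in the rule lists. A language whose grammar does not fit the regex model could skip the base class and provide its own `pluralize`/`singularize`.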

I'm not sure whether IInflector, DefaultInflector, EnglishInflector, etc. should have some suffix (Provider? Engine?)

kblok avatar Apr 12 '14 19:04 kblok

Before going too far, I'd like to confirm that it is actually possible to implement this logic in other languages too, either through changing the rules or implementation from scratch. Depending on the complexities of other languages we may have to choose a different design or think harder about this. Sometimes language rules get way too complex (#64)! I have considered creating a new Humanizer.Dictionary package that deals with this and other language specific word manipulations, and I still think that's a viable solution.

FWIW the English implementation is relatively buggy too. See #142 for more details.

MehdiK avatar Apr 12 '14 19:04 MehdiK

After looking at the InflectorExtension implementation, I can say it would work for Portuguese. The "normal" rules aren't too complex. The problem with Portuguese plurals is that, although they may look simple, the exception rules depend on etymology or on the word's accent, which makes it impossible to predict what the correct plural form would be.

For example, there's a rule that says that words ending in "ão" take "ões" in their plural form:

coração -> corações
cordão -> cordões

But there are some words that don't follow that rule:

órgão -> órgãos
alemão -> alemães
cão -> cães

In some cases the rule changes because the stressed syllable is not the last one. But some words don't even follow that (and as far as I know, there is no rule for this kind of word):

mão -> mãos
artesão -> artesãos

To ensure a more accurate translation we will indeed need a dictionary. Probably something similar happens in English and Spanish.
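The dictionary-plus-rules idea can be sketched as an exception lookup that runs before the suffix rule. This is a minimal illustration in Python, not Humanizer code: the function name is hypothetical, the exception table holds only the words listed above, and the final "append s" fallback is a deliberate oversimplification.

```python
import unicodedata

# Exceptions taken from the examples in this comment; a real table would be far larger.
PT_EXCEPTIONS = {
    "órgão": "órgãos",
    "alemão": "alemães",
    "cão": "cães",
    "mão": "mãos",
    "artesão": "artesãos",
}


def pluralize_pt(word: str) -> str:
    """Hypothetical helper: dictionary first, then the general "ão" rule."""
    # Normalize so that "ã" is always a single code point.
    word = unicodedata.normalize("NFC", word)
    exception = PT_EXCEPTIONS.get(word.lower())
    if exception is not None:
        return exception
    if word.lower().endswith("ão"):
        return word[:-2] + "ões"
    # Assumption: everything else just takes "s" (not true in general).
    return word + "s"
```

The trade-off is the one raised in this thread: the rule stays small and predictable, and the accuracy comes from how much of the language the exception table covers.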

akamud avatar Apr 15 '14 01:04 akamud

Spanish rules are similar. I tried to explain some of these with regard to the ordinals in #212.

thunsaker avatar Apr 15 '14 01:04 thunsaker

My concern with dictionaries is the impact they could have on the "weight" of the library (which I think could be solved with resources) and on performance (I should also be worried about performance given how many regexes the lib is evaluating right now).

Another issue with a dictionary is maintenance: where would we get a list of singulars and plurals? I don't know if one is easy to get, at least for Spanish.

kblok avatar Apr 15 '14 02:04 kblok

BTW @thunsaker, I have this link with the rules for plurals (Spanish): http://es.m.wikibooks.org/wiki/Espa%C3%B1ol/Morfolog%C3%ADa/Sustantivo

kblok avatar Apr 15 '14 02:04 kblok

Russian has an extra grammatical number. In the current implementation it is named Paucal, although it is actually a kind of Dual. For now I have no elegant solution to support this distinction in the Inflector scenario.

In German it would be possible to go with rule injection, since German, like English, has only two grammatical numbers.

mexx avatar Apr 16 '14 21:04 mexx

@mexx, paucal is usually not a number, but a genitive case in Russian.

hazzik avatar Apr 16 '14 22:04 hazzik

I think we need to properly implement GrammaticalNumberDetector for all languages and widely use it.

hazzik avatar Apr 16 '14 22:04 hazzik

I'm thinking about an interface IQuantifiable { ToQuantity(int number); } or IWord, which can implement the language-specific logic of quantification. What do you think? A similar concept is already used in DutchNumberToWordsConverter.
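One way to read this proposal is that the word object carries its own forms and chooses among them by number. The sketch below is Python, not C#, and all class, field and method names are hypothetical. The Russian branch uses the commonly cited one/few/many selection for integers, so the "extra" form discussed earlier is simply a third field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnglishNoun:
    singular: str
    plural: str

    def to_quantity(self, number: int) -> str:
        form = self.singular if abs(number) == 1 else self.plural
        return f"{number} {form}"


@dataclass(frozen=True)
class RussianNoun:
    one: str    # 1, 21, 31, ...
    few: str    # 2-4, 22-24, ...
    many: str   # 0, 5-20, 25-30, ...

    def to_quantity(self, number: int) -> str:
        n = abs(number)
        if n % 10 == 1 and n % 100 != 11:
            form = self.one
        elif 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            form = self.few
        else:
            form = self.many
        return f"{number} {form}"


apple_en = EnglishNoun("apple", "apples")
apple_ru = RussianNoun("яблоко", "яблока", "яблок")
```

Callers only see `to_quantity`, so code that formats "N things" does not need to know how many grammatical numbers the language has.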

hazzik avatar Jun 28 '14 14:06 hazzik

@hazzik, this idea was implemented in #285, but we need a better design to convert singulars to plurals and duals and vice versa. I'm trying to come up with an elegant solution that supports singulars, duals, paucals (if needed) and plurals.

Borzoo avatar Aug 08 '14 13:08 Borzoo

@Borzoo, the thing implemented in #285 is something different. There is IQuantifier, which can quantify any word, but I propose that the word itself can have different representations.

hazzik avatar Aug 11 '14 04:08 hazzik

Has any progress been made on this issue?

5cover avatar May 11 '23 06:05 5cover