LemmInflect

Incorrect inflections of special adjectives like beautiful and handsome

Open OscarWang114 opened this issue 4 years ago • 2 comments

Hi, thanks for building this amazing tool! Currently, it doesn't seem to handle inflections of special adjectives like beautiful and handsome correctly.

Example:

from lemminflect import getLemma, getInflection

lemma = getLemma('beautiful', upos='ADJ')
inflection1 = getInflection(lemma[0], tag='JJR')
inflection2 = getInflection(lemma[0], tag='JJS')
print(inflection1, inflection2)

gives ('beautifuler',) and ('beautifulest',). It'd be great if lemminflect could output something like ('more', 'beautiful',) or ('more beautiful',) instead!

OscarWang114 • Sep 19 '20 12:09

Thanks for pointing this out.

What's happening is that it doesn't have an inflection in its dictionary for JJR/JJS, so it's using the out-of-vocabulary (OOV) rules to create one. You can see this if you do...

lemminflect.Inflections().getAllInflections(lemma[0])
{'JJ': ('beautiful',)}

Essentially, you're asking it to do something that isn't correct for English and it doesn't know that this isn't allowed, or at least isn't going to try to stop you.

I could probably add a rule to prevent it from creating an inflection if it has the base lemma but not the specific inflection (or at least log a warning). However, I'm a little concerned that there might be instances where it only has the base form and falling back to the OOV rules for inflection allows things to work correctly for the user.
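
A minimal sketch of that guard, for illustration only. It does not call lemminflect: `known_forms` stands in for the dictionary result (the same shape as the `{'JJ': ('beautiful',)}` output above), and `guarded_inflection`, `naive_oov_rule`, and the logger name are hypothetical names, not part of the library.

```python
import logging
from typing import Callable, Mapping, Tuple

Forms = Tuple[str, ...]
logger = logging.getLogger("inflect_guard")  # hypothetical logger name


def guarded_inflection(
    lemma: str,
    tag: str,
    known_forms: Mapping[str, Forms],
    oov_rule: Callable[[str, str], Forms],
) -> Forms:
    """Use dictionary forms when present; apply OOV rules only to unknown lemmas."""
    if tag in known_forms:
        return known_forms[tag]
    if known_forms:
        # The lemma is in the dictionary but has no entry for this tag,
        # so refuse to invent one and log a warning instead.
        logger.warning("No %s form recorded for %r; skipping OOV rules", tag, lemma)
        return ()
    # The lemma is not in the dictionary at all: fall back to the OOV rules.
    return oov_rule(lemma, tag)


def naive_oov_rule(lemma: str, tag: str) -> Forms:
    """Stand-in for the real OOV rules (assumption: plain suffixing)."""
    return (lemma + {"JJR": "er", "JJS": "est"}[tag],)


print(guarded_inflection("beautiful", "JJR", {"JJ": ("beautiful",)}, naive_oov_rule))  # ()
print(guarded_inflection("zorb", "JJR", {}, naive_oov_rule))  # ('zorber',)
```

This shows the trade-off mentioned above: any lemma that has only its base form in the dictionary now gets an empty result, even where the OOV rules would have produced a correct form.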

The right way to do this would be to have a defined list or set of rules for these exceptions and implement a lookup for them. I can look in the base NIH lexicon to see if there's anything that would help with that. If you're aware of any resource that details this behavior, let me know. I'll have to look into this some more.
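
As a rough sketch of what such a lookup could look like, assuming a hand-maintained exception list: the set contents, `PERIPHRASTIC_ADJECTIVES`, and `inflect_adjective` are all hypothetical and only reuse the two adjectives from this issue. `known_forms` again stands in for the dictionary result for the lemma.

```python
from typing import Mapping, Tuple

Forms = Tuple[str, ...]

# Hypothetical exception list: adjectives that form their comparative and
# superlative with "more"/"most" rather than "-er"/"-est".
PERIPHRASTIC_ADJECTIVES = frozenset({"beautiful", "handsome"})
DEGREE_WORD = {"JJR": "more", "JJS": "most"}


def inflect_adjective(lemma: str, tag: str, known_forms: Mapping[str, Forms]) -> Forms:
    """Check the exception list first, then fall back to the dictionary forms."""
    if tag in DEGREE_WORD and lemma in PERIPHRASTIC_ADJECTIVES:
        return (f"{DEGREE_WORD[tag]} {lemma}",)
    return known_forms.get(tag, ())


forms = {"JJ": ("beautiful",)}
print(inflect_adjective("beautiful", "JJR", forms))  # ('more beautiful',)
print(inflect_adjective("beautiful", "JJS", forms))  # ('most beautiful',)
```

Returning a single multi-word string keeps the result a one-element tuple, matching the ('more beautiful',) shape suggested in the original report.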

bjascob • Sep 19 '20 15:09

At least one exception is handled incorrectly:

In [1]: getInflection('little', 'JJR')
Out[1]: ('littler',)  # should be less

In [2]: getInflection('little', 'JJS')
Out[2]: ('littlest',)  # should be least

Some adjectives don't have comparative or superlative forms at all, not even more/most:

In [3]: getInflection('alphanumeric', 'JJR')
Out[3]: ('alphanumericer',)

In [4]: getInflection('alphanumeric', 'JJS')
Out[4]: ('alphanumericest',)

Simple Wiktionary has a list of them, though I'm not sure whether it's exhaustive: https://simple.wiktionary.org/wiki/Category:Non-comparable_adjectives
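
Both cases could be handled by an override table consulted before any inflection is generated. A minimal sketch, with hypothetical names (`IRREGULAR`, `NON_COMPARABLE`, `degree_override`) and entries taken only from the examples above; a real list would have to be loaded from a resource such as the Wiktionary category:

```python
from typing import Dict, Optional, Tuple

Forms = Tuple[str, ...]

# Hypothetical override data, limited to the examples in this comment.
IRREGULAR: Dict[Tuple[str, str], Forms] = {
    ("little", "JJR"): ("less",),
    ("little", "JJS"): ("least",),
}
NON_COMPARABLE = frozenset({"alphanumeric"})


def degree_override(lemma: str, tag: str) -> Optional[Forms]:
    """Return an override for JJR/JJS, or None when no exception applies."""
    if tag not in ("JJR", "JJS"):
        return None
    if lemma in NON_COMPARABLE:
        return ()  # no comparative or superlative exists
    return IRREGULAR.get((lemma, tag))


print(degree_override("little", "JJR"))        # ('less',)
print(degree_override("alphanumeric", "JJS"))  # ()
print(degree_override("tall", "JJR"))          # None
```

A `None` result means "no exception, inflect as usual", which keeps it distinct from the empty tuple used for non-comparable adjectives.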

nihil-admirari • May 27 '23 17:05