[BUG] The Singularize function is extremely bad TBH

Open tim5go opened this issue 6 years ago • 3 comments

The built-in singularize function yields lots of false positives, i.e. it mangles words that are already singular. Here are some examples:

  1. business
  2. virginia
  3. tour
  4. loss

I ended up having to maintain my own exception dictionary, which is really inconvenient. I know it's hard to cover all cases, but some of these false positives are really trivial. I am quite disappointed, given that this repo has received lots of stars.
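For anyone hitting the same problem, here is a minimal sketch of the exception-dictionary workaround. It is not Pattern's code: `naive_singularize` is a hypothetical stand-in for whatever singularizer you actually call, and `EXCEPTIONS` and `make_safe_singularize` are names made up for illustration.

```python
from typing import Callable, Dict

# Hypothetical exception table: the four words from this report,
# each mapped to the form that should be returned unchanged.
EXCEPTIONS: Dict[str, str] = {
    "business": "business",
    "virginia": "virginia",
    "tour": "tour",
    "loss": "loss",
}


def naive_singularize(word: str) -> str:
    """Stand-in singularizer (an assumption): strips one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


def make_safe_singularize(
    base: Callable[[str], str], exceptions: Dict[str, str]
) -> Callable[[str], str]:
    """Wrap `base` so that words in `exceptions` bypass it entirely."""

    def safe(word: str) -> str:
        # Look the word up case-insensitively before falling back to `base`.
        hit = exceptions.get(word.lower())
        return hit if hit is not None else base(word)

    return safe


safe_singularize = make_safe_singularize(naive_singularize, EXCEPTIONS)
```

With Pattern installed, the real `singularize` from `pattern.en` could be passed in place of `naive_singularize`. I believe that function also accepts a `custom` dictionary argument for the same purpose, but check the signature in your installed version before relying on it.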

tim5go avatar Dec 24 '18 07:12 tim5go

Interesting. Those are indeed problematic. I've generally been happy with the vast majority of "singularized" words, but I'll add a few that were problems for me:

  1. cross->cros
  2. goddess->goddes
  3. sadness->sadnes
  4. sarcophagus->sarcophagu
  5. putti->puttus (should be putto)
  6. world war ii->world war ius

Also adding the errors as described above:

  1. business->business
  2. virginia->virginium
  3. tour->tmy
  4. loss->los
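The reports so far could be turned into a small regression check along these lines. This is only a sketch: `EXPECTED`, `failures`, and `strip_s` are hypothetical names, `strip_s` is a deliberately naive stand-in and not Pattern's algorithm, and the expected forms are my reading of what each word should singularize to.

```python
from typing import Callable, Dict, List, Tuple

# Inputs reported in this thread, mapped to the singular form one would expect.
EXPECTED: Dict[str, str] = {
    "cross": "cross",
    "goddess": "goddess",
    "sadness": "sadness",
    "sarcophagus": "sarcophagus",
    "putti": "putto",
    "world war ii": "world war ii",
    "business": "business",
    "virginia": "virginia",
    "tour": "tour",
    "loss": "loss",
}


def failures(
    singularize: Callable[[str], str], expected: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (word, got, wanted) for every word the function gets wrong."""
    wrong = []
    for word, wanted in expected.items():
        got = singularize(word)
        if got != wanted:
            wrong.append((word, got, wanted))
    return wrong


def strip_s(word: str) -> str:
    """Naive stand-in (an assumption): drop one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


if __name__ == "__main__":
    for word, got, wanted in failures(strip_s, EXPECTED):
        print(f"{word}: got {got!r}, wanted {wanted!r}")
```

Passing the library's own function instead of `strip_s` would list whichever of these cases still fail in a given version.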

fuzheado avatar Jan 04 '19 16:01 fuzheado

Hi, I'm a new contributor to this repo and I'd like to try my hand at solving this.

The documentation states that pattern.text.en.inflect's singularize function, which seems to be the problem here, has been adapted from this repo: https://github.com/bermi/Python-Inflector. Was it taken directly, or were changes made? I'm wondering whether I should go over to the inflector repo to see how the singularize algorithm works, or just look at the one here.

AdLucem avatar Feb 15 '19 18:02 AdLucem

I also get viruses->viruse. Is there an updated model file or something? Is it solved in 3.6?

ndvbd avatar Apr 22 '19 14:04 ndvbd