GreynirEngine

Common names interpreted as verbs

Open jokull opened this issue 5 years ago • 5 comments

I ran the 100 most common first names in Iceland through greynir.parse. None of the female names are interpreted as verbs, but a few of the male ones are. The list below shows each name, the lemma it was given, and the category tag so (verb). See this gist for the code.

https://gist.github.com/jokull/2c1048bbc845feb46c717ac7c77e0cc5

  • Einar → eina so
  • Árni → árna so
  • Helgi → helga so
  • Ragnar → ragna so
  • Óskar → óska so
  • Birgir → birgja so
  • Brynjar → brynja so
  • Rúnar → rúna so
  • Ómar → óma so
  • Reynir → reyna so
  • Garðar → garða so
  • Steinar → steina so
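A check of this kind can be sketched as follows. This is not the code from the gist; it is a minimal illustration that assumes each parse yields terminals with text, lemma and category fields, as in the Terminal output Greynir prints, and that so marks a verb. The Terminal tuple, the verb_readings helper and the canned fake_parse function are hypothetical stand-ins so the snippet runs without Greynir installed.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class Terminal(NamedTuple):
    """Hypothetical stand-in mirroring a few fields of a Greynir terminal."""
    text: str
    lemma: str
    category: str


def verb_readings(
    names: Iterable[str],
    parse: Callable[[str], List[Terminal]],
) -> Dict[str, str]:
    """Map each name whose first terminal is a verb ('so') to that verb's lemma."""
    flagged: Dict[str, str] = {}
    for name in names:
        terminals = parse(name)
        if terminals and terminals[0].category == "so":
            flagged[name] = terminals[0].lemma
    return flagged


def fake_parse(name: str) -> List[Terminal]:
    """Canned results for illustration only; a real run would wrap Greynir."""
    canned = {
        "Einar": Terminal("Einar", "eina", "so"),    # reading reported in this issue
        "Guðrún": Terminal("Guðrún", "Guðrún", "no"),  # illustrative value only
    }
    return [canned[name]] if name in canned else []


result = verb_readings(["Einar", "Guðrún"], fake_parse)
print(result)
```

In a real run, the parse argument would be a small wrapper that calls Greynir on the name and returns the terminals of the resulting sentence.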

jokull • May 26 '20 09:05

If I call greynir.parse_single(f'Forstjórinn heitir {name}'), that gives Greynir enough context to work out that the name refers to a person. The one exception is the name Örn, where "Örn.", including the period, becomes a single terminal:

>>> sentence = greynir.parse_single('Forstjórinn heitir Örn.')
>>> sentence.terminals
[Terminal(text='Forstjórinn', lemma='forstjóri', category='no', variants=['et', 'gr', 'kk', 'nf'], index=0),
 Terminal(text='heitir', lemma='heita', category='so', variants=['1', 'nf', 'et', 'fh', 'gm', 'nt', 'p3'], index=1),
 Terminal(text='Örn.', lemma='Örn.', category='no', variants=['et', 'hk', 'nf'], index=2)]

jokull • May 26 '20 09:05

This is not a surprise, really, as Greynir has a preference for recognizing sentences (with verbs) rather than noun phrases, if both are possible. But for this use case, I would recommend using parse_noun_phrase() instead of parse_single() - that would always give preference to names instead of verbs. Would this solve your problem?

vthorsteinsson • May 26 '20 15:05

Having said that, the Örn. case is clearly a bug ;-)

vthorsteinsson • May 26 '20 15:05

Could this be related to "örn." being an abbreviation recognised by the tokenizer?

sveinbjornt • May 26 '20 18:05

I'm parsing search queries. I've resorted to just searching both the lemma and the original query.
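That fallback could look roughly like the sketch below. It is an illustration, not the code actually in use: the Terminal tuple and the search_terms helper are hypothetical, and the only assumption about the parser is that each terminal exposes a lemma, as in the Terminal output shown earlier in this thread.

```python
from typing import Iterable, List, NamedTuple


class Terminal(NamedTuple):
    """Hypothetical stand-in mirroring a few fields of a Greynir terminal."""
    text: str
    lemma: str
    category: str


def search_terms(query: str, terminals: Iterable[Terminal]) -> List[str]:
    """Return the original query followed by each distinct lemma.

    Comparison is case-insensitive, so a lemma equal to the query
    (apart from case) is not added a second time.
    """
    terms = [query]
    seen = {query.lower()}
    for terminal in terminals:
        key = terminal.lemma.lower()
        if key not in seen:
            seen.add(key)
            terms.append(terminal.lemma)
    return terms


# 'Einar' parsed as the verb 'eina' (the reading reported in this issue):
terms = search_terms("Einar", [Terminal("Einar", "eina", "so")])
print(terms)
```

Searching every returned term means a misparsed name still matches through the original query, while correctly lemmatized words also match their base form.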

jokull • May 26 '20 23:05