
Comma sensitivity in sentences

Open · jokull opened this issue 4 years ago · 7 comments

I'm not familiar with the parsing pipeline, but I thought I would share an instance where the parser tripped up in a way that surprised me:

from reynir import Greynir
greynir = Greynir()
greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn á lóð við Austurhlíð.').lemmas

This works and gives me the correct lemmas for leiguíbúð, námsmaður, etc.

greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn, á lóð við Austurhlíð.').lemmas

With a comma before "á lóð", the lemma returned for "á" is "eiga" (the verb "to own") instead of the preposition "á".
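To narrow down which tokens change between the two parses, one option is to align the two lemma lists and report only the places where they disagree. This is a minimal sketch, not part of Greynir: `lemma_differences` is a hypothetical helper built on the standard library's `difflib`, and it assumes punctuation tokens appear in `.lemmas` as the punctuation characters themselves, so they are dropped before aligning (otherwise the inserted comma would be reported as a change on its own).

```python
import difflib


def _is_punctuation(lemma):
    # Assumption: a lemma with no alphanumeric characters (",", ".") is punctuation.
    return not any(ch.isalnum() for ch in lemma)


def lemma_differences(lemmas_a, lemmas_b):
    """Return (lemmas_from_a, lemmas_from_b) pairs where two lemma lists disagree.

    Hypothetical helper: punctuation is removed first, then difflib aligns the
    remaining lemmas so an extra token does not shift every later position.
    """
    if lemmas_a is None or lemmas_b is None:
        # Guard in case a sentence failed to parse and has no lemma list.
        raise ValueError("both sentences must have parsed successfully")
    words_a = [lemma for lemma in lemmas_a if not _is_punctuation(lemma)]
    words_b = [lemma for lemma in lemmas_b if not _is_punctuation(lemma)]
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # "replace", "delete" and "insert" spans all mark disagreements.
            differences.append((words_a[i1:i2], words_b[j1:j2]))
    return differences
```

With Greynir, the inputs would be `greynir.parse_single(s1).lemmas` and `greynir.parse_single(s2).lemmas` for the two sentences above; going by the behavior described here, the result should surface the "á" versus "eiga" pair.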

Sorry if GitHub Issues is the wrong place for this. I'm mainly curious about the roadmap, design, and limitations. I assume Greynir uses commas to split sentences into fragments, to reduce the number of possible parse paths.

BTW, this is a real-world example.

Loving Greynir and following your progress! ✨

EDIT: A screenshot might help:

[Screenshot 2020-05-13 at 22.50.49]

jokull · May 13 '20, 23:05