GreynirEngine
Comma sensitivity in sentences
I’m not familiar with the parsing pipeline, but I thought I would share a case where the parser tripped in a (to me) surprising way:
greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn á lóð við Austurhlíð.').lemmas
This is fine and gives me the right lemmas for leiguíbúð, námsmaður etc.
greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn, á lóð við Austurhlíð.').lemmas
With the comma before "á lóð", I get the lemma "eiga" (the verb) for "á" instead of the preposition "á".
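In case it helps with reproducing this, here is a rough sketch of how the two lemma lists could be compared side by side. The helper names `lemma_diff` and `compare_parses` are made up for illustration; the only Greynir calls assumed are `parse_single` and the `.lemmas` attribute used above, and I'm assuming `.lemmas` can be `None` when a sentence fails to parse.

```python
from difflib import SequenceMatcher


def lemma_diff(lemmas_a, lemmas_b):
    """Return (tag, a_part, b_part) for each region where the two lemma lists differ.

    SequenceMatcher aligns the lists, so the extra comma token in one
    sentence does not shift every later lemma out of step.
    """
    matcher = SequenceMatcher(a=lemmas_a, b=lemmas_b, autojunk=False)
    return [
        (tag, lemmas_a[i1:i2], lemmas_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def _lemmas(parser, text):
    """Parse one sentence and return its lemmas, or [] if nothing was parsed."""
    sent = parser.parse_single(text)
    return (sent.lemmas if sent is not None else None) or []


def compare_parses(parser, text_a, text_b):
    """Hypothetical helper: `parser` is any object exposing parse_single(text)."""
    return lemma_diff(_lemmas(parser, text_a), _lemmas(parser, text_b))


# Usage sketch (needs the `reynir` package installed; not executed here):
#   from reynir import Greynir
#   greynir = Greynir()
#   print(compare_parses(greynir, sentence_without_comma, sentence_with_comma))
```

Passing the two sentences above should then list only the places where the lemmas diverge, rather than two long lists to compare by eye.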
Sorry if GitHub issues is the wrong place for this. I’m mainly curious about the roadmap, design and limitations. I assume Greynir uses commas to split sentences into fragments in order to keep the number of parse paths down.
BTW, this is a real-world example.
Loving Greynir and following your progress! ✨
EDIT: Screenshot might help