GreynirEngine
Comma sensitivity in sentences
I’m not familiar with the parsing pipeline, but I thought I would share a case where the parser tripped up in a (to me) surprising way:
greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn á lóð við Austurhlíð.').lemmas
This is fine and gives me the right lemmas for leiguíbúð, námsmaður etc.
greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn, á lóð við Austurhlíð.').lemmas
With the comma before "á lóð", "á" gets the verb lemma "eiga" instead of the preposition "á".
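To make the difference easy to spot, here is a minimal sketch that compares the two lemma lists position by position. The helper names (`content_lemmas`, `lemma_diff`, `greynir_lemmas`) are made up for this example. It assumes the `reynir` package is installed, and that the two sentences tokenize identically apart from punctuation, which may not hold for every input.

```python
def content_lemmas(lemmas):
    """Drop punctuation-only entries so the two lists line up."""
    return [lemma for lemma in lemmas if any(ch.isalnum() for ch in lemma)]


def lemma_diff(lemmas_a, lemmas_b):
    """Return (position, lemma_a, lemma_b) wherever the two lists disagree."""
    a, b = content_lemmas(lemmas_a), content_lemmas(lemmas_b)
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def greynir_lemmas(text):
    """Parse a single sentence with GreynirEngine and return its lemmas.

    Returns an empty list if the sentence could not be parsed.
    """
    # Imported lazily so the helpers above work without the package installed.
    from reynir import Greynir

    sent = Greynir().parse_single(text)
    if sent is None or sent.lemmas is None:
        return []
    return sent.lemmas
```

Calling `lemma_diff(greynir_lemmas(without_comma), greynir_lemmas(with_comma))` on the two sentences above should then list only the positions where the comma changed the analysis.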
Sorry if GitHub issues is the wrong place. I’m mainly curious about the roadmap, design and limitations. I assume Greynir uses commas to fragment sentences to keep down the parse pathways.
BTW this is a real world example.
Loving Greynir and following your progress! ✨
EDIT: Screenshot might help
Thanks! This is indeed the right place to submit issues such as this one, which describes a bona fide bug. It looks like we need to tweak the grammar for this syntactic structure; it is incorrectly preferring to see ", á lóð" as a continuation of a previous verb phrase, instead of a new prepositional phrase. We will look into this.
By the way, a good way to visualize the sentence trees is to enter the text into Greynir: https://greynir.is/treegrid?txt=S%C3%B3tt%20er%20um%20leyfi%20til%20a%C3%B0%20byggja%2050%20leigu%C3%ADb%C3%BA%C3%B0ir%20fyrir%20n%C3%A1msmenn%20%C3%A1%20l%C3%B3%C3%B0%20vi%C3%B0%20Austurhl%C3%AD%C3%B0.
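If you want to generate such links for several test sentences, a small sketch like the following should do. The function name `treegrid_url` is made up for this example, and it assumes the `txt` query parameter keeps working the way it does in the link above.

```python
from urllib.parse import quote

# Base address taken from the link above.
TREEGRID_BASE = "https://greynir.is/treegrid?txt="


def treegrid_url(text):
    """Build a greynir.is tree-view link for the given sentence."""
    # quote() percent-encodes the text as UTF-8, with spaces as %20.
    return TREEGRID_BASE + quote(text)


print(treegrid_url(
    "Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn, á lóð við Austurhlíð."
))
```

The printed link is for the variant with the comma, so the two trees can be compared side by side.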
Great!
Here is another potential bug I spotted.
Málinu er vísað til umsagnar skipulagsfulltrúa vegna svala.
"svala" here seems to be resolved to the noun "svali", not to the word for balcony.
Balcony is perhaps the more common reading, so maybe a grammar file tweak can help here. I'm not even sure what "svali" means.
:)

By the way, in the earlier bug ("Sótt er um leyfi...") Greynir is recognizing "Sótt" as a noun, not a verb; it then constructs a double verb phrase with the verbs "er" and "á" hanging off the subject "Sótt".
...and "svala" can also be a bird, i.e. a feminine noun, in addition to the two masculine ones ("svalur" and "svali").
A fix for "svala" is ready and will be in the next commit to the config file Prefs.conf.