Comma sensitivity in sentences

Open jokull opened this issue 5 years ago • 7 comments

I’m not familiar with the parsing pipeline, but I thought I would share an instance where the parser tripped up in a (to me) surprising way:

greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn á lóð við Austurhlíð.').lemmas

This is fine and gives me the right lemmas for leiguíbúð, námsmaður etc.

greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn, á lóð við Austurhlíð.').lemmas

With the comma before "á lóð", I get the verb lemma "eiga" for "á" instead of the preposition "á".
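One way to pin down exactly where the two results diverge is to compare the lemma lists position by position while skipping punctuation. The sketch below is a minimal, library-independent helper; `lemma_diff` is a hypothetical name, and the two sample lists are hand-written, shortened illustrations of the behaviour described above, not actual Greynir output.

```python
import string
from itertools import zip_longest


def lemma_diff(before, after, ignore=frozenset(string.punctuation)):
    """Return (index, lemma_before, lemma_after) for every position that differs.

    Lemmas that are single punctuation characters are dropped first, so that
    an added comma does not shift every later position.
    """
    a = [lemma for lemma in before if lemma not in ignore]
    b = [lemma for lemma in after if lemma not in ignore]
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip_longest(a, b))
        if x != y
    ]


# Hand-written sample data (an assumption for illustration only). In practice
# these would be the two `.lemmas` lists from the parse_single() calls above.
without_comma = ["leiguíbúð", "fyrir", "námsmaður", "á", "lóð", "."]
with_comma = ["leiguíbúð", "fyrir", "námsmaður", ",", "eiga", "lóð", "."]

differences = lemma_diff(without_comma, with_comma)
print(differences)
```

Filtering punctuation before aligning keeps the comparison focused on the words themselves, which is what matters here since the comma is the only intended difference between the inputs.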

Sorry if GitHub issues is the wrong place. I’m mainly curious about the roadmap, design and limitations. I assume Greynir uses commas to split sentences into fragments in order to keep down the number of parse pathways.

BTW this is a real world example.

Loving Greynir and following your progress! ✨

EDIT: Screenshot might help

[Image: Screenshot 2020-05-13 at 22 50 49]

jokull • May 13 '20 23:05

Thanks! This is indeed the right place to submit issues such as this one, which describes a bona fide bug. It looks like we need to tweak the grammar for this syntactic structure; it is incorrectly preferring to see ", á lóð" as a continuation of a previous verb phrase, instead of a new prepositional phrase. We will look into this.

vthorsteinsson • May 14 '20 12:05

By the way, a good way to visualize the sentence trees is to enter the text into Greynir: https://greynir.is/treegrid?txt=S%C3%B3tt%20er%20um%20leyfi%20til%20a%C3%B0%20byggja%2050%20leigu%C3%ADb%C3%BA%C3%B0ir%20fyrir%20n%C3%A1msmenn%20%C3%A1%20l%C3%B3%C3%B0%20vi%C3%B0%20Austurhl%C3%AD%C3%B0.
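For other sentences, a link of the same shape can be built by percent-encoding the text and appending it to the `txt` query parameter. This is a small sketch based only on the URL above; `treegrid_url` is a hypothetical helper name, and it assumes the page accepts any sentence encoded the same way.

```python
from urllib.parse import quote

# Base address copied from the link above.
TREEGRID_BASE = "https://greynir.is/treegrid?txt="


def treegrid_url(text: str) -> str:
    """Build a tree-grid link by percent-encoding the UTF-8 bytes of `text`."""
    # safe="" encodes every reserved character; spaces become %20.
    return TREEGRID_BASE + quote(text, safe="")


sentence = (
    "Sótt er um leyfi til að byggja 50 leiguíbúðir "
    "fyrir námsmenn á lóð við Austurhlíð."
)
print(treegrid_url(sentence))
```

The variant with the comma can be viewed the same way; the comma is encoded as `%2C`.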

vthorsteinsson • May 14 '20 12:05

Great!

Here is another potential bug I spotted.

Málinu er vísað til umsagnar skipulagsfulltrúa vegna svala.

"svala" here seems to be parsed as the noun "svali", not as the word for a balcony.

The balcony sense is perhaps more common, so maybe a grammar file tweak can help with this. I'm not even sure what "svali" means.

jokull • May 14 '20 14:05

:)

svali

sveinbjornt • May 14 '20 15:05

By the way, in the earlier bug ("Sótt er um leyfi...") Greynir is recognizing "Sótt" as the noun, not the verb; it then constructs a double verb phrase with the verbs "er" and "á" hanging off the subject "Sótt".

vthorsteinsson • May 14 '20 15:05

...and "svala" can also be a bird, i.e. a feminine noun, plus two masculine ones ("svalur" and "svali").

vthorsteinsson • May 14 '20 15:05

A fix for "svala" is ready and will be in the next commit to the config file Prefs.conf.

vthorsteinsson • May 14 '20 15:05