UD_Portuguese-Bosque

MWEs that contain contractions

Open vcvpaiva opened this issue 7 years ago • 9 comments

Bug reported by Martin Popel in email of April 22nd, 2017.

In UD_Portuguese we have more than one occurrence of unsplit contractions. Martin said:

"ao" can be preposition "ao" or preposition+determiner "a o". Am I right?

vcvp: No. As far as I know, there is no preposition "ao"; the preposition is "a".

OK, so what I saw in the released UD_Portuguese was probably an annotation error:

# sent_id = CF907-1
# text = Ao contrário, nesta situação econômica de extrema gravidade, todos gostariam de ajudar o país.
─┮
 │   ╭─╼ Ao ao ADP PRP|@ADVL> _ case MWE=Ao_contrário
 │ ╭─┾ contrário contrário NOUN N|M|S|@P< Gender=Masc|Number=Sing obl SpaceAfter=No

This is the kind of mistake we would like to find.
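One way to hunt for such cases could be a small script over the CoNLL-U files. The sketch below is only an assumption about how the check might look, not what we actually run: the function name `find_unsplit_contractions`, the `CONTRACTIONS` list (deliberately short and incomplete) and the toy sample are all made up for illustration. It flags any token with a plain integer ID whose form is a known contraction; a correctly split contraction appears instead as a range line such as `1-2` followed by its parts.

```python
# Hypothetical, incomplete list of contracted forms; extend as needed.
CONTRACTIONS = {"ao", "aos", "à", "às", "do", "da", "dos", "das"}


def find_unsplit_contractions(conllu_text, contractions=CONTRACTIONS):
    """Return (sent_id, token_id, form, misc) for contractions left unsplit."""
    hits = []
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, form, misc = cols[0], cols[1], cols[9]
        # Range IDs ("1-2") are multiword tokens, i.e. the contraction was
        # split; decimal IDs ("1.1") are empty nodes. Skip both.
        if not tok_id.isdigit():
            continue
        if form.lower() in contractions:
            hits.append((sent_id, tok_id, form, misc))
    return hits


# Toy input (invented, not taken from the treebank): the first sentence
# keeps "Ao" as one token, the second splits it into "A" + "o".
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "1\tAo\tao\tADP\t_\t_\t2\tcase\t_\tMWE=Ao_contrário",
    "2\tcontrário\tcontrário\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
    "# sent_id = toy-2",
    "1-2\tAo\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tA\ta\tADP\t_\t_\t3\tcase\t_\t_",
    "2\to\to\tDET\t_\t_\t3\tdet\t_\t_",
    "3\tcontrário\tcontrário\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
])

hits = find_unsplit_contractions(SAMPLE)
for hit in hits:
    print(hit)
```

Forms that are ambiguous with other words (for example "nos" as a pronoun) are left out of the list on purpose, so the check would under-report rather than flag correct tokens.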

Martin also said:

A better example of ambiguity is "do", which can be either ADP+DET or ADP+PRON (always "de o") according to UD_Portuguese, e.g. in sent_id = CF904-2:

"Aumento no número de passes e lançamentos aproxima futebol brasileiro do praticado no Mundial dos EUA"

In "do praticado", do=ADP+PRON.

vcvp: This seems correct...

vcvpaiva, Apr 26 '17 14:04