[pt] Change lemma for ordinal numbers

Open susanaboatto opened this issue 2 years ago • 6 comments

Hi there!

One of the problems with GENERAL_GENDER_AGREEMENT_ERRORS is that ordinal numbers don't all have the same lemma for their tags. For 1o or 1º to be changed into 1a or 1ª, they need to have the same lemma, but I don't know how to do this for all the numbers. Ideally, [number]a or [number]ª would have [number]º as a lemma. This was my attempt, but of course it didn't work, because the lemma needs to match the number (so I reverted it). Can anybody help? @marcoagpinto @danielnaber

If this isn't possible, I will add another rule in GENERAL_GENDER_AGREEMENT_ERRORS to fix it. The GENERAL_NUMBERS_AGREEMENT_ERRORS rule might need one too, @marcoagpinto.

susanaboatto avatar Aug 26 '22 13:08 susanaboatto

Do you even need POS tags for that? (\d+)º would match any number followed by º, and it can then be replaced by the suggestion \1ª, I think. See, for example, this German rule (it doesn't use numbers, but that shouldn't matter):

            <rule>
                <regexp>\b(ungefähr) (genau(so)?)\b</regexp>
                <message>&unstimmig; Bitte wählen Sie zwischen <suggestion>\1</suggestion> und <suggestion>\2</suggestion>.</message>
                <example correction="ungefähr|genauso">Die sehen <marker>ungefähr genauso</marker> aus.</example>
                <example correction="ungefähr|genau">Diese Wörter haben <marker>ungefähr genau</marker> dieselbe Bedeutung.</example>
                <example>Diese Wörter haben dieselbe Bedeutung.</example>
            </rule>
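
Outside LanguageTool, the substitution suggested above can be tried in isolation. The following is only a minimal sketch of the `(\d+)º` to `\1ª` replacement: the function name `to_feminine` is made up for illustration, and the pattern ignores context, so it rewrites every masculine ordinal it sees rather than only those followed by a feminine noun.

```python
import re

# Pattern from the comment above: one or more digits followed by "º".
MASC_ORDINAL = re.compile(r"(\d+)º")


def to_feminine(text: str) -> str:
    """Replace every digit+"º" ordinal with the same digits plus "ª".

    Hypothetical helper: a real rule would also check that the
    following word is feminine before suggesting the change.
    """
    return MASC_ORDINAL.sub(r"\1ª", text)


if __name__ == "__main__":
    print(to_feminine("a 1º vez"))
```

Because the number is captured as a group, the same pattern covers any ordinal without needing a lemma per number, which is the point of the regexp approach.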

danielnaber avatar Aug 26 '22 14:08 danielnaber

You are right, that was the solution I had in mind. I was just wondering whether there was a shortcut so that the same mistake would also be fixed in other rules like NUMBERS_AGREEMENT.

@marcoagpinto since you're taking care of NUMBERS_AGREEMENT, would you mind checking if there are problems with the suggestions for ordinals there too? 😅 I think this is practically the same problem:

[image]

susanaboatto avatar Aug 26 '22 14:08 susanaboatto

@susanaboatto @danielnaber

I agree with Susana: if we could get extra power and simplicity by using the disambiguator, we should use it.

Unfortunately, I don't have a clue how to code it; I usually ask @jaumeortola for help.

A year or so ago I tried to implement something in it and failed, which was a bad experience for me.

marcoagpinto avatar Aug 26 '22 14:08 marcoagpinto

It is just like the “SPS00” which now has extra information.

Before Jaume improved it, it was extremely hard to code rules, which is one of the reasons I have to, or want to, revise all my past code.

We now have extra power and simplicity.

marcoagpinto avatar Aug 26 '22 14:08 marcoagpinto

@danielnaber @jaumeortola @susanaboatto @ricardojosehlima

O algoritmo é muito bom, precisão 100%.

The sentence above triggers a gender error.

One more reason for ordinals in the disambiguator.

When does Jaume return from holidays?

marcoagpinto avatar Aug 28 '22 10:08 marcoagpinto

> Ideally, [number]a or [number]ª would have [number]º as a lemma.

@susanaboatto We can add the POS tags to added.txt for a limited range of common numbers (1-20 or 1-100). If you really need this for all numbers, we would have to make special changes in the tagger and the synthesizer (in Java). I haven't done it in other languages because high ordinal numbers are unusual in practice. The ordinals are all tagged in disambiguation.xml, but there are no suggestions in case of agreement errors.
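
The entries for a limited range could be generated rather than typed by hand. This is only a sketch under assumptions: it assumes added.txt takes tab-separated `form`, `lemma`, `tag` lines, and the tags `AO0MS0` and `AO0FS0` are placeholders that should be checked against how the Portuguese tagset actually labels ordinals. The function name `ordinal_entries` is made up for illustration.

```python
from typing import List


def ordinal_entries(start: int = 1, end: int = 20,
                    masc_tag: str = "AO0MS0",
                    fem_tag: str = "AO0FS0") -> List[str]:
    """Build tab-separated form/lemma/tag lines for ordinals start..end.

    Assumptions (not confirmed by the thread): the three-column layout
    and both tag names. The lemma is always the masculine form
    "[number]º", as proposed in the issue.
    """
    lines = []
    for n in range(start, end + 1):
        lemma = f"{n}º"
        lines.append(f"{lemma}\t{lemma}\t{masc_tag}")  # masculine form
        lines.append(f"{n}ª\t{lemma}\t{fem_tag}")      # feminine form, same lemma
    return lines


if __name__ == "__main__":
    print("\n".join(ordinal_entries(1, 3)))
```

Giving both forms the same lemma is what would let the synthesizer move from one gender to the other. Variants written with plain letters, such as 1o and 1a, could be added the same way if they are wanted.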

jaumeortola avatar Sep 21 '22 21:09 jaumeortola

@jaumeortola Thanks for your input, that makes sense. I will add ordinals to added.txt within a limited range, then.

susanaboatto avatar Sep 22 '22 07:09 susanaboatto