[pt] rule messages in different varieties (BR/PT)

Open jaumeortola opened this issue 2 years ago • 15 comments

@marcoagpinto suggests doing this (not yet implemented). (See https://github.com/languagetool-org/languagetool/issues/6301)

<message>Se for um texto académico/científico, pondere empregar o termo 'imprecisão'.</message>
<message locale="pt-br">Se for um texto acadêmico/científico, pondere empregar o termo 'imprecisão'.</message>

Other options:

  • Using some LanguageTool rules to adapt the texts. This could be enough if the changes are just académico<>acadêmico, and similar ones.
  • Removing words that are not common. (It is not a general solution.) @susanaboatto

jaumeortola avatar Jul 26 '22 08:07 jaumeortola
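
A minimal sketch of how the per-variety `<message>` lookup proposed above might behave. The `locale` attribute is the proposal from this issue, not an implemented LanguageTool feature, and the `<rule>` wrapper and `pick_message` helper are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule fragment: the locale attribute is the proposal from this
# issue, not an existing LanguageTool feature.
RULE_XML = """
<rule>
  <message>Se for um texto académico/científico, pondere empregar o termo 'imprecisão'.</message>
  <message locale="pt-br">Se for um texto acadêmico/científico, pondere empregar o termo 'imprecisão'.</message>
</rule>
""".strip()


def pick_message(rule_xml, locale):
    """Return the message tagged for `locale`, else the untagged default."""
    rule = ET.fromstring(rule_xml)
    default = None
    for msg in rule.findall("message"):
        tag = msg.get("locale")
        if tag is None:
            # The first untagged message acts as the fallback.
            if default is None:
                default = msg.text
        elif tag.lower() == locale.lower():
            return msg.text
    if default is None:
        raise ValueError("rule has no default <message>")
    return default


print(pick_message(RULE_XML, "pt-BR"))
print(pick_message(RULE_XML, "pt-PT"))
```

Falling back to the untagged message means only the messages that actually differ between varieties would need a second copy.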

The solution suggested by Marco is good only for rule messages, but not if we want to adjust other messages (e.g. rule names).

It should not be difficult to create our own automatic BR/PT translator with a few rules, at least for the language used in rule messages. What do you think, @susanaboatto? Is it feasible? Which direction (BR>PT or PT>BR) would be easier? The current messages are in pt-PT, and we could keep it that way.

I would only need a list of LanguageTool rules to apply.

jaumeortola avatar Jul 26 '22 16:07 jaumeortola

@jaumeortola

Right now, I can only think of the word “académico” → “acadêmico”.

<message>Em certos contextos, esta perífrase pode ser simplificada.</message>
<message>Enriqueça a linguagem para causar mais impacto ao leitor.</message>
<message>Esta perífrase pode ser simplificada.</message>
<message>Esta perífrase poderá ser simplificada.</message>
<message>Expressão vulgar, pondere empregar:</message>
<message>Possível confusão de termos.</message>
<message>Se for um texto académico, pondere melhorar a linguagem.</message>
<message>Se for um texto académico/científico, pondere melhorar a linguagem.</message>
<message>Se for um texto académico/científico, pondere empregar o termo 'imprecisão'.</message>
<message>Se for um texto académico/científico, pondere empregar o termo 'exato'.</message>
<message>Se for uma tese de doutoramento, verifique se o 'tom' de redação é o apropriado.</message>
<message>Se estiver a referir-se a fármacos ou afins, empregue o termo 'embalagem'.</message>
<message>Se estiver a referir-se a fármacos ou afins, empregue o termo 'tomar'.</message>

But it would be great if the automatic conversion could be driven by the replace.txt file of the language variety in use. That would be extremely good.

marcoagpinto avatar Jul 26 '22 16:07 marcoagpinto
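
A rough sketch of the replace.txt idea, under stated assumptions: the file is taken to hold one `source=target` pair per line with `#` comments, which may not match the real file's format, and the only entry used is the académico/acadêmico pair mentioned in this thread. `load_pairs` and `convert_message` are hypothetical helpers, not LanguageTool code.

```python
import re

# Assumed format: one "source=target" pair per line, "#" starts a comment.
SAMPLE_REPLACE_TXT = """
# pt-PT form = pt-BR form (illustrative entry taken from this thread)
académico=acadêmico
"""

WORD = re.compile(r"\w+")  # \w is Unicode-aware for str patterns


def load_pairs(text):
    """Parse "source=target" lines into a dict, skipping comments and blanks."""
    pairs = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        source, target = line.split("=", 1)
        pairs[source.strip()] = target.strip()
    return pairs


def convert_message(message, pairs):
    """Replace every whole word found in `pairs`, keeping an initial capital."""
    def swap(match):
        word = match.group(0)
        target = pairs.get(word.lower())
        if target is None:
            return word
        if word[0].isupper():
            return target[0].upper() + target[1:]
        return target

    return WORD.sub(swap, message)


pairs = load_pairs(SAMPLE_REPLACE_TXT)
print(convert_message(
    "Se for um texto académico/científico, pondere melhorar a linguagem.", pairs))
```

Matching whole words rather than substrings keeps the conversion from touching unrelated words that merely contain a listed form.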

@jaumeortola I think this would be useful! I would say PT>BR should be easier to do, since we already have most messages in pt-PT. The verb conjugation differences are also mostly just a matter of one variant using an accent mark while the other doesn't. There are other differences, but those are easily solved by moving rules to the PT grammar (and vice versa) and adapting them.

susanaboatto avatar Jul 26 '22 16:07 susanaboatto

Yes, basically it is just the accents in words; that is why I think replace.txt could be used.

But it will be a ton of work for the person who is going to code it… will it be you, Jaume?

❤️

marcoagpinto avatar Jul 26 '22 16:07 marcoagpinto

Using replace.txt will also let us find words that we haven't added to that file yet, which is a plus.

marcoagpinto avatar Jul 26 '22 16:07 marcoagpinto

@marcoagpinto Verbs in the 1st person plural of the preterite (pretérito perfeito) seem to take an accent mark in PT-PT, right? I was looking at the TESE_PHD_PROCURAR_PROVAR_PROVARA rule and noticed "provámos" as a suggestion. We do not use this spelling in BR, for example.

Also, I have been meaning to ask: how strictly is the 2009 orthographic agreement followed in Portugal? In Brazil, it is strongly preferred: no "c" in words like "objectivo", weekdays and months written in lower case, and so on.

susanaboatto avatar Jul 26 '22 16:07 susanaboatto

@susanaboatto

We use the agreement here: no “c” in words like “objectivo”, but there are exceptions, such as “facto”, “contacto”, etc.

And weekdays and months are written in lowercase, and so are language names.

However, some people still use the old spelling, like my mother, so LanguageTool is not installed on her laptop (I had installed it, but she was always complaining, so I removed it).

❤️ ❤️ ❤️ ❤️ ❤️ ❤️ ❤️

marcoagpinto avatar Jul 26 '22 16:07 marcoagpinto

@susanaboatto

Ahhh… sorry… “provamos” and “provámos” have different meanings here: the first is present tense and the second is past (the accent makes all the difference). However, we see ordinary people using the first where they should use the second, which is wrong in Portugal.

marcoagpinto avatar Jul 26 '22 17:07 marcoagpinto
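
The “provamos”/“provámos” point hints at why PT>BR may be the easier direction: going by the comments above, pt-PT keeps two spellings where pt-BR uses one, so the mapping can be applied word by word in one direction but cannot simply be inverted. A small sketch, with the table limited to this one pair and `invert` a hypothetical helper:

```python
from collections import defaultdict

# pt-PT form -> pt-BR form, per the comments above.
PT_TO_BR = {
    "provámos": "provamos",  # past tense in pt-PT
    "provamos": "provamos",  # present tense in pt-PT
}


def invert(mapping):
    """Group source forms by target to expose ambiguous reverse lookups."""
    reverse = defaultdict(set)
    for source, target in mapping.items():
        reverse[target].add(source)
    return dict(reverse)


BR_TO_PT = invert(PT_TO_BR)

# Any pt-BR form with more than one pt-PT candidate needs context to convert.
ambiguous = {br: pts for br, pts in BR_TO_PT.items() if len(pts) > 1}
print(ambiguous)
```

A BR>PT converter would need to know the tense to choose between the two candidates, while PT>BR is a plain lookup for this pair.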

@jaumeortola A similar approach could also be helpful for <suggestion>s, e.g. when we want to suggest different words depending on the language variant (en-US vs. en-GB).

tiff avatar Jul 27 '22 08:07 tiff

@jaumeortola, most of the rules we already have for detecting PT/BR differences should work well for messages and suggestions too, such as:

[Screenshots: 2022-07-28 112500, 2022-07-28 112728]

But we still don't have a solution for words that include an extra "c" before the "t" in Portugal. This would be hard to do because it depends on whether the "c" is pronounced in speech. That cannot be predicted from the spelling, so we would have to handle those words manually.

[Screenshot: 2022-07-28 112758]

susanaboatto avatar Jul 28 '22 09:07 susanaboatto

> a similar approach could also be helpful for <suggestion>s, e.g. when we want to suggest different words depending on the language variant (en-US vs. en-GB)

@tiff Usually, we have different rules for each language variety. Do you have any examples of different suggestions that could be useful in English?

jaumeortola avatar Jul 28 '22 13:07 jaumeortola

@jaumeortola

colour and color for example.

marcoagpinto avatar Jul 28 '22 13:07 marcoagpinto

> colour and color for example.

I know those cases. But is there any rule in English where we suggest both (or the incorrect one)?

jaumeortola avatar Jul 28 '22 13:07 jaumeortola

@jaumeortola For example, a style rule that suggests replacing "say sorry" with "apologize". I would need to maintain two very similar rules: one that suggests "apologise" and one that suggests "apologize".

tiff avatar Jul 28 '22 13:07 tiff
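
As an illustration of the case tiff describes, one rule could carry its suggestions per variant instead of being duplicated. The dictionary layout and `suggestion_for` below are hypothetical and do not reflect LanguageTool's actual rule format.

```python
# Hypothetical data layout: one rule, suggestions keyed by language variant.
STYLE_RULE = {
    "pattern": "say sorry",
    "suggestions": {"en-US": "apologize", "en-GB": "apologise"},
    "default": "apologize",  # used when a variant has no entry of its own
}


def suggestion_for(rule, variant):
    """Return the variant-specific suggestion, or the rule's default."""
    return rule["suggestions"].get(variant, rule["default"])


print(suggestion_for(STYLE_RULE, "en-GB"))
print(suggestion_for(STYLE_RULE, "en-US"))
```

With this shape, the pattern is maintained once and only the differing suggestion is listed per variant.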

@susanaboatto For fato/facto, contato/contacto, currently we say nothing in Brazilian Portuguese. All forms are allowed. That means that we should create new rules for those words, shouldn't we? At least, to recommend using fato and contato. Once the rules are created, we could use them in a PT→BR adapter or translator.

In this folder, you have all possible variants I extracted from the tagger dictionary. These lists can be used to create new rules or improve existing rules. But they have to be revised. Some word pairs are not variants, but different words with different uses and meanings.

jaumeortola avatar Jul 28 '22 13:07 jaumeortola