
Repetition settings: distance_tokens

Status: open. jaumeortola opened this issue 2 years ago.

(From the forum: https://forum.languagetool.org/t/en-many-nagging-error-from-rule-rep-passive-voice/8278/4)

REP_PASSIVE_VOICE is triggered with these repetition settings: min_prev_matches="4" distance_tokens="20", i.e. when there are 4 previous matches of the same rule and they are close enough (less than 20 tokens apart). The problem (or bug) is that the <20 tokens condition is only checked between the penultimate and the last repetition. (Most repetition rules use min_prev_matches="1", where there are only two matches and this makes no difference.)
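To make the reported behavior concrete, here is a minimal Python sketch of the check as described above. It is not LanguageTool's actual (Java) implementation; the function name `triggers_old` and the representation of matches as sorted token indices are assumptions for illustration only.

```python
def triggers_old(match_positions, min_prev_matches, distance_tokens):
    """Hypothetical sketch of the pre-fix repetition check described in the issue.

    match_positions: sorted token indices where the rule's pattern matched
    (an assumed representation). Returns the positions where the rule fires.
    """
    fired = []
    for i in range(len(match_positions)):
        # At least `min_prev_matches` earlier matches of the same rule are needed.
        if i < min_prev_matches:
            continue
        # Only the gap between the penultimate and the last match is checked.
        gap = match_positions[i] - match_positions[i - 1]
        if gap < distance_tokens:
            fired.append(match_positions[i])
    return fired


# Five matches spread over ~900 tokens still fire, because only the last gap counts.
print(triggers_old([0, 300, 600, 900, 910], 4, 20))  # [910]
```

Here the rule fires at token 910: the last gap is 10 tokens, even though the five matches span 910 tokens.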

Should we apply the "<20 tokens" condition to all repetitions? Either way, some maximum span should be required.
What would be the best solution for this rule? @AzadehSafakish

jaumeortola, Sep 20 '22

> Should we apply the condition "<20 tokens" to all repetitions?

That would reduce the number of opens, sure. I believe that is what I intended when I set the tag: to make the rule as restrictive as possible.

But logically, I think it would make more sense to implement a feature where any rule with min_prev_matches > 1 can have a set text length (the rule is only triggered if the pattern is matched X times within Y tokens).

AzadehSafakish, Sep 21 '22

> the rule is only triggered if the pattern is matched X times within Y tokens

That is, distance_tokens would refer to the distance between the first and the last repetition. Is that what you mean? I think it makes sense, and it should be easy to implement.

jaumeortola, Sep 21 '22

Yes, that would be ideal.

AzadehSafakish, Sep 22 '22

Suggested fix: https://github.com/languagetool-org/languagetool/pull/7120

distance_tokens will refer to the total distance between the first and the last repetition.

The results will be slightly different. Each language maintainer can adjust their rules accordingly.
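Under this change the limit applies to the whole run of repetitions rather than to the last gap. A minimal Python sketch of that reading follows; it is not taken from the pull request. The function name `triggers_new`, the token-index representation, and the strict `<` comparison (mirroring the "<20 tokens" wording earlier in the thread) are assumptions.

```python
def triggers_new(match_positions, min_prev_matches, distance_tokens):
    """Hypothetical sketch of the post-fix check: the span from the first to the
    last of the (min_prev_matches + 1) most recent matches must be short enough.

    match_positions: sorted token indices of the rule's matches (assumed).
    Returns the positions where the rule fires.
    """
    fired = []
    for i in range(min_prev_matches, len(match_positions)):
        # First match of the current window of min_prev_matches + 1 matches.
        first = match_positions[i - min_prev_matches]
        span = match_positions[i] - first
        if span < distance_tokens:
            fired.append(match_positions[i])
    return fired


# Scattered matches no longer fire; a tight cluster still does.
print(triggers_new([0, 300, 600, 900, 910], 4, 20))  # []
print(triggers_new([0, 5, 10, 15, 19], 4, 20))       # [19]
```

With the scattered positions nothing fires, since the five matches span 910 tokens; the cluster at tokens 0 to 19 fires at token 19.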

jaumeortola, Sep 23 '22

The documentation for the repetition rules: https://github.com/languagetool-org/languagetool-org.github.io/pull/9/files

jaumeortola, Sep 28 '22

@AzadehSafakish I merged the fix.

Current settings for REP_PASSIVE_VOICE:

<rulegroup id="REP_PASSIVE_VOICE" name="Passive voice (repetition experiment)" min_prev_matches="4" distance_tokens="80" tags="picky">
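The thread does not say how the value 80 was chosen. One possible reading, sketched below with Python's standard `xml.etree.ElementTree`: with min_prev_matches="4", five matches (four gaps) must fit within fewer than 80 tokens, which allows an average gap of under 20 tokens, the same per-gap limit as the old setting. This interpretation is an assumption, not a statement from the maintainers; the closing `</rulegroup>` tag is added only so the quoted fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Opening tag quoted in the thread; "</rulegroup>" is appended only so the
# fragment is well-formed XML (the real group contains rules).
snippet = ('<rulegroup id="REP_PASSIVE_VOICE" '
           'name="Passive voice (repetition experiment)" '
           'min_prev_matches="4" distance_tokens="80" tags="picky"></rulegroup>')

group = ET.fromstring(snippet)
min_prev = int(group.get("min_prev_matches"))
distance = int(group.get("distance_tokens"))

matches_needed = min_prev + 1  # the current match plus its predecessors
gaps = min_prev                # gaps between consecutive matches in the run
avg_gap = distance / gaps      # assumed reading: average spacing allowed

print(f"{matches_needed} matches within {distance} tokens "
      f"(average gap < {avg_gap:g} tokens)")
# 5 matches within 80 tokens (average gap < 20 tokens)
```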

Watch for changes in open/disable events, and adjust the settings if necessary.

jaumeortola, Sep 30 '22

Thank you so much, @jaumeortola

AzadehSafakish, Oct 04 '22