[fr] false alarm with domain in CONF_N_V

Open tiff opened this issue 2 years ago • 3 comments

J'utilise google.com

[Screenshot, 2022-06-24 at 16:18:03]

@jaumeortola any idea how we can immunize or better tokenize domains?

tiff avatar Jun 24 '22 14:06 tiff

There is a global disambiguation rule that should add the POS tag _IS_URL, but the rule contains a syntax error, so the tag is never added. Possible solutions:

  1. Add the tag in global disambiguation, and add an exception (<exception postag="_IS_URL"/>) to each rule that needs it.
  2. Fully immunize the domain name. Currently it is only marked "ignore_spelling".
  3. Replace all the existing tags with _IS_URL.

For now, I have implemented (1) for this French sentence. (2) and (3) are more general solutions, but they could have undesired side effects.
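
To illustrate the idea behind option (1), here is a minimal Python sketch. It is not LanguageTool code: the regular expression and the helper names `tag_tokens` and `rule_matches` are hypothetical, and the real disambiguation rule may detect domains differently. It only shows the mechanism of tagging domain-like tokens with _IS_URL and letting a rule skip any token that carries that tag.

```python
import re

# Hypothetical pattern for bare domain names such as "google.com".
# Assumption: the real LanguageTool rule may use a different expression.
DOMAIN_RE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)


def tag_tokens(tokens):
    """Return (token, tags) pairs, adding "_IS_URL" to domain-like tokens."""
    tagged = []
    for tok in tokens:
        tags = set()
        if DOMAIN_RE.match(tok):
            tags.add("_IS_URL")
        tagged.append((tok, tags))
    return tagged


def rule_matches(tagged, is_suspicious):
    """Toy rule: report tokens that look suspicious, except those tagged
    "_IS_URL" (this plays the role of the exception in option 1)."""
    return [
        tok
        for tok, tags in tagged
        if is_suspicious(tok) and "_IS_URL" not in tags
    ]


if __name__ == "__main__":
    tagged = tag_tokens(["J'utilise", "google.com"])
    # A stand-in check that would otherwise flag the domain token.
    print(rule_matches(tagged, lambda tok: "." in tok))
```

The trade-off is the one described above: each rule has to opt in to the exception, which is why (2) and (3) are more general but riskier.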

jaumeortola avatar Jun 24 '22 20:06 jaumeortola

https://github.com/languagetool-org/languagetool/commit/94b3f3f5d7dfc20db5be2524327cc2503999faba

jaumeortola avatar Jun 24 '22 20:06 jaumeortola

Thanks so much!

tiff avatar Jun 25 '22 00:06 tiff