linter-languagetool: Improve Markup Language Support

This linter is doing a great job. When writing a document in a markup language like LaTeX, however, it could be improved, because it shows errors on every LaTeX command. As a quick and dirty solution, I added the following lines:

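  # Blank out LaTeX commands with spaces so the text keeps its length and
  # LanguageTool's offsets still match the editor. For sectioning, caption,
  # \text... and \mbox commands the argument text is kept; every other
  # command is blanked completely. \*? also catches starred forms like \section*.
  editorContent = editorContent.replace /(\\\w+\*?)((?:\{[^\}]*\})*)((?:\[[^\]]*\])*)((?:\{[^\}]*\})*)/g, (match, name, group1, group2, group3) ->
    if /\\(\w*section|\w*caption|text\w*|mbox)/.test(name)
      # Keep the text inside the {...} arguments and blank the command name,
      # the braces and any [...] options.
      output = " ".repeat(name.length) +
        group1.replace(/[{}]/g, " ") +
        " ".repeat(group2.length) +
        group3.replace(/[{}]/g, " ")
    else
      # Any other command is blanked completely.
      output = " ".repeat(match.length)
    return output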

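This replaces most of the LaTeX markup with spaces, so the text keeps its length and the reported error positions still line up with the editor. I then disabled the WHITESPACE_RULE, since the runs of spaces would otherwise trigger it. A more general approach would be to ignore grammar scopes and patterns through an API like the one linter-spell provides.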
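The same trick can be stated more generally as masking arbitrary "ignore" patterns. Below is a minimal sketch in JavaScript built around a hypothetical helper, maskPatterns, which is not part of linter-languagetool or linter-spell. Every character matched by a pattern becomes a space while newlines are kept, so the text keeps both its length and its line structure. The two example patterns for inline math and LaTeX commands are illustrative assumptions, not a complete LaTeX filter.

```javascript
// Hypothetical helper: blank out every match of the given patterns with
// spaces, keeping newlines, so the masked text has the same length and
// line structure as the original.
function maskPatterns(text, patterns) {
  let masked = text;
  for (const pattern of patterns) {
    // Force the global flag so every occurrence is replaced.
    const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
    const re = new RegExp(pattern.source, flags);
    masked = masked.replace(re, (match) => match.replace(/[^\n]/g, " "));
  }
  return masked;
}

// Example: ignore inline math and LaTeX commands with one optional
// {...} argument (assumed, deliberately simple patterns).
const input = "See \\cite{knuth} and $x^2$ here.";
const output = maskPatterns(input, [/\$[^$]*\$/, /\\[A-Za-z]+\*?(?:\{[^}]*\})?/]);
console.log(JSON.stringify(output));
```

Because only matched characters change, "See", "and" and "here." stay at their original offsets, which is what lets LanguageTool's results be mapped straight back onto the buffer.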

Apr 04 '17 14:04 hesstobi

We should go for a proper solution following the linter-spell one. Will look at it in the coming days.

Apr 05 '17 09:04 wysiib

I am unsure whether the core plugin should include language-specific features. The same goes for issue #7. However, I am not sure about an API for connecting language-definitions as separate packages either. Any suggestions?

Jun 20 '17 06:06 wysiib

linter-spell-latex has actually compiled a list of excluded scopes for LaTeX. That might be helpful.

Feb 27 '18 19:02 zoenglinghou

We have been thinking about porting the solution used by the linter-spell package for quite some time. I am currently switching jobs, so I do not have the time to implement things myself. But I will look into it, probably around the end of May.

Mar 05 '18 09:03 wysiib

Hi! How could I use this workaround until a final solution is found?

Thanks!

Nov 03 '18 18:11 29antonioac

You can use my branch, which adds basic support for markup languages using the linter-spell-api: https://github.com/hesstobi/linter-languagetool/tree/linter-spell-api. I use it a lot for LaTeX, but a lot of things are still missing.

Nov 03 '18 19:11 hesstobi

Thanks for your work! It works pretty well :).

Only one question: in my documents, the \gls{} command for handling acronyms is not filtered correctly. Is this a problem with your plugin or with linter-spell?

Thanks for everything!

Nov 05 '18 13:11 29antonioac

I'd like to give this a thumbs up. Support for LaTeX would be awesome!

Nov 20 '18 14:11 73

I don't know if this helps or not, but the LanguageTool Server now has support for processing annotated text. Not sure when exactly they implemented it. You can see the data parameter of the API at SwaggerHub for an example. It takes a value like:

{"annotation":[
 {"text": "A "},
 {"markup": "<b>"},
 {"text": "test"},
 {"markup": "</b>"}
]}

Using the linter-spell approach, perhaps the different formats could be mapped to this annotated format? This would preserve offsets, I believe, and potentially be easier than trying to reduce to pure text.
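Here is a rough sketch of that mapping in JavaScript. It splits the source, in order, into text and markup segments and wraps them in the {"annotation": [...]} structure shown above, ready to be sent as the data parameter of LanguageTool's /v2/check endpoint. The LATEX_MARKUP pattern is a deliberately naive, hypothetical stand-in for a real parser: it only treats command names, braces and inline math as markup. No request is actually sent. Since the segments concatenate back to the original source, offsets into the annotated text should match offsets into the document.

```javascript
// Naive, hypothetical markup pattern: LaTeX command names (with optional *),
// braces and inline math are treated as markup; everything else is text.
const LATEX_MARKUP = /\\[A-Za-z]+\*?|[{}]|\$[^$]*\$/g;

// Split the source into ordered { text } and { markup } segments.
function toAnnotatedText(source) {
  const annotation = [];
  let last = 0;
  for (const m of source.matchAll(LATEX_MARKUP)) {
    if (m.index > last) annotation.push({ text: source.slice(last, m.index) });
    annotation.push({ markup: m[0] });
    last = m.index + m[0].length;
  }
  if (last < source.length) annotation.push({ text: source.slice(last) });
  return { annotation };
}

const annotated = toAnnotatedText("A \\textbf{test} sentence.");

// Form parameters for a POST to LanguageTool's /v2/check endpoint
// (built here only; nothing is sent).
const params = new URLSearchParams({
  language: "en-US",
  data: JSON.stringify(annotated),
});
console.log(JSON.stringify(annotated));
```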

Dec 18 '18 20:12 davidlday

That sounds like another nice way to proceed. I agree that reducing to pure text while keeping offsets intact might be quite a hassle. However, I haven't found a list of "all" the annotations in, say, LaTeX. Could this be derived from the tokens Atom creates anyway? @hesstobi, since this is somewhat related to what you are doing: any input?
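On the question of deriving annotations from Atom's tokens, here is one possible sketch. It assumes tokens have the { value, scopes } shape that first-mate's Grammar.tokenizeLines() returns (one array per line), and it treats a token as markup when one of its scopes falls under a list of excluded prefixes. The prefixes and the hand-written tokens are illustrative assumptions. Real LaTeX scope names depend on the installed grammar, and linter-spell-latex's excluded scopes would be a better starting point. Lines are rejoined with "\n", assuming LF line endings.

```javascript
// Illustrative scope prefixes treated as markup (assumed, grammar-dependent).
const MARKUP_SCOPE_PREFIXES = ["support.function", "punctuation"];

// True if any scope equals a prefix or is nested under it ("prefix.xxx").
function isMarkup(token, prefixes) {
  return token.scopes.some((scope) =>
    prefixes.some((prefix) => scope === prefix || scope.startsWith(prefix + ".")));
}

// lines: Array<Array<{ value: string, scopes: string[] }>>, as from tokenizeLines().
function tokensToAnnotation(lines, prefixes = MARKUP_SCOPE_PREFIXES) {
  const annotation = [];
  const push = (kind, value) => {
    if (value === "") return;
    const prev = annotation[annotation.length - 1];
    // Merge neighbouring segments of the same kind to keep the payload small.
    if (prev && kind in prev) prev[kind] += value;
    else annotation.push({ [kind]: value });
  };
  lines.forEach((tokens, i) => {
    if (i > 0) push("text", "\n"); // line break between tokenized lines
    for (const token of tokens) {
      push(isMarkup(token, prefixes) ? "markup" : "text", token.value);
    }
  });
  return { annotation };
}

// Hand-written tokens standing in for real tokenizeLines() output.
const lines = [[
  { value: "\\emph", scopes: ["text.tex.latex", "support.function.emph.latex"] },
  { value: "{", scopes: ["text.tex.latex", "punctuation.definition.arguments.begin.latex"] },
  { value: "Hello", scopes: ["text.tex.latex"] },
  { value: "}", scopes: ["text.tex.latex", "punctuation.definition.arguments.end.latex"] },
], [
  { value: "world.", scopes: ["text.tex.latex"] },
]];
console.log(JSON.stringify(tokensToAnnotation(lines)));
```

With this approach the grammar, not a separate parser, decides what counts as markup, so the same function would work for any first-mate grammar once suitable prefixes are chosen.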

Dec 23 '18 14:12 wysiib

Yes, I think this is a good way to go. But I currently do not have time to work on it.

Dec 29 '18 11:12 hesstobi

I created a few stand-alone packages that convert markup into LanguageTool's annotated text that might help:

annotatedtext - base package
annotatedtext-remark - converts Markdown to annotated text using remark-parse
annotatedtext-rehype - converts HTML to annotated text using rehype-parse

My quick search for a LaTeX parser turned up a couple of packages, but also several SO posts on how challenging it is to create a parser. If you all know of a good parser, I can see about creating another package to handle it. Or you're free to leverage the above to create one as well. :)

Dec 30 '18 18:12 davidlday

Nice work, but I think this is more useful outside of Atom, because this approach needs a separate parser for every grammar, while Atom already parses all major grammars. With the linter-spell-api, it is possible to choose which scopes should be checked by LanguageTool. This will enable LanguageTool to check comments in programming languages and so on.
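To illustrate choosing the checked scopes, here is a small sketch that keeps only tokens inside a list of checked scope prefixes. In the example the list is just comment, so only the comments in source code would reach LanguageTool. Everything else is replaced with spaces so that offsets stay valid. maskUncheckedScopes is a hypothetical name, and the tokens are hand-written in the { value, scopes } shape produced by first-mate rather than taken from real tokenizer output.

```javascript
// Hypothetical sketch: keep only tokens inside "checked" scopes and blank out
// everything else with spaces, so offsets into the buffer stay valid.
function maskUncheckedScopes(lines, checkedPrefixes) {
  const inChecked = (token) => token.scopes.some((scope) =>
    checkedPrefixes.some((p) => scope === p || scope.startsWith(p + ".")));
  return lines
    .map((tokens) => tokens
      .map((t) => (inChecked(t) ? t.value : " ".repeat(t.value.length)))
      .join(""))
    .join("\n"); // assumes LF line endings
}

// Hand-written JavaScript tokens: only the comment should be checked.
const jsLines = [[
  { value: "const", scopes: ["source.js", "storage.type.js"] },
  { value: " x = 1; ", scopes: ["source.js"] },
  { value: "// ", scopes: ["source.js", "comment.line.double-slash.js", "punctuation.definition.comment.js"] },
  { value: "teh answer", scopes: ["source.js", "comment.line.double-slash.js"] },
]];
console.log(JSON.stringify(maskUncheckedScopes(jsLines, ["comment"])));
```

Switching from LaTeX prose to code comments is then only a matter of passing a different list of checked scopes.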

Dec 30 '18 19:12 hesstobi

Thank you. I see where I misunderstood the parsing in Atom. Should have looked a little closer. :( Anyhow, I'll dig in a little deeper on the grammars & linter-spell as I have time and see if I can help out.

Dec 30 '18 19:12 davidlday

I've been watching and commenting on an issue on atom-wordcount that feels like a similar problem: basically, trying to eliminate all non-natural-language text from a document's word count. Getting tokenized lines seems to be possible with Atom's public API:

editorGrammar = editor.getGrammar()
editorGrammar.tokenizeLines(editor.getText())

See the early snippet in the issue for an example of filtering out scopes using first-mate. This doesn't work for tree-sitter grammars, but a similar approach should be possible.
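Here is a sketch of how the two calls quoted above could feed LanguageTool, under several assumptions. checkedTextForEditor and the excluded prefixes are hypothetical. The stub fakeEditor only mimics the getGrammar() and getText() methods of an Atom TextEditor so that the code runs outside Atom. The typeof check on tokenizeLines is a defensive guess, not a documented way to detect tree-sitter grammars. Excluded tokens are blanked instead of removed, so the result has the same length as the buffer.

```javascript
// Illustrative excluded scope prefixes for LaTeX (assumed; grammar-dependent).
const EXCLUDED_PREFIXES = ["support.function", "punctuation"];

// Keep a token unless one of its scopes falls under an excluded prefix.
const notExcluded = (token) => !token.scopes.some((scope) =>
  EXCLUDED_PREFIXES.some((p) => scope === p || scope.startsWith(p + ".")));

// Tokenize the editor text with the first-mate API, blank out tokens that
// keepToken rejects (same length, so offsets stay valid) and rejoin lines
// with "\n" (assumes LF line endings). Returns null when the grammar has
// no tokenizeLines() function.
function checkedTextForEditor(editor, keepToken) {
  const grammar = editor.getGrammar();
  if (!grammar || typeof grammar.tokenizeLines !== "function") return null;
  return grammar
    .tokenizeLines(editor.getText())
    .map((tokens) => tokens
      .map((t) => (keepToken(t) ? t.value : " ".repeat(t.value.length)))
      .join(""))
    .join("\n");
}

// Stand-in editor for trying this outside Atom; inside the package a real
// TextEditor would be passed instead.
const fakeEditor = {
  getText: () => "\\emph{Hi}",
  getGrammar: () => ({
    tokenizeLines: () => [[
      { value: "\\emph", scopes: ["text.tex.latex", "support.function.emph.latex"] },
      { value: "{", scopes: ["text.tex.latex", "punctuation.definition.arguments.begin.latex"] },
      { value: "Hi", scopes: ["text.tex.latex"] },
      { value: "}", scopes: ["text.tex.latex", "punctuation.definition.arguments.end.latex"] },
    ]],
  }),
};
console.log(JSON.stringify(checkedTextForEditor(fakeEditor, notExcluded)));
```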

Jan 16 '19 12:01 davidlday

This is the API we need! I added this to #23. But we should also find a way for tree-sitter.

Jan 16 '19 20:01 hesstobi

@hesstobi Have a look at this commit where I try to use the editor.tokensForScreenRow() API. Note that this API is undocumented and thus subject to change! (See also the discussion in atom-wordcount again.)

Jan 17 '19 12:01 mbroedl