Improve Markup Language Support

Open hesstobi opened this issue 8 years ago • 17 comments

This linter is doing a great job. When writing a document in a markup language like LaTeX, it could be improved, because it shows errors on every LaTeX command. As a quick and dirty solution, I added the following lines:

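  # Replace each LaTeX command, together with its {...} and [...] arguments,
  # by spaces of the same length so that character offsets stay unchanged.
  editorContent = editorContent.replace /(\\\w+)((?:\{[^\}]*\})*)((?:\[[^\]]*\])*)((?:\{[^\}]*\})*)/g , (match, name, group1, group2, group3, index, input) ->
    if /\\(\w*section|\w*caption|text\w*|mbox)/.test(name)
      # Sectioning, caption and text commands: keep the argument text, blank only the markup.
      output = Array(name.length+1).join(" ") +
        group1.replace(/[\{\}]/g, " ") +
        Array(group2.length+1).join(" ") +
        group3.replace(/[\{\}]/g, " ")
    else
      # Any other command: blank the whole match.
      output = Array(match.length+1).join " "
    return output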

This replaces most of the LaTeX markup with spaces. I then disabled the WHITESPACE_RULE. A more general approach would be to ignore grammar scopes and patterns through an API like the one linter-spell provides.
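For reference, the same masking idea can be sketched in TypeScript. This is only an illustration under assumptions: `maskLatex`, `blank`, `COMMAND` and `KEEP_ARGUMENT` are hypothetical names that are not part of linter-languagetool, and the pattern is the one from the snippet above, so it still does not handle nested braces. Unlike the snippet above, this variant keeps line breaks inside a match, so line numbers survive as well as character offsets.

```typescript
// Hypothetical helper, not part of linter-languagetool: blank out LaTeX
// commands while keeping every character offset unchanged.

// \command followed by any number of {..} groups, [..] groups and {..} groups.
const COMMAND = /(\\\w+)((?:\{[^}]*\})*)((?:\[[^\]]*\])*)((?:\{[^}]*\})*)/g;

// Commands whose brace arguments contain prose that should still be checked.
const KEEP_ARGUMENT = /\\(\w*section|\w*caption|text\w*|mbox)/;

// Replace every character except line breaks with a space.
function blank(s: string): string {
  return s.replace(/[^\n]/g, " ");
}

function maskLatex(source: string): string {
  return source.replace(
    COMMAND,
    (match: string, name: string, g1: string, g2: string, g3: string): string => {
      if (KEEP_ARGUMENT.test(name)) {
        // Keep the argument text, blank the command name, braces and options.
        return blank(name) + g1.replace(/[{}]/g, " ") + blank(g2) + g3.replace(/[{}]/g, " ");
      }
      // Everything else is blanked completely.
      return blank(match);
    }
  );
}
```

Because the output has the same length as the input, positions reported by LanguageTool for the masked text can be applied to the original buffer without any offset mapping.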

hesstobi avatar Apr 04 '17 14:04 hesstobi

We should go for a proper solution following the linter-spell one. Will look at it in the coming days.

wysiib avatar Apr 05 '17 09:04 wysiib

I am unsure whether the core plugin should include language-specific features. The same goes for issue #7. However, I am not sure about an API for connecting language-definitions as separate packages either. Any suggestions?

wysiib avatar Jun 20 '17 06:06 wysiib

linter-spell-latex has actually compiled a list of excluded scopes for LaTeX. Might be helpful.

zoenglinghou avatar Feb 27 '18 19:02 zoenglinghou

We have been thinking about porting the solution used by the linter-spell package for quite some time. I am currently switching jobs and thus do not have the time to implement things myself. But I will look into it, probably around the end of May.

wysiib avatar Mar 05 '18 09:03 wysiib

Hi! How could I use this workaround until a final solution is found?

Thanks!

29antonioac avatar Nov 03 '18 18:11 29antonioac

You can use my branch, which adds basic support for markup languages using the linter-spell-api. I use this a lot for LaTeX. There are still a lot of things missing. https://github.com/hesstobi/linter-languagetool/tree/linter-spell-api

hesstobi avatar Nov 03 '18 19:11 hesstobi

Thanks for your work! It works pretty well :).

Only one question: in my documents, the command \gls{} for handling acronyms is not filtered correctly. Is this a problem related to your plugin or to linter-spell?

Thanks for all!

29antonioac avatar Nov 05 '18 13:11 29antonioac

I would like to give this thumbs up. Support for LaTeX would be so awesome!

73 avatar Nov 20 '18 14:11 73

I don't know if this helps or not, but the LanguageTool Server now has support for processing annotated text. Not sure when exactly they implemented it. You can see the data parameter of the API at SwaggerHub for an example. It takes a value like:

{"annotation":[
 {"text": "A "},
 {"markup": "<b>"},
 {"text": "test"},
 {"markup": "</b>"}
]}

Using the linter-spell approach, perhaps the different formats could be mapped to this annotated format? This would preserve offsets, I believe, and potentially be easier than trying to reduce to pure text.
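A rough TypeScript sketch of such a mapping for LaTeX follows. It is an illustration only: the segment shape is taken from the JSON example above, while `toAnnotatedText`, `Segment`, `AnnotatedText` and the `MARKUP` pattern are hypothetical names invented here. The pattern is deliberately naive (command names, one optional `[...]` argument, and braces) and is not a LaTeX parser; for instance, the key inside `\cite{key}` would still be sent as text.

```typescript
// Segment shape follows the JSON example in this thread; the type names are
// made up for this sketch.
type Segment = { text: string } | { markup: string };

interface AnnotatedText {
  annotation: Segment[];
}

// Assumption: a very small notion of "markup", namely \command with an
// optional [..] argument, plus bare braces.
const MARKUP = /\\\w+(?:\[[^\]]*\])?|[{}]/g;

function toAnnotatedText(source: string): AnnotatedText {
  const annotation: Segment[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  MARKUP.lastIndex = 0; // the pattern is global, so reset its state first
  while ((m = MARKUP.exec(source)) !== null) {
    if (m.index > last) {
      // Plain text between the previous markup and this one.
      annotation.push({ text: source.slice(last, m.index) });
    }
    annotation.push({ markup: m[0] });
    last = m.index + m[0].length;
  }
  if (last < source.length) {
    annotation.push({ text: source.slice(last) });
  }
  return { annotation };
}
```

Concatenating all segments in order gives back the original string, which is what keeps the offsets intact. The result of `JSON.stringify(toAnnotatedText(source))` would then go into the `data` parameter mentioned above.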

davidlday avatar Dec 18 '18 20:12 davidlday

That sounds like another nice way to proceed. I agree, reducing to pure text and keeping offsets intact might be quite a hassle. However, I haven't found a list of "all" the annotations in, say, LaTeX. Could this be derived from the language tokens Atom creates anyway? @hesstobi since this is somewhat related to what you are doing: any input?

wysiib avatar Dec 23 '18 14:12 wysiib

Yes, I think this is a good way to go. But I currently do not have any time to work on that.

hesstobi avatar Dec 29 '18 11:12 hesstobi

I created a few stand-alone packages that convert markup into LanguageTool's annotated text that might help:

My quick search for a LaTeX parser turned up a couple of packages, but also several SO posts on how challenging it is to create a parser. If you all know of a good parser, I can see about creating another package to handle it. Or you're free to leverage the above to create one as well. :)

davidlday avatar Dec 30 '18 18:12 davidlday

Nice work. But I think this is more useful outside of Atom, because you will need a parser for every grammar. Atom already includes parsing for all major grammars. With the linter-spell-api it is possible to choose which scopes should be checked by LanguageTool. This will enable LanguageTool to check comments in programming languages and so on.

hesstobi avatar Dec 30 '18 19:12 hesstobi

Thank you. I see where I misunderstood the parsing in Atom. Should have looked a little closer. :( Anyhow, I'll dig in a little deeper on the grammars & linter-spell as I have time and see if I can help out.

davidlday avatar Dec 30 '18 19:12 davidlday

I've been watching/commenting on an issue on atom-wordcount that feels like a similar problem. Basically trying to eliminate all non-natural language text from a document's word count. Getting tokenized lines seems to be possible using Atom's public API by:

editorGrammar = editor.getGrammar()
editorGrammar.tokenizeLines(editor.getText())

See the early snippet in the issue for an example of filtering out scopes using first-mate. This doesn't work for tree-sitter grammars, but a similar approach should be possible.
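As a rough sketch of what scope filtering on the tokenized lines could look like, here is a TypeScript version that works on plain data so it runs without Atom. The assumptions are loud ones: the `Token` shape (`value` plus `scopes`) is how first-mate tokens are expected to look, `maskTokenizedLines`, `isExcluded` and `EXCLUDED_PREFIXES` are hypothetical names, and the scope names in the deny-list are examples only, since the real names depend on the installed grammar.

```typescript
// Assumed token shape for the output of tokenizeLines(): one array per line,
// each token carrying its text and the scopes that apply to it.
interface Token {
  value: string;
  scopes: string[];
}

// Hypothetical deny-list; real scope names depend on the grammar in use.
const EXCLUDED_PREFIXES: string[] = [
  "support.function",
  "keyword.control",
  "markup.raw",
];

// A token is excluded if any of its scopes equals a prefix or starts with
// "<prefix>.".
function isExcluded(token: Token, prefixes: string[]): boolean {
  return token.scopes.some(scope =>
    prefixes.some(p => scope === p || scope.indexOf(p + ".") === 0)
  );
}

// Rebuild the text, replacing excluded tokens by spaces of equal length so
// that rows and columns still match the editor buffer.
function maskTokenizedLines(
  lines: Token[][],
  prefixes: string[] = EXCLUDED_PREFIXES
): string {
  return lines
    .map(line =>
      line
        .map(t => (isExcluded(t, prefixes) ? t.value.replace(/./g, " ") : t.value))
        .join("")
    )
    .join("\n");
}
```

In Atom, the input would come from `editor.getGrammar().tokenizeLines(editor.getText())` as shown above. Matching by prefix keeps the deny-list short, since one entry such as `support.function` also covers more specific scopes below it.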

davidlday avatar Jan 16 '19 12:01 davidlday

This is the API we need! I added this to #23. But we should also find a way for tree-sitter.

hesstobi avatar Jan 16 '19 20:01 hesstobi

@hesstobi Have a look at this commit where I try to use the editor.tokensForScreenRow() API. Note that this API is undocumented and thus subject to change! (See also the discussion in atom-wordcount again.)

mbroedl avatar Jan 17 '19 12:01 mbroedl