vscode-language-nextflow icon indicating copy to clipboard operation
vscode-language-nextflow copied to clipboard

Steps towards an ANTLR4 based LSP for vscode-nextflow

Open uwwint opened this issue 3 years ago • 11 comments

Warning: DON'T MERGE INTO MASTER - this needs to go into a feature branch ;)

This PR is going back to a chat I had with ~~Tomaso~~ Paolo (😄). It is purely to provide visibility of my current development effort towards an ANTLR4 based language server protocol implementation of nextflow.

The goal is to have state of the art code completion, variable referencing, type checking, linking to definition statements etc. which is still a bit to go.

Implemented:

  • [x] support to autocomplete most configuration scope variables in configuration files

To be implemented:

  • [ ] refactor parseTree traversal helper functions
  • [ ] support rudimentary type inference for assign statements in config file
  • [ ] support process scope
  • [ ] support workflow scope
  • [ ] support import statements
  • [ ] support documentation
  • [ ] support channels
  • [ ] support variables
  • [ ] support params ...

@jemunro in case you are looking into something like this as well, lets bundle forces

uwwint avatar Nov 23 '22 09:11 uwwint

Hi @uwwint , before you go too much further, I have written an ANTLR4 grammar for Nextflow script and config files. It looks like you already found the Groovy grammar. Let's find a way to add my Nextflow grammars so you don't end up re-inventing the wheel.

bentsherman avatar Nov 23 '22 14:11 bentsherman

Hi @bentsherman, Perfect! Could you point me to the grammar? It's straight forward to integrate. I'll put some documentation here later today.

uwwint avatar Nov 23 '22 19:11 uwwint

It's currently sitting in a private repo. I will see about making it public when I get back from holiday next week.

bentsherman avatar Nov 24 '22 00:11 bentsherman

Awesome @bentsherman. Cannot wait

uwwint avatar Nov 25 '22 00:11 uwwint

@uwwint I added my grammar specs for nextflow config files and scripts. I was working with a "tiny groovy" spec to keep things simple, but I think we can move forward with the full groovy spec for now. I haven't set up the Groovy-optimized fork of antlr4 so I can't test these grammars yet, so feel free to run with them.

bentsherman avatar Nov 28 '22 21:11 bentsherman

Hi @bentsherman,

Thanks for the contribution, I have successfully rebased your grammar on the official groovy, split them out, and generated all the compiled code. I will proceed with modifying grammarAdapter to utilise them.

uwwint avatar Nov 29 '22 06:11 uwwint

@uwwint I was able to build the grammar and run the grammar test. I decide to use the config files from nf-core/rnaseq because they should provide good coverage of the config language. I'm still getting a few weird errors, seemingly around newlines:

server/configs/rnaseq_base.config

server/configs/rnaseq_modules.config
line 843:91 token recognition error at: ',\n'
-1: message

server/configs/rnaseq_nextflow.config
line 265:114 token recognition error at: '\n'
line 266:0 token recognition error at: '            return obj\n'
line 267:0 token recognition error at: '        }\n'
line 268:0 token recognition error at: '    } else if (type == 'time') {\n'
line 269:0 token recognition error at: '        try {\n'
line 270:0 token recognition error at: '            if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)\n'
line 271:0 token recognition error at: '                return params.max_time as nextflow.util.Duration\n'
line 272:0 token recognition error at: '            else\n'
line 273:0 token recognition error at: '                return obj\n'
line 274:0 token recognition error at: '        } catch (all) {\n'
-1: message

server/configs/rnaseq_igenomes.config

server/configs/rnaseq_test.config

server/configs/rnaseq_test_full.config

The problem lies with the Groovy grammar, because you still get these errors if you just use the Groovy parser. I'm wondering if the code your ported to typescript in the grammar files need to be updated since I reverted the grammar from Groovy 4 to Groovy 3?

bentsherman avatar Dec 03 '22 00:12 bentsherman

@uwwint / @bentsherman - any updates on this one? (pleasepleasepleaseplease)

ewels avatar Aug 18 '23 15:08 ewels

I have concluded that a custom ANTLR grammar isn't the right approach. Even if we did get it to work, it would only produce a "parse" tree, which is just an intermediate step to an AST. I think we would have to rewrite much of the Groovy parser to get to a custom Nextflow AST. Uwe has been radio silent since December, so I'm not expecting him to come save the day anytime soon.

Instead, I think we will implement the LSP in Java/Groovy, so that we can reuse the Groovy parser and Nextflow AST transformations. It will have the same weird errors as Nextflow, but we've made a lot of improvements to error messages recently, so I feel more confident now that we can still provide a good user experience, even if we can't always identify the error down to the exact token.

bentsherman avatar Aug 18 '23 16:08 bentsherman

The thing that clicked for me is the Nextflow DSL is all valid Groovy syntax, so we can just use the Groovy AST, perhaps with a few syntax transformations, to do everything we want to do with the LSP.

bentsherman avatar Aug 18 '23 16:08 bentsherman

One more thing... Jordi started on a Java-based Nextflow LSP here: https://github.com/jordeu/nextflow-language-server

So we're already on to a good start!

bentsherman avatar Aug 18 '23 16:08 bentsherman

Close in favor of #30

bentsherman avatar May 28 '24 23:05 bentsherman