cylc-flow icon indicating copy to clipboard operation
cylc-flow copied to clipboard

lint: handle multiline strings & jinja2 code

Open wxtim opened this issue 10 months ago • 5 comments

Closes #6348 Closes #6507 Closes #6685

Check List

  • [x] I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • [x] Contains logically grouped changes (else tidy your branch by rebase).
  • [x] Does not contain off-topic changes (use other PRs for other changes).
  • [x] Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • [x] Tests are included (or explain why tests are not needed).
  • [x] Changelog entry included if this is a change that can affect users
  • [x] Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • [x] If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

wxtim avatar Jan 11 '25 11:01 wxtim

@oliver-sanders Please could you have a look at this proof of concept before I bother testing it?

wxtim avatar Jan 11 '25 11:01 wxtim

I think this is going to be the quickest way to get over the state barrier, but going statefull is a good opportunity to review our approach here before we lay down a pattern which ends up biting us later.

I haven't really given much thought to this so far, I don't have working knowledge of modern linters, a bit of reading would be a good idea...

Mid to long term, it depends how far we want to go with the linter. The initial idea was a quick and dirty line-by-line regex tool, but we thought we would leave it there (we decided it would not be statefull). Now we are clearly at the point where we want to develop this into something more serious.

I think the optimal solution for this sort of problem is something more tokeniser'ey. I.e, a program that reads the file into labeled bits rather than lines, allowing rules to be run on each "bit" in a more structured way than regex would allow by itself.

E.g, here's an example of how a tokeniser'ey type thing might see the file:

>>> for type, content, state in tokeniser(file):
...    print((type, content))
('shebang', '#!Jinja2', <file:flow.cylc, lines:[0], preproc:jinja2, path:>)
('blankline', '', <file:flow.cylc, lines:[1], preproc:jinja2, path:>)
('section', '[scheduling]', <file:flow.cylc, lines:[2], preproc:jinja2, path:[scheduling]>)
('config', '    initial cycle point = 2000', <file:flow.cylc, lines:[3], preproc:jinja2, path:[scheduling]initial cycle point>)
('cycle point', '2000', <file:flow.cylc, lines:[3], preproc:jinja2, path:[scheduling]initial cycle point>)
('section', '    [[graph]]', <file:flow.cylc, lines:[4], preproc:jinja2, path:[scheduling][graph]>)
('config', '        R1 = """\nfoo => bar\nbar => baz\n    """', <file:flow.cylc, lines:[5,6,7,8], path:[scheduling][graph]>)
('graph', 'foo => bar\nbar => baz', <file:flow.cylc, lines:[6,7,8], path:[scheduling][graph]R1>)
('jinja2', '{% if ... \n ... %}', <file:flow.cylc, lines:[9,10], path:[scheduling][graph]>)
('jinja2', '{% include ... %}', <file:flow.cylc, lines:[11], path:?>)

It's similar (but less detailed & more annotated) to what our syntax highlighters see. Note, we are carrying the Cylc config "path" where the information is available allowing us to apply specific rules to particular sections or configs and each entry could represent multiple lines. This would allow us, for example, to apply rules, only to graph sections, and to do clever things, like read a few lines ahead to check if different task configs are compatible.

It might be relatively simple to get a paired-down version of this working, e.g something that has only awareness of "plain" Cylc lines and Jinja2 expressions.

There might also be the option of building a linter into one of the Cylc grammars we have already developed? I wouldn't be surprised if this is becoming common in the LSP era.

I mention these things largely to curtail the scope of what we do now. It might be that the Cylc linter never gets much further than it is now, in which case this is a pretty decent pattern. But it is also possible that we end up developing this much further and the community starts contributing rulesets. So we should be cautious of how far we push this pattern, if it's starting to look like this utill might balloon, we should be ready to change tack rather than bang our heads against the pattern we've got. If there's a pattern out there we could easily adopt, this might be a good opportunity to do so before we start trying to get clever with our line reader. And if we do go ahead with this (and I don't think that would be a bad option), we should be cautious over how far we push it.

oliver-sanders avatar Jan 13 '25 16:01 oliver-sanders

Much as I like the linter (I do!) we should think carefully, and maybe look at how widely it is used, before sinking a lot of effort into making it significantly more general than it is now (I think this gels with what Oliver is saying, but in more general terms). (Priorities vs resources, as ever...)

hjoliver avatar Jan 15 '25 01:01 hjoliver

Much as I like the linter (I do!) we should think carefully, and maybe look at how widely it is used, before sinking a lot of effort into making it significantly more general than it is now (I think this gels with what Oliver is saying, but in more general terms). (Priorities vs resources, as ever...)

We already have validate, and I'm against trying too hard with lint. If that means that eventually there are things we can't deal with, so be it. It's already a goodish way from "a sed script to identify 7 to 8 changes".

wxtim avatar Jan 15 '25 07:01 wxtim

We've had another multiline lint issue reported https://github.com/cylc/cylc-flow/issues/6685 (nothing to do with Jinja2 though this time).

oliver-sanders avatar Mar 26 '25 13:03 oliver-sanders

Low priority - Branch & PR still available, but not currently worked on

wxtim avatar Aug 14 '25 13:08 wxtim