Helper to make linters
I was looking at one of my old projects. It was from before I discovered Pants, so it uses Tox. I was reminded that it is incredibly easy to just run a python package in tox:
```ini
[testenv:radon]
deps = radon
commands =
    radon cc -s --total-average --no-assert -nb src/
```
I thought it would be great if it were that easy to create a linter in Pants. So I hacked together this prototype! Now it's as simple as:
```python
class VultureTool(MetalintTool):
    options_scope = "vulture"
    name = "Vulture"
    help = """Vulture finds unused code in Python programs"""
    default_version = "vulture==2.5"
    default_main = ConsoleScript("vulture")
    args = ArgsListOption(example="--min-confidence 95")


def vulture_args(tool: VultureTool, files: Tuple[str, ...]):
    return tool.args + files


vulture = make_linter(VultureTool, "vulture", vulture_args)


def rules():
    return [
        *vulture.rules(),
    ]
```
This MR follows (reasonably closely) the "Add a linter" common-task walkthrough.
Initial discussion questions:
- Do we want this type of helper in Pants?
- Is this type of helper (a function that generates the rules) the type of helper we want?
- What features would we need before helpers like this are mergeable?
Understanding this MR
`register.py` contains three example linters. Two of them share the same tool, which produces some surprising results but actually works.
`MetalintTool` extends `PythonToolBase` with the fields that make the expected goals (in particular `export` and `generate-lockfiles`) go. This class is one part of the basic interface of Metalint. The other component is `make_linter`, which creates all the classes and the rules and bundles them together into the `Metalint` DTO.
Some of the helpers could also be useful on their own. For example, `make_export_rules` can dynamically generate the boilerplate for the `export` goal.
What is even happening
`MetalintTool` adds all the fields to make things go.
`Metalint` is a DTO, convenient for gathering all the rules.
`make_linter` is the real function here. It accepts a `MetalintTool` class and an `argv_maker`, which lets the caller format the invocation arguments as required (passing the right flags and formatting all the filenames, if needed). It optionally accepts the `FieldSet` and `LintTargetsRequest` as escape hatches. IDK, until Mypy ruined the fun it was an easy include.
`make_export_rules` and `make_lockfile_rules` use a cache to ensure that a tool used in multiple linters only has one set of rules generated for it. (It looks like Pants deduplicates rules if they are equal by identity.)
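A minimal sketch of that caching idea, with hypothetical stand-in names rather than the actual Metalint code:

```python
from typing import Dict, List, Type


class MetalintTool:  # hypothetical stand-in for the real tool base class
    pass


_EXPORT_RULES_CACHE: Dict[Type[MetalintTool], List[str]] = {}


def make_export_rules(tool_cls: Type[MetalintTool]) -> List[str]:
    # Generating rules twice for one tool would register duplicates, so
    # memoize on the tool class and hand back the exact same objects
    # (Pants appears to deduplicate rules that are equal by identity).
    if tool_cls not in _EXPORT_RULES_CACHE:
        _EXPORT_RULES_CACHE[tool_cls] = [f"export_{tool_cls.__name__}"]  # stand-in rules
    return _EXPORT_RULES_CACHE[tool_cls]


class VultureTool(MetalintTool):
    pass


# Two linters built on the same tool share the identical rule objects.
assert make_export_rules(VultureTool) is make_export_rules(VultureTool)
```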
Otherwise it's fairly standard dynamic class generation. The classes themselves are not equal by identity because they are created inside the function, so each invocation of the function creates new classes. It uses a function instead of a class because the generated classes depend on each other; all the statements in a class body have to be mostly valid on their own, so they can't depend on each other like this (at least AFAIK).
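The identity point can be seen in a small sketch (hypothetical class shapes, not Metalint's actual ones): each call to the factory creates brand-new classes, and they can reference each other because they are built sequentially inside the function body.

```python
def make_linter_classes(name: str):
    # Classes created inside a function body are brand new on every call.
    class FieldSet:
        pass

    class LintRequest:
        # Inside a function, FieldSet is already bound by the time this
        # class body executes, so the cross-reference works; sibling
        # statements in a single class body couldn't do this.
        field_set_type = FieldSet

    FieldSet.__name__ = f"{name}FieldSet"
    LintRequest.__name__ = f"{name}LintRequest"
    return FieldSet, LintRequest


a_fs, a_req = make_linter_classes("Vulture")
b_fs, b_req = make_linter_classes("Vulture")
assert a_fs is not b_fs              # fresh classes per invocation
assert a_req.field_set_type is a_fs  # each request is wired to its own FieldSet
```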
Known limitations and future work
- tests
- lockfile generation is a bit of a mess, at least internally. The problem is that the lockfile code assumes that we're shipping a lockfile with the plugin, which we aren't, since the idea of the lockfile is itself dynamically generated. It still works, although you have to generate lockfiles first.
- does not allow post-processing the result of running the tool. Some tools won't fail your build, but you might want to induce a failure depending on the output. For example, Radon's Maintainability Index: you might want to fail if it is too low. I mention this here because inserting a post-process step might be really easy.
- only works with linters. I think formatters would be necessary. Ideally we'd also have run targets, although that's maybe less necessary with macros and `experimental_shell_command`
- only works with Python tools. At a minimum it should work with a binary that the user already has on their system. Ideally we'd also build helpers to fetch `default_known_versions` from popular places, like GitHub releases.
- typing is a bit of a mess; if anyone knows how to get mypy to let you use an argument of type `Type[T]` as a valid type annotation, LMK.
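On the post-processing limitation above: the hook could be as small as a callback from (exit code, stdout) to a final exit code. A hypothetical sketch, using a made-up `MI: <score>` output line rather than Radon's real output format:

```python
import re
from typing import Callable

# Hypothetical hook shape: (tool exit code, tool stdout) -> final exit code.
Postprocess = Callable[[int, str], int]


def fail_if_mi_below(threshold: float) -> Postprocess:
    """Induce a failure when a (made-up) 'MI: <score>' line reports a
    maintainability index below the threshold, even if the tool exited 0."""
    def hook(exit_code: int, stdout: str) -> int:
        match = re.search(r"MI:\s*([\d.]+)", stdout)
        if match and float(match.group(1)) < threshold:
            return 1
        return exit_code
    return hook


hook = fail_if_mi_below(65.0)
assert hook(0, "MI: 80.2\n") == 0  # healthy score: keep the tool's result
assert hook(0, "MI: 40.0\n") == 1  # low score: fail even though the tool passed
```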
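On the typing question above: one pattern mypy generally accepts is a `TypeVar` bound to the base class, so that `Type[T]` in the signature ties the `argv_maker`'s first parameter to the concrete subclass. A sketch with stand-in classes, not the real Metalint signatures:

```python
from typing import Callable, Tuple, Type, TypeVar


class MetalintTool:  # stand-in for the real base class
    args: Tuple[str, ...] = ()


T = TypeVar("T", bound=MetalintTool)


def make_linter(
    tool_cls: Type[T],
    name: str,
    argv_maker: Callable[[T, Tuple[str, ...]], Tuple[str, ...]],
) -> Tuple[str, Type[T]]:
    # mypy infers T from tool_cls and checks argv_maker against it.
    return (name, tool_cls)


class VultureTool(MetalintTool):
    args = ("--min-confidence", "95")


def vulture_args(tool: VultureTool, files: Tuple[str, ...]) -> Tuple[str, ...]:
    return tool.args + files


name, cls = make_linter(VultureTool, "vulture", vulture_args)
assert cls is VultureTool
```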
Without diving into the details too much, I like the idea of "if your tool fits this description, you can use this turnkey API". Not all tools will fit, but certainly some would (even ones we have internally). Permit me some time to stew on it more, but I think the overall is a :heavy_plus_sign: from me.
Without digging into the code too much (details are less important to me right now), this is a +1 from me.
I'm a fan of making plugins/APIs easy, even at the expense of technical overhead. Something of this ilk would definitely strip out a lot of code from the codebase, possibly at some extra config expense.
I like the idea that these linters/formatters become more about their subsystems, rather than a decent chunk of copy/pasted rule code. Maybe from a global meta-class, maybe from a per-backend meta-linter. And if the meta-class wouldn't work well, then you fall back to exactly how it is today anyways.
Alternatively, maybe some of this functionality could come in the form of additional composable rules/rule_helpers? Haven't thought this through yet.
As I see it, most of the linters/formatters that I use or have worked on come down to:
- Optional Skip
- Optionally partition
- Get source files (if they didn't come in with the `FmtRequest`/`FieldSet` for some reason)
- Optionally get config files
- Optionally get other resources/files
- Bundle digests
- Run (maybe custom) process with custom args
- Return Lint/Fmt results
Most of that feels boilerplate-y since the inputs and outputs are pretty well-defined... Being able to specify those in the Fmt/Lint subsystem seems perfectly reasonable.
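The step list above could plausibly be captured in a declarative spec on the subsystem. A hypothetical sketch of what that description might look like (all names made up):

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LintPipelineSpec:
    """Hypothetical declarative description of the common lint/fmt steps."""
    tool_name: str
    skippable: bool = True                  # honour a --skip option
    partition_by: Optional[str] = None      # e.g. "resolve", or no partitioning
    config_discovery: Tuple[str, ...] = ()  # config files to hydrate and bundle
    extra_inputs: Tuple[str, ...] = ()      # other resources/files to digest
    args: Tuple[str, ...] = ()              # custom args for the process


radon_spec = LintPipelineSpec(
    tool_name="radon",
    config_discovery=("radon.cfg",),
    args=("cc", "-s", "--total-average", "--no-assert", "-nb"),
)
assert radon_spec.skippable  # steps not specified fall back to defaults
```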
Some considerations:
- Some linters/formatters use a custom process (`NpxProcess`, `TerraformProcess`, etc.)
- Not all subsystem tools are `pex` files, so there might be a generalized approach to return a hydrated tool (or maybe that's specified in the subsystem file, as it kinda is today)
In general, I'd love to see and use this in mainline!
@lilatomic In the existing repo, do we have non-anecdotal numbers on how much boilerplate something like this would save? Or some concept of which tools it would work out of the box for, against how many would have to go custom anyways?
Edit: Forgot to mention, there probably even is some world where the rules_integration_test could be made declarative, assuming a common set of pass/fail tests for each new subsystem - where for testing, you provide example pass/fail files. Rules for these are also VERY boilerplate-y.
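Sketching that declarative-test idea: a subsystem would supply (example source, expected outcome) pairs, and one generic harness runs them all. The harness below is hypothetical and uses a toy linter in place of a real rule run:

```python
from typing import Callable, List, Tuple

Case = Tuple[str, int]  # (source text, expected lint exit code)


def check_cases(lint: Callable[[str], int], cases: List[Case]) -> List[str]:
    """Run every declared case and describe any mismatches."""
    failures = []
    for source, expected in cases:
        actual = lint(source)
        if actual != expected:
            failures.append(f"expected {expected}, got {actual} for {source!r}")
    return failures


# Toy stand-in "linter": fails any source containing a TODO marker.
def fake_lint(source: str) -> int:
    return 1 if "TODO" in source else 0


cases = [("x = 1\n", 0), ("x = 1  # TODO remove\n", 1)]
assert check_cases(fake_lint, cases) == []  # both declarative cases pass
```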
So, doing a bit of an overview of the applicability for boilerplate reduction.

Generating lockfiles (searched `GeneratePythonLockfile`):
- 19 are pure boilerplate (`return GeneratePythonLockfile.from_tool(..., use_pex=python_setup.generate_lockfiles_with_pex)`)
- 6 are slightly advanced boilerplate (a conditional for using a custom lockfile, which maybe everything should support)
- 5 are complex (e.g. mypy)
Generating export (searched `ExportPythonTool`):
- 6 are pure boilerplate
- 3 also fetch interpreter constraints; I'm not sure if this is a different kind of boilerplate
- 3 are more complicated (e.g. mypy)
Assessing the applicability of the main generated lint rule (`-> LintResults`) is harder; I've found at least 3 which look mostly boilerplate. I haven't looked at which formatters would be suitable, since Metalint doesn't do that yet.
Relating to integration testing, that's really interesting. I wonder if the real boilerplate is in the tests, rather than the code. Obviously, any tool defined through metalint has that boilerplate tested. But if it can't be defined by metalint, we could at least run boilerplate tests. The example that comes to mind is that the "skip" flag is honoured. (Although, perhaps that's an indication that we need more rule helpers or higher-order rules to handle that behaviour).
On the subject of non-python-tools, I think there are basically 4 types of tools that we could easily serve with this:
- Python tools: I think this is a good direction to start with because that's Pants' focus, and it helps with parity with existing tools like Tox
- Tools that Just Exist on the system: the all-important escape hatch. Maybe it's installed from a different package manager (like grep from the OS package manager). Or maybe they're porting things over and will upgrade eventually. Or maybe they're downloadable but require some sort of installation.
- Tools we can easily download: `ExternalTool` or `TemplatedExternalTool`
- Docker tools? Lots of tools come with a Docker container. I don't think Pants really has the facility for running containerised tools, but if it did, the rules would be almost pure boilerplate, since the tool arrives completely installed
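For the downloadable case, a helper could assemble known-versions entries from release metadata. A hypothetical sketch: the `version|platform|sha256|length` entry shape follows `ExternalTool`'s convention, but the asset metadata here is made up (in practice it might come from the GitHub releases API plus a checksum file):

```python
from typing import Dict, List


def known_versions_from_assets(version: str, assets: List[Dict]) -> List[str]:
    """Build 'version|platform|sha256|length' strings from hypothetical
    per-platform release-asset metadata."""
    return [
        f"{version}|{asset['platform']}|{asset['sha256']}|{asset['size']}"
        for asset in assets
    ]


assets = [
    {"platform": "linux_x86_64", "sha256": "ab" * 32, "size": 123456},
    {"platform": "macos_arm64", "sha256": "cd" * 32, "size": 123321},
]
entries = known_versions_from_assets("2.5.0", assets)
assert entries[0].startswith("2.5.0|linux_x86_64|")
```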