mathlib4 feat: linter and script for `theorem` vs `lemma`

This PR contains a linter for flagging doc-string-less theorems, as well as a bash script that, once the linter ran on Mathlib, automatically replaces every linter-offending theorem by lemma.

May 13 '24 14:05 adomani

The logic of the linter is as follows.

First, a linter is a command that takes the syntax of each command and does something with it. (Really, a linter can have access to more than just the syntax, since it can also read information contained in infotrees, but this linter does not need that.)

thmNoDoc inspects the syntax of each command. The match does the following:

if the syntax is a declaration,
- if among its modifiers, the one for the docstring is empty,
- if among declarations is a theorem (as opposed to instance, def, ...),

then it extracts the syntax node for theorem (that will be underlined by the warning) and the syntax node for the name of the theorem (to report that name in the warning).

With this information, the linter logs its warning!

If it helps, this is a "pictorial" representation of the syntax tree of a simple theorem:

inspect: '
@[simp]
nonrec theorem easy : True :=
  .intro'

node Lean.Parser.Command.declaration, none
|-node Lean.Parser.Command.declModifiers, none
|   |-node null, none
|   |-node null, none
|   |   |-node Lean.Parser.Term.attributes, none
|   |   |   |-atom original: ⟨⟩⟨⟩-- '@['
|   |   |   |-node null, none
|   |   |   |   |-node Lean.Parser.Term.attrInstance, none
|   |   |   |   |   |-node Lean.Parser.Term.attrKind, none
|   |   |   |   |   |   |-node null, none
|   |   |   |   |   |-node Lean.Parser.Attr.simp, none
|   |   |   |   |   |   |-atom original: ⟨⟩⟨⟩-- 'simp'
|   |   |   |   |   |   |-node null, none
|   |   |   |   |   |   |-node null, none
|   |   |   |-atom original: ⟨⟩⟨\n⟩-- ']'
|   |-node null, none
|   |-node null, none
|   |-node null, none
|   |-node null, none
|   |   |-node Lean.Parser.Command.nonrec, none
|   |   |   |-atom original: ⟨⟩⟨\n⟩-- 'nonrec'
|-node Lean.Parser.Command.theorem, none
|   |-atom original: ⟨⟩⟨ ⟩-- 'theorem'
|   |-node Lean.Parser.Command.declId, none
|   |   |-ident original: ⟨⟩⟨ ⟩-- (easy,easy)
|   |   |-node null, none
|   |-node Lean.Parser.Command.declSig, none
|   |   |-node null, none
|   |   |-node Lean.Parser.Term.typeSpec, none
|   |   |   |-atom original: ⟨⟩⟨ ⟩-- ':'
|   |   |   |-ident original: ⟨⟩⟨ ⟩-- (True,True)
|   |-node Lean.Parser.Command.declValSimple, none
|   |   |-atom original: ⟨⟩⟨ ⟩-- ':='
|   |   |-node Lean.Parser.Term.dotIdent, none
|   |   |   |-atom original: ⟨⟩⟨⟩-- '.'
|   |   |   |-ident original: ⟨⟩⟨\n\n⟩-- (intro,intro)
|   |   |-node Lean.Parser.Termination.suffix, none
|   |   |   |-node null, none
|   |   |   |-node null, none
|   |   |-node null, none

and of a similar definition:

inspect: '
/-- docs -/
def easy : True :=
  .intro'

node Lean.Parser.Command.declaration, none
|-node Lean.Parser.Command.declModifiers, none
|   |-node null, none
|   |   |-node Lean.Parser.Command.docComment, none
|   |   |   |-atom original: ⟨⟩⟨ ⟩-- '/--'
|   |   |   |-atom original: ⟨⟩⟨\n⟩-- 'docs -/'
|   |-node null, none
|   |-node null, none
|   |-node null, none
|   |-node null, none
|   |-node null, none
|-node Lean.Parser.Command.definition, none
|   |-atom original: ⟨⟩⟨ ⟩-- 'def'
|   |-node Lean.Parser.Command.declId, none
|   |   |-ident original: ⟨⟩⟨ ⟩-- (easy,easy)
|   |   |-node null, none
|   |-node Lean.Parser.Command.optDeclSig, none
|   |   |-node null, none
|   |   |-node null, none
|   |   |   |-node Lean.Parser.Term.typeSpec, none
|   |   |   |   |-atom original: ⟨⟩⟨ ⟩-- ':'
|   |   |   |   |-ident original: ⟨⟩⟨ ⟩-- (True,True)
|   |-node Lean.Parser.Command.declValSimple, none
|   |   |-atom original: ⟨⟩⟨ ⟩-- ':='
|   |   |-node Lean.Parser.Term.dotIdent, none
|   |   |   |-atom original: ⟨⟩⟨⟩-- '.'
|   |   |   |-ident original: ⟨⟩⟨\n\n⟩-- (intro,intro)
|   |   |-node Lean.Parser.Termination.suffix, none
|   |   |   |-node null, none
|   |   |   |-node null, none
|   |   |-node null, none
|   |-node null, none

The declModifiers can be doc-strings, attributes, visibility stuff, noncomputable, unsafe, partial,... The important part is that they must appear in a specific order, the first one being a doc-string.

May 13 '24 23:05 adomani

Also: should the linter move to Mathlib/Tactic/Linter? (Did the zulip discussion come to a conclusion?)

That discussion did not come to a conclusion...

May 13 '24 23:05 adomani

Also: should the linter move to Mathlib/Tactic/Linter? (Did the zulip discussion come to a conclusion?)

That discussion did not come to a conclusion...

Given that some consensus was reached now: please move this linter to Tactic/Linter --- but this can happen after the mass replacements.

May 25 '24 00:05 grunweg

PR summary d88c5cb5c5

Import changes

No significant changes to the import graph

Declarations diff

+ getLinterHash + thmLemmaLinter + thmNoDoc

You can run this locally as follows

## summary with just the declaration names:
./scripts/no_lost_declarations.sh short <optional_commit>

## more verbose report:
./scripts/no_lost_declarations.sh <optional_commit>

Jun 19 '24 23:06 github-actions[bot]

I just realised: this linter already exists in batteries; it is the docBlameThm linter, defined here in batteries. (That linter is not autofixable by itself, so I see no pain in this linter being written. But I would rather run that linter in mathlib/find a way to run it from mathlib CI, than land a new linter re-invent the wheel.) Let me ask on zulip about configuring runLinter to enable certain lints explicitly.

Jul 11 '24 16:07 grunweg

Do I understand this correctly that this PR should be adapted to use the docBlameThm linter instead, but the shell script might still be useful?

Aug 14 '24 14:08 joneugster