lean4 fix: annotate all syntax nodes produced by quotations as synthetic

We recently noticed that it is problematic that quotations such as `($x + $y) may have a non-synthetic range (if x and y are non-synthetic). I can't find the original issue right now... but let's start by testing the overhead of fixing it this way.

Feb 02 '22 14:02 Kha

!bench

Feb 02 '22 14:02 Kha

As @gebner noted, the synthetic position range itself may still not be very meaningful if macro arguments are swapped at any point. Ideally the synthetic range would consist of the minimum to the maximum position within the entire subtree. But that is a less important issue than correctly identifying the node as synthetic at all.

Feb 02 '22 14:02 Kha

Would it be an option to tag all nodes with the info generated from the reference?

Feb 02 '22 14:02 gebner

If we make sure it is synthetic (edit: ah, we already do that of course), that could also work and should be cheaper. I went with this implementation so that positions would not change compared to before, but as mentioned your solution may produce more meaningful ranges. We should also check where we actually consider synthetic position, I assume it's mostly diagnostics reporting.

Feb 02 '22 14:02 Kha

Here are the benchmark results for commit 44eccffd12ecfb1da7c384abc27818e25f10adaa. There were significant changes against commit e400b02a0594510a5a12f46f72a0e8cf190b6b9f:

  Benchmark   Metric       Change
  ========================================
- qsort       task-clock      12% (40.0 σ)
- qsort       wall-clock      12% (39.3 σ)

Feb 02 '22 14:02 leanprover-bot

Aha, we do have some specific annotations for error positions such as in https://github.com/leanprover/lean4/commit/01f8127c1669cbfe6eb390c2e1464657c38864a5. If we went with defaulting nodes to the ref range, those would get ignored: https://github.com/Kha/lean4/commit/fe4d740841766dc954d6280d170d4b45413c66c8. So, given that there is no significant overhead, we should probably go with the current PR.

Feb 02 '22 16:02 Kha

Aha, we do have some specific annotations for error positions such as in 01f8127. If we went with defaulting nodes to the ref range, those would get ignored

Indeed, that's a common pattern which should probably be migrated to withRef tk `(...).

I think the migration would be worth it. The current approach is super broken when arguments are swapped:

#check "x" |> Nat.succ
/-
▶ 1:15-1:11: error:
application type mismatch
  Nat.succ "x"
argument
  "x"
has type
  String : Type
but is expected to have type
  Nat : Type
-/

(Yeah, a diagnostics range with "negative" length. EDIT: neovim doesn't show any squiggles at all, vscode swaps the start/end of the range and highlights |> including the spaces around it.)

Feb 02 '22 16:02 gebner

Indeed, that's a common pattern which should probably be migrated to withRef tk `(...).

I think the migration would be worth it.

@leodemoura What do you think, should we use withRef in preference to token antiquotations in macros, and then tag nodes created by quotations with the reference's full range? I think it makes sense.

Mar 21 '22 11:03 Kha

@Kha

What do you think, should we use withRef in preference to token antiquotations in macros, and then tag nodes created by quotations with the reference's full range? I think it makes sense.

Note that token antiquotations are useful when the goal is specifically to preserve the original source info. For instance, I use them in Alloy here specifically to copy the spacing around the start and end brackets (otherwise the synthetic brackets produce wonky spacing).

However, I should note that this is actually just part of a work-around for the fact that the following errors out with an expected '{' on body:

LEAN_EXPORT $ty:cTypeSpec $[$ptr?:pointer]? $id:ident($params:params) $body:compStmt

It seems that Lean requires me to break up the body into its component parts for some reason.

Jul 10 '22 23:07 tydeu

Note that token antiquotations are useful when the goal is specifically to preserve the original source info.

I wasn't planning on removing them.

It seems that Lean requires me to break up the body into its component parts for some reason. Merge state

Ah, I don't think we intended syntax (name := compStmt) to be used in this way.

syntax compStmt := "{" declaration* cStmt* "}"
syntax compStmt : cStmt

is not exactly longer, but we could make it work with name as well.

Jul 11 '22 07:07 Kha

As for the actual PR, we decided to go with @gebner's suggestion. I'll keep this PR open as a reminder.

Jul 11 '22 07:07 Kha

@Kha

syntax compStmt := "{" declaration* cStmt* "}"
syntax compStmt : cStmt

Oh. I had assumed that syntax (name := foo) "foo" : cat was translated to essentially @[catParser] def foo := leading_parser symbol "foo", but apparently that isn't quite the case.

is not exactly longer, but we could make it work with name as well.

I would very much appreciate this!

Jul 11 '22 08:07 tydeu

I would very much appreciate this!

Would be quite an intrusive change unfortunately as ParserDescr.nodeWithAntiquot does not currently take a precedence.

Jul 11 '22 09:07 Kha

lean4 lean4 copied to clipboard

fix: annotate all syntax nodes produced by quotations as synthetic

lean4
lean4 copied to clipboard