lean4
lean4 copied to clipboard
fix: annotate all syntax nodes produced by quotations as synthetic
We recently noticed that it is problematic that quotations such as `($x + $y) may have a non-synthetic range (if x and y are non-synthetic). I can't find the original issue right now... but let's start by testing the overhead of fixing it this way.
!bench
As @gebner noted, the synthetic position range itself may still not be very meaningful if macro arguments are swapped at any point. Ideally the synthetic range would consist of the minimum to the maximum position within the entire subtree. But that is a less important issue than correctly identifying the node as synthetic at all.
Would it be an option to tag all nodes with the info generated from the reference?
If we make sure it is synthetic (edit: ah, we already do that of course), that could also work and should be cheaper. I went with this implementation so that positions would not change compared to before, but as mentioned your solution may produce more meaningful ranges. We should also check where we actually consider synthetic position, I assume it's mostly diagnostics reporting.
Here are the benchmark results for commit 44eccffd12ecfb1da7c384abc27818e25f10adaa. There were significant changes against commit e400b02a0594510a5a12f46f72a0e8cf190b6b9f:
Benchmark Metric Change
========================================
- qsort task-clock 12% (40.0 σ)
- qsort wall-clock 12% (39.3 σ)
Aha, we do have some specific annotations for error positions such as in https://github.com/leanprover/lean4/commit/01f8127c1669cbfe6eb390c2e1464657c38864a5. If we went with defaulting nodes to the ref range, those would get ignored: https://github.com/Kha/lean4/commit/fe4d740841766dc954d6280d170d4b45413c66c8. So, given that there is no significant overhead, we should probably go with the current PR.
Aha, we do have some specific annotations for error positions such as in 01f8127. If we went with defaulting nodes to the ref range, those would get ignored
Indeed, that's a common pattern which should probably be migrated to withRef tk `(...).
I think the migration would be worth it. The current approach is super broken when arguments are swapped:
#check "x" |> Nat.succ
/-
▶ 1:15-1:11: error:
application type mismatch
Nat.succ "x"
argument
"x"
has type
String : Type
but is expected to have type
Nat : Type
-/
(Yeah, a diagnostics range with "negative" length. EDIT: neovim doesn't show any squiggles at all, vscode swaps the start/end of the range and highlights |> including the spaces around it.)
Indeed, that's a common pattern which should probably be migrated to
withRef tk `(...).I think the migration would be worth it.
@leodemoura What do you think, should we use withRef in preference to token antiquotations in macros, and then tag nodes created by quotations with the reference's full range? I think it makes sense.
@Kha
What do you think, should we use
withRefin preference to token antiquotations in macros, and then tag nodes created by quotations with the reference's full range? I think it makes sense.
Note that token antiquotations are useful when the goal is specifically to preserve the original source info. For instance, I use them in Alloy here specifically to copy the spacing around the start and end brackets (otherwise the synthetic brackets produce wonky spacing).
However, I should note that this is actually just part of a work-around for the fact that the following errors out with an expected '{' on body:
LEAN_EXPORT $ty:cTypeSpec $[$ptr?:pointer]? $id:ident($params:params) $body:compStmt
It seems that Lean requires me to break up the body into its component parts for some reason.
Note that token antiquotations are useful when the goal is specifically to preserve the original source info.
I wasn't planning on removing them.
It seems that Lean requires me to break up the body into its component parts for some reason. Merge state
Ah, I don't think we intended syntax (name := compStmt) to be used in this way.
syntax compStmt := "{" declaration* cStmt* "}"
syntax compStmt : cStmt
is not exactly longer, but we could make it work with name as well.
As for the actual PR, we decided to go with @gebner's suggestion. I'll keep this PR open as a reminder.
@Kha
syntax compStmt := "{" declaration* cStmt* "}" syntax compStmt : cStmt
Oh. I had assumed that syntax (name := foo) "foo" : cat was translated to essentially @[catParser] def foo := leading_parser symbol "foo", but apparently that isn't quite the case.
is not exactly longer, but we could make it work with
nameas well.
I would very much appreciate this!
I would very much appreciate this!
Would be quite an intrusive change unfortunately as ParserDescr.nodeWithAntiquot does not currently take a precedence.