Make the LSP independent of level assignment.
The fingerprint calculation was reverted to use is_const (as is done in the main branch) instead of get_level, making the fingerprint calculation independent of level assignment. This keeps level assignment off the critical path in the LSP, where it is too slow: in my case, computing the levels took 30 seconds, blocking subsequent LSP commands.
The TLAPM parser has a level-checker?
It does, but I'm unsure how complete it is. See e.g. https://github.com/tlaplus/tlapm/blob/d8e2a2135063fbb0bd6a4fb3af7bba46951059c8/src/expr/e_levels.ml
Would you leave your thoughts on https://github.com/tlaplus/rfcs/issues/16 since you know much more about the TLAPM parser than anyone else?
Thanks for starting that discussion. I will try to join it, but my knowledge of that parser is limited.
Thanks @damiendoligez. I will spend another few evenings on level-checking performance, and if nothing improves, I will merge this PR.
@kape1395 worth noting that the TLA+ Foundation is funding me to switch TLAPM over to the Java parser (SANY) in the near future, which will take care of these level-checking performance issues from TLAPM's perspective. However, it makes sense to spin out the TLAPM parser into its own repo (it's still quite a good parser and worth keeping for its own sake!) so you can have the choice of continuing to use the TLAPM parser or using SANY. The work will involve translating SANY's parse tree into the parse tree format used by TLAPM, so these changes should be transparent to you.
> However, it makes sense to spin out the TLAPM parser into its own repo (it's still quite a good parser and worth keeping for its own sake!) so you can have the choice of continuing to use the TLAPM parser or using SANY. The work will involve translating SANY's parse tree into the parse tree format used by TLAPM, so these changes should be transparent to you.
Why is it important to preserve the interface between TLAPM and its legacy parser? What is it buying us?
My current understanding is that all the transformations for the backend provers are defined over the existing parse tree format.
Understood - there’s no issue with refactoring the parse tree format, but it will require refactoring some of the transformations. To minimize the risk of regressions, such refactoring should be avoided unless justified.
> However, it makes sense to spin out the TLAPM parser into its own repo.
I don't think we need to keep the existing tlapm parser if the new one is integrated and works well.
Where can I put my wishes (which would become requirements, if agreed) for the parser integration? A preliminary list:
- Should we add a protobuf and/or gRPC interface for the parser instead of parsing the XML, for performance reasons? The LSP uses JSON-RPC; maybe we should follow that standard?
- I would like to pass checksums/modification times for the modules already parsed by tlapm, so it can avoid repeatedly re-parsing them (the dependencies of the module currently being edited). A stateful interaction is probably needed to handle that properly (which would imply gRPC or similar, I guess).
- I would like a way to parse a document that is not saved to disk, e.g., from stdin or a gRPC message payload. This is currently used by the tlapm LSP.
Also, it is unclear to me what happens to the expression levels when definitions in an expression are expanded. Don't we need to recalculate the levels? Does the SANY parser handle that?
> Also, it is unclear to me what happens to the expression levels when definitions in an expression are expanded. Don't we need to recalculate the levels? Does the SANY parser handle that?
Do you mean operators? Basically, SANY handles that as follows: if you have an operator like

```tla
prime(x) == x'
```

then it puts a constraint on the parameter x so that the maximum level of expression you can supply for it is a variable-level (state-level) expression. Giving an action-level expression for x (which would result in a double prime) produces error 4205, "operator level constraints exceeded".
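To make that concrete, here is a minimal sketch that should trigger the error; the module and definition names are mine, not from SANY:

```tla
---- MODULE PrimeLevel ----
VARIABLE x
prime(e) == e'       \* constrains e to be at most state-level
ok  == prime(x)      \* fine: x is state-level, so prime(x) is action-level
bad == prime(x')     \* error 4205: x' is action-level; expanding would double-prime x
====
```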
There's an even more complicated analysis that needs to be done for higher-order operators, i.e., operators accepting other operators as parameters. Error 4272, "higher order operator parameter level constraint not met", involves the level checker deriving minimum levels for parameters. Here's an input that would trigger it:
```tla
VARIABLE v
op(F(_)) == F(v')
op2 == op(')
```
This one is a bit confusing. Basically, when higher-order operators are used, the level checker needs to determine the max/min levels of arguments they can accept, and it does this by looking at how the higher-order operators are used. In this case, F(_) is applied to the action-level argument v' in the body F(v'). This means any operator supplied for F when calling op must be able to accept an action-level argument. While we are used to upper bounds when dealing with level checking, this is a lower bound on an upper bound, which is a bit odd. So when op(') is called, since the ' operator cannot accept an action-level argument, it fails to meet F's lower bound on the argument's upper bound, producing a level-checking error. This correctly prevents what would otherwise allow us to double-prime an expression.
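For contrast, here is a sketch of a call that I believe satisfies the bound, since negation accepts arguments of any level (the names Neg and ok are mine):

```tla
---- MODULE OpParamLevel ----
VARIABLE v
op(F(_)) == F(v')    \* any operator passed for F must accept an action-level argument
Neg(A) == ~A         \* negation places no restrictive upper bound on A
ok == op(Neg)        \* fine: op(Neg) expands to ~(v'), a legal action-level expression
====
```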
You can see a formal spec of the level checking process here.
SANY also runs a complementary analysis that tries to ensure formulas are stuttering-insensitive. This analysis is incomplete, as documented in https://github.com/tlaplus/tlaplus/issues/1160
> There's an even more complicated analysis that needs to be done for higher-order operators
Nit: TLA+ is a first-order logic. It is not higher-order, where operators can be treated as values. You may pass an operator to another operator, but you cannot, e.g., quantify over operators.
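A quick sketch of the distinction (the module and names are mine, for illustration only):

```tla
---- MODULE FirstOrder ----
EXTENDS Naturals
Twice(F(_), e) == F(F(e))   \* OK: F is an operator parameter
Incr(n) == n + 1
four == Twice(Incr, 2)      \* OK: passing the operator Incr as an argument
\* \E F : F(0) = 1          \* not TLA+: quantifying over operators is not allowed
====
```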
@lemmy good point. What do you think is a good term for the parameter of these not-higher-order-ops? I've seen "operator parameter parameter" a few times in the codebase which I don't really like.
> @lemmy good point. What do you think is a good term for the parameter of these not-higher-order-ops? I've seen "operator parameter parameter" a few times in the codebase which I don't really like.
I'm not aware of a dedicated terminology.