coq icon indicating copy to clipboard operation
coq copied to clipboard

[experiment] Separate parsing execution

Open maximedenes opened this issue 3 years ago • 11 comments

This is an experimental branch, not to be merged directly.

maximedenes avatar Mar 18 '22 12:03 maximedenes

:red_circle: CI failures at commit 7e90e22902a2e198404d6e097c98621f38e1d378 without any failure in the test-suite

:heavy_check_mark: Corresponding jobs for the base commit 88475d53e0a42c5e1ec58fccd3ab23868e53483b succeeded

:grey_question: Ask me to try to extract minimal test cases that can be added to the test-suite

:runner: @coqbot ci minimize will minimize the following targets: ci-metacoq, ci-paramcoq
  • You can also pass me a specific list of targets to minimize as arguments.

coqbot-app[bot] avatar Apr 16 '22 11:04 coqbot-app[bot]

:red_circle: CI failures at commit 7e90e22902a2e198404d6e097c98621f38e1d378 without any failure in the test-suite

:heavy_check_mark: Corresponding jobs for the base commit 88475d53e0a42c5e1ec58fccd3ab23868e53483b succeeded

:grey_question: Ask me to try to extract minimal test cases that can be added to the test-suite

:runner: @coqbot ci minimize will minimize the following targets: ci-metacoq, ci-paramcoq
  • You can also pass me a specific list of targets to minimize as arguments.

coqbot-app[bot] avatar Apr 16 '22 13:04 coqbot-app[bot]

The "needs: rebase" label was set more than 30 days ago. If the PR is not rebased in 30 days, it will be automatically closed.

coqbot-app[bot] avatar Aug 25 '22 02:08 coqbot-app[bot]

@ejgallego I don't see anything in your comment that is not addressed in https://github.com/coq/coq/issues/15409 or in previous discussions we had.

The separation between parsing and execution has pros and cons: being able to reason on ASTs rather than plain text, while not incurring a high cost like re-execution of commands, is very convenient when developing tools such as linters, documentation generators or document managers. On the other hand, it reflects a layered architecture that indeed prevents e.g. dependency of parsing on typing.

It turns out that these restrictions are already satisfied with the Coq system as it is. So in a way we get the cons, but not the pros. The point of this line of work is to improve the (short term) engineering of the current system in this respect.

Now, to my eyes, this is compatible with the (long term) research work that you are doing on different approaches to user interfaces for proof assistants. I find it interesting, even though I don't know much about it, since you preferred not to give me access to material presenting the work (because it was too experimental, if I understood correctly).

Given that we discussed this many times, that it is currently my main line of work on Coq, and that there seems to be a lot of expectations around VsCoq (expressed by people inside and outside Inria), I would prefer that we get rather quickly a final decision on whether we want to pursue it or not (especially as we are about to onboard more Inria engineers on this effort). Since it looks like a strategic decision (improving the engineering of the current system vs waiting for more bleeding edge approaches to converge), I'd ask @mattam82 what he thinks.

Note that I'm absolutely fine if we go either way. It's rather easy to find other topics for engineering effort in the Coq backlog.

maximedenes avatar Sep 03 '22 09:09 maximedenes

I’ve been summoned to help on this issue, but there are a few things I don’t understand here:

“it goes against bleeding-edge proposals such as enabling the parser to use type information, réactive elaboration, etc...”

Do you see it as the (future) job of the parser component to call typing primitives? Even in Arthur’s last proposals re overloading I don’t think that was required, but I might be missing something. Interpretation of notations on the other hand will certainly need it. I think with parsing and execution being split we would still have all freedom to choose the level of “interpretation” that parsing does, and what is left to the rest of elaboration / later phases. I’m also not sure what reactive elaboration feature you are envisioning that would be prevented by splitting.

Re the purpose of this PR, and the proper state handling issue, my intuition tells me it can be helpful to reify (and hence carve out and specify) the part of the state that is required to perform parsing, doesn’t that go in the right direction? At least that allows you to get syntax errors fast and give a slightly more structured Ast to work from that still would not need to be welltyped/interpretable, but from which some information could be extracted still, like names being used.

mattam82 avatar Sep 05 '22 16:09 mattam82

I wouldn't call making parsing depend on typing "bleeding edge". TeX has been doing that for 50 years.

ppedrot avatar Sep 05 '22 16:09 ppedrot

@ejgallego I don't see anything in your comment that is not addressed in https://github.com/coq/coq/issues/15409 or in previous discussions we had.

Maybe you would prefer feedback on that issue? I'm unsure what's the best place, for now I respond here. As it is hard to discuss this on text, I'll mark questions with Q

I wrote:

a CEP or design sketch of how this highly non-trivial change is planned to be implemented has not been shown

I take that back, the issue is clear enough. What my brain was thinking was:

Q: Why is this critical to have a short-term working document manager?

So indeed, what I guess I'm not understanding is why the DM needs this large and costly refactoring effort, instead of something more granular.

The separation between parsing and execution has pros and cons: being able to reason on ASTs rather than plain text

Q: what kind of reasoning does a DM require on AST? Isn't that the case the main feedback a DM sends to the user is the goals?

I think that separation of parsing and execution is a good thing to have, in the sense that a command that doesn't update the parsing state should have that reflected in its interpretation type.

What I believe to be quite costly to implement, rigid, and not very useful in practice is a full phase separation as outlined in the current design.

I suggest: implement this separation only partially (by making the command interpretation more dynamic) and you get much less development effort and still a lot of the advantages. I've outlined how to do this quickly on other PRs.

while not incurring a high cost like re-execution of commands

Q: how is re-execution of commands related to parsing separation? Q: What is the criteria you have in mind for determining whether a command should be re-exec'd?

coq-lsp tries very hard to avoid command re-execution, and provides good instrumentation to track cache misses, but still it is not clear what the most costly misses are (we need user data). I've never seen a hotspot in parsing.

On the other hand, as I suggested back in 2019, reifycation of the parsing state is a constant hotspot and impacts latency big time, moreover with the bug witnessed by #16281

Then you add a Notation or an Import command in the middle of the file. Suddenly, editing anything above the command requires re-executing before you can work again on the bottom of the document (or at least get useful feedback from the IDE, in a non-blocking model).

Q: Why do we need to parse before working on the bottom of the document?

Only thing I can think of is to report parsing errors eagerly? If so, you can make this work in 99% of the cases with a partial separation.

So this (in its current shape) seems like a lot of work for a very limited gain.

is very convenient when developing tools such as linters, documentation generators

This touches a point in which we already have some experience with, by people using serapi to do automatic name generation etc... and relying only in the parse tree. Indeed, in this case, SerAPI relies on the static analysis the STM does. Even with this rough static analysis you can produce the AST of large Coq developments very very quickly, and while I didn't do the exact statistics, I'm convinced that it would be only a marginal improvement in terms of efficiency (and likely just by separating one or two commands you will already get a negligibly amount of execution).

Also, I think it is risky to make such large efforts in terms of hypothetical linters or documentation generators. As far as I know, the need for less execution was very far on the list of wishlists for such tools implemented on top of SerAPI. The number one issue was actually having proper control over the (first) interpretation phase, and I agree this is a critical UI need.

It turns out that these restrictions are already satisfied with the Coq system as it is. So in a way we get the cons, but not the pros. The point of this line of work is to improve the (short term) engineering of the current system in this respect.

They are satisfied "implicitly" but not "explicitly". But indeed, I think my focus on experimental research was not the best one on this issue, as (thanks to Matthieu and Yves) I've missed that indeed the focus was on a short-term solution, which agree is essential. We should worry about the research when we get to that point. So I suggest we discard that concern for now, and focus on the main issue:

Q: why the full phase-separation (and large coding effort it implies) necessary for a short-term DM solution?

What's the concrete technical problem?

Now, to my eyes, this is compatible with the (long term) research work that you are doing on different approaches to user interfaces for proof assistants.

Yes, I think it is compatible, but maybe not the most efficient way to work.

I find it interesting, even though I don't know much about it, since you preferred not to give me access to material presenting the work (because it was too experimental, if I understood correctly).

The code / material I was talking about was obsolete, so I don't see the point in my investing time in searching for it, and I don't see how it would have been useful for you.

I agree it would have been nice to share that code when the code was not obsolete, and discuss and work together on it, why that didn't happen, that seems like a very good question to me.

Instead, I suggested that I would make public the new version, and so we did, the repos is available since June, and I have a new presentation ready, a CEP submitted, etc....

I'd be great to learn too about what the short term document manager plan and design is.

Q: Is there are design document / roadmap available for your DM effort?

Given that we discussed this many times, that it is currently my main line of work on Coq, and that there seems to be a lot of expectations around VsCoq (expressed by people inside and outside Inria), I would prefer that we get rather quickly a final decision on whether we want to pursue it or not (especially as we are about to onboard more Inria engineers on this effort).

I think the critical point here is to understand why the short-term document manager requires this and can't indeed work without this large PR, but I think it should be pretty quick to resolve, and I'm convinced we will find common ground.

ejgallego avatar Sep 13 '22 21:09 ejgallego

Q: Why is this critical to have a short-term working document manager?

It makes it convenient to reason on edits with ASTs at hand (as opposed to plain text) without needing to execute too much.

Q: how is re-execution of commands related to parsing separation?

If parsing and execution are mixed, getting the parsing state requires executing code (for example, you need to run full-fledged Import commands, instantiate functors, hence typecheck terms, etc. Each time a dependency of such a command is edited, you have to invalidate and re-execute that command.

Q: What is the criteria you have in mind for determining whether a command should be re-exec'd?

With a cache you can probably catch scenarios were after re-executing one or more commands, you realize that dynamic dependencies of parsing-affecting command are actually running in the same state, hence avoiding re-execution. I prefer a clean static phase separation, since the system already satisfies it.

Q: Why do we need to parse before working on the bottom of the document?

The user working on the bottom of the document typically wants the features of their IDE working there. Parsing makes it much easier to infer the structure of the document and do more work there, like typechecking while skipping proofs, etc.

Q: why the full phase-separation (and large coding effort it implies) necessary for a short-term DM solution?

I think I replied above. Note that this effort is already done anyway, we are converging.

Q: Is there are design document / roadmap available for your DM effort?

There is the VsCoq readmap here: https://github.com/coq-community/vscoq/wiki/VsCoq-1.0-Roadmap

As you already know, there is also an experimental document manager branch publicly available on the VsCoq repo.

I think the critical point here is to understand why the short-term document manager requires this and can't indeed work without this large PR, but I think it should be pretty quick to resolve, and I'm convinced we will find common ground.

Thanks, but since the core of the work is already done, and is anyway an improvement of the system in terms of engineering (cleaner phase separation), I think we can proceed with this line of work. If that reflects what @mattam82 and @ybertot want, of course.

maximedenes avatar Sep 13 '22 21:09 maximedenes

Q: Why do we need to parse before working on the bottom of the document?

The user working on the bottom of the document typically wants the features of their IDE working there.

Parsing makes it much easier to infer the structure of the document and do more work there, like typechecking while skipping proofs, etc.

Indeed inferring the structure of the document is a good use case, but my point is that we can do that without full phase separation.

Q: how is re-execution of commands related to parsing separation?

If parsing and execution are mixed, getting the parsing state requires executing code (for example, you need to run full-fledged Import commands, instantiate functors, hence typecheck terms, etc. Each time a dependency of such a command is edited, you have to invalidate and re-execute that command.

Yes, sorry I should have written "related to full parsing separation".

That's the same, I think a gradual approach where phase separation is dynamic is enough.

I prefer a clean static phase separation, since the system already satisfies it.

Well, if the system already satisfies it then my review is moot. I was proposing a gradual / dynamic approach because it is both more flexible and I think quicker.

Q: why the full phase-separation (and large coding effort it implies) necessary for a short-term DM solution?

I think I replied above. Note that this effort is already done anyway, we are converging.

Then indeed my review is moot, I was under the impression that significant work was still needed for this PR to be ready. When do you think this PR will be mergeable then?

Q: Is there are design document / roadmap available for your DM effort?

There is the VsCoq readmap here: https://github.com/coq-community/vscoq/wiki/VsCoq-1.0-Roadmap

As you already know, there is also an experimental document manager branch publicly available on the VsCoq repo.

Thanks, the roadmap is not very informative in the technical part tho. Without a bit more insight on what is the technical design of the DM, it is hard to know what to expect when PRs like this are submitted to Coq.

Writing a document manager for a system like Coq is not easy, the design space is pretty large.

IMO any effort on such direction should be properly validated and compared with current state-of-the-art in the area before large changes to Coq are proposed.

ejgallego avatar Sep 13 '22 22:09 ejgallego

Then indeed my review is moot, I was under the impression that significant work was still needed for this PR to be ready. When do you think this PR will be mergeable then?

The branch already has all the required features and passed the tests. What remains to be done is a refactoring, adding some API compatibility entry points and probably some overlays for some plugins. I estimate the remaining work to represent 1 week of work (not including the review phase). We have some constraints here until Sep 27th, so we can probably target one week after that.

Thanks, the roadmap is not very informative in the technical part tho. Without a bit more insight on what is the technical design of the DM, it is hard to know what to expect when PRs like this are submitted to Coq.

We will write more details once we make progress on the document manager POC (the VsCoq branch I was referring to). Meanwhile, I think separation of parsing and execution is of independent value anyway, for Coq.

IMO any effort on such direction should be properly validated and compared with current state-of-the-art in the area before large changes to Coq are proposed.

The VsCoq document manager is being implemented independently of Coq. Before discussing any integration of this DM in Coq (if judged relevant), a validation w.r.t. alternatives should obviously take place. But this is not what this PR is about.

I was proposing a gradual / dynamic approach because it is both more flexible and I think quicker.

I haven't thought much about it yet, but a dynamic approach could be interesting to express dependencies of phases that are interdependent, like name resolution and typing. When the phase is already essentially independent (like parsing), I prefer that we make the separation invariant statically explicit. You can see it as making the dependencies more fine-grained.

maximedenes avatar Sep 14 '22 06:09 maximedenes

We have some constraints here until Sep 27th, so we can probably target one week after that.

Decision still pending, so I'll shift the planning of this work to the beginning of October.

maximedenes avatar Sep 19 '22 09:09 maximedenes

First, I'd like to thank everyone for taking the time to talk with Yves and myself about this PR and the larger landscape of the work on UIs. My conclusion from this discussion on this particular PR is that this separation is beneficial and almost ready to go and will not be a blocker for other work on UIs, so @maximedenes you have the green light here and for its followups. On the other hand, this also shows that we need to setup a time for everyone to be on the same page (myself not the least) about the different efforts and directions to avoid potential roadblocks in the future, following up from the February meeting and the working group discussion. I'll help set this up on Zulip.

mattam82 avatar Sep 26 '22 14:09 mattam82

The VsCoq document manager is being implemented independently of Coq. Before discussing any integration of this DM in Coq (if judged relevant), a validation w.r.t. alternatives should obviously take place. But this is not what this PR is about.

The vsCoq DM may be an independent effort from Coq, but this PR touches key points of the Coq API, so it is pretty much a Coq matter, and in that sense, we must apply normal reviewing and quality procedures, in particular to avoid past problems and the introduction of technical debt and ill-designed code. Given the track record on this area, and in particular, past problems with engineering roadmap and user interfaces, I think we need to be specially careful here.

I don't believe (and all the experimental evidence that I know of agrees) that a state-of-the-art document manager can be developed independently of Coq, on the contrary, all similar efforts have required deep refactorings on the underlying system, but this is a point I already remarked on the UI roadmap I provided in 2019.

ejgallego avatar Oct 10 '22 14:10 ejgallego

we must apply normal reviewing and quality procedures

Are you suggesting it is not the case?

maximedenes avatar Oct 10 '22 15:10 maximedenes