
Tooling integration?

Open ejgallego opened this issue 4 years ago • 17 comments

Hi, I am opening this issue to discuss potential integration of these tools with other existing tooling, which could improve maintenance and functionality.

In particular, I think we have made enough progress on other parts of the Coq tooling, such as SerAPI and Flèche (the upcoming incremental checking engine), for this to make sense.

Some possibilities are:

  • incremental checking could greatly reduce the time needed to run the bug minimizer
  • the next SerAPI version will provide a Python API, which will allow, for example, easy access to and manipulation of the Coq AST.

ejgallego avatar Mar 30 '21 18:03 ejgallego

Indeed, I think it would be cool to be able to replace interaction with Coq with SerAPI (though we support all the way back to Coq 8.5pl3, so perhaps only for newer versions of Coq). The main entry points are probably:

  1. Parsing a file into statements (a rough sketch of what this could look like is included after this list): https://github.com/JasonGross/coq-tools/blob/82eb94259ac4bae7d053611a35322bd036cef86c/split_file.py#L135-L156
  2. Splitting up statements into blocks that have to be removed all at once (e.g., we want to remove the entire Lemma ... Proof... Qed. block all at once, or not at all; we can't remove it statement by statement): https://github.com/JasonGross/coq-tools/blob/master/split_definitions.py#L57-L59
  3. The part where we ask about the Coq output for a given file; it would be nice to cache state ids or something so that we can resume from whatever initial fragment has already been processed: https://github.com/JasonGross/coq-tools/blob/82eb94259ac4bae7d053611a35322bd036cef86c/diagnose_error.py#L176-L198
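
Roughly, the shape I have in mind for entry point 1 is something like the sketch below; the parser_split hook is a hypothetical placeholder for whatever the SerAPI/pyCoq API ends up exposing, and the naive splitter is only a stand-in for the real splitting logic:

import re

# Very rough sentence splitter: a '.' followed by whitespace ends a Coq
# statement.  This deliberately ignores strings, comments, and notations
# containing '.', which is exactly the kind of corner case a parser-backed
# (SerAPI / pyCoq) splitter would handle correctly for free.
_SENTENCE_END = re.compile(r'(?<=\.)\s+')

def split_statements_naive(contents):
    """Approximately split the contents of a .v file into statements."""
    return [s for s in _SENTENCE_END.split(contents) if s.strip()]

def split_statements(contents, parser_split=None):
    """Prefer a parser-backed splitter when one is available (parser_split is
    a hypothetical callable wrapping SerAPI/pyCoq); otherwise fall back."""
    if parser_split is not None:
        return parser_split(contents)
    return split_statements_naive(contents)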

JasonGross avatar Mar 31 '21 15:03 JasonGross

@zimmi48's comment here reminded me of this issue; actually, the current pyCoq API seems to have reached the point where we could try to go ahead with this.

I'd be happy to help. Support, though, is going to be limited to 8.13 / 8.14 for now, unless we get considerably more manpower.

Another issue is that, for sure, tools such as the minimizer will require quite a bit of API adaptation on the pyCoq side, so IMHO we should be free to develop a bit before declaring a "stable" status. On the bright side, there is a lot to gain from porting the tools to the Python API, including huge speedups due to incremental document processing.

Thus, maybe I'd go the route of feature-freezing the cmdline-based version of the tools (which supports back to 8.5), and have a -next branch that could add the new features.

ejgallego avatar Nov 12 '21 14:11 ejgallego

Thus, maybe I'd go the route of feature-freezing the cmdline-based version of the tools (which supports back to 8.5), and have a -next branch that could add the new features.

I'm happy to have a development branch where we switch entirely to the new API, but my inclination for master is to gate the features behind the presence of the relevant bindings, i.e., test whether they're available at runtime and decide what to do based on that. All the old ways of doing things can be renamed "legacy" or something.
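
Concretely, the gating could look something like this; the pycoq module name and the two implementations being dispatched to are placeholders, not real names from this repo:

try:
    import pycoq  # hypothetical binding module; any name works here
except ImportError:
    pycoq = None

def minimize_file(filename, **kwargs):
    # Dispatch at runtime: use the new bindings when importable, otherwise
    # fall back to the existing ("legacy") command-line-based code path.
    if pycoq is not None and not kwargs.pop('force_legacy', False):
        return minimize_file_pycoq(filename, **kwargs)
    return minimize_file_legacy(filename, **kwargs)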

including huge speedups due to incremental document processing.

This sounds really great! I think the interface I want to have for the higher level bits is that of caching: the lower levels should diff the document being requested against what's available, cancel/backtrack to the shared root, and then evaluate the new things.
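
Roughly, the caching layer I am imagining would look like the following; doc.cancel and doc.add are made-up stand-ins for whatever primitives the lower level actually exposes, and only the prefix-diff logic is the point:

def resubmit(doc, old_statements, new_statements):
    # 1. Find the longest shared prefix (the "shared root").
    shared = 0
    while (shared < min(len(old_statements), len(new_statements))
           and old_statements[shared] == new_statements[shared]):
        shared += 1
    # 2. Backtrack: cancel everything after the shared root, newest first.
    for i in reversed(range(shared, len(old_statements))):
        doc.cancel(i)
    # 3. Evaluate only the new tail.
    for stmt in new_statements[shared:]:
        doc.add(stmt)
    return shared  # number of statements reused from the previous run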

JasonGross avatar Nov 13 '21 09:11 JasonGross

I'm happy to have a development branch where we switch entirely to the new API, but my inclination for master is to gate the features behind the presence of the relevant bindings, i.e., test whether they're available at runtime and decide what to do based on that. All the old ways of doing things can be renamed "legacy" or something.

Indeed, that sounds reasonable, though maybe it could be too messy in the medium term? I'm adding a get_document_structured method that will return a document which is not linear but where sections etc. are delimited, so for example how would the rest of the algorithm work over such data? If the "legacy" way doesn't really compute the structure in the same way, we may have a mess.

I think the interface I want to have for the higher level bits is that of caching: the lower levels should diff the document being requested against what's available, cancel/backtrack to the shared root, and then evaluate the new things.

You mean you want to submit the diff to pyCoq directly? That'd be fine, but I'm not sure if that's what you mean here.

The current interface is a bit more primitive: you get ids for sentences (and sections / modules now, with the structured doc), you can invalidate one, then Coq will send you the list of things that are also invalidated due to dependencies, and then you can add.
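
In (made-up) Python, the flavour of that primitive interface is roughly the following; none of these method names are the real pyCoq API:

def example(doc):
    sid_a = doc.add("Definition a := 3.")      # add returns a sentence id
    sid_b = doc.add("Definition b := a + 1.")  # depends on sid_a
    dropped = doc.invalidate(sid_a)            # ids invalidated transitively via deps
    assert sid_b in dropped
    doc.add("Definition a := 5.")              # the client re-adds what it still wants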

It is OK to have a diff-based interface as we need one anyways for LSP.

ejgallego avatar Nov 13 '21 11:11 ejgallego

You mean you want to submit the diff to pyCoq directly? That'd be fine, but I'm not sure if that's what you mean here.

No, I mean have a file in this repo that presents that interface on top of PyCoq

so for example how would the rest of the algorithm work over such data?

Either I'll make the existing thing match the structured representation, or we have two versions of minimize_file (which takes the file name), and dispatch on legacy vs not. All the glob file handling has to stay in anyway, because I imagine pyCoq can't absolutize the Requires in a(n installed) file if I can't tell it what arguments it's supposed to pass to Coq.

Another alternative is to leave minimize_file alone for starters, and start by adjusting the "get the error message of this text" function to make use of the incremental compilation when it's available. I imagine that interacting with the structured document representation (to delete or replace some commands) and getting feedback from Coq should be mostly independent?

JasonGross avatar Nov 13 '21 14:11 JasonGross

No, I mean have a file in this repo that presents that interface on top of PyCoq

pyCoq will expose that interface anyway, so it's up to coq-tools

All the glob file handling has to stay in anyway, because I imagine pyCoq can't absolutize the Requires in a(n installed) file if I can't tell it what arguments it's supposed to pass to Coq.

Not sure what the exact spec of the operation you are asking for here is, but it seems to me that pyCoq should have access to that information; after all, Require does know it?

I imagine that interacting with the structured document representation (to delete or replace some commands) and getting feedback from Coq should be mostly independent?

Almost, in the sense that when updating the document you also get some feedback, for example, parsing errors, etc.... If parsing is successful, then usually type checking / tactic running is a separate operation.

ejgallego avatar Nov 13 '21 15:11 ejgallego

Not sure what the exact spec of the operation you are asking for here is, but it seems to me that pyCoq should have access to that information; after all, Require does know it?

Require does not know it. Suppose a file foo.v has a line Require Import bar.. I need to know, for each Require statement, what the full path of the library being required is. This information is stored in installed .glob files but not .vo files. I know where foo.v was installed, but I don't know the command line used to compile it.
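
(For context: when a .glob file is present, the resolution can be read off it with something like the sketch below; the regex assumes the usual R<start>:<end> <library> <module> <name> lib reference-line format and is a simplified illustration, not the actual coq-tools code.)

import re

# Require references in a .glob file look roughly like
#   R123:140 Flocq.IEEE754.Bits <> <> lib
# where the third field is the fully qualified library the Require resolved to.
_REQUIRE_LINE = re.compile(r'^R(\d+):(\d+) ([^ ]+) <> <> lib$', re.MULTILINE)

def requires_from_glob(glob_path):
    """Map (start, end) byte offsets of each Require to its resolved library."""
    with open(glob_path) as f:
        contents = f.read()
    return {(int(m.group(1)), int(m.group(2))): m.group(3)
            for m in _REQUIRE_LINE.finditer(contents)}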

JasonGross avatar Nov 13 '21 21:11 JasonGross

Require does not know it. Suppose a file foo.v has a line Require Import bar.. I need to know, for each Require statement, what the full path of the library being required is.

Not sure I follow, but .vo files do know the right path, otherwise how would Coq load the file in the first place? Note that the loading is transitive, so all the deps are actually needed (moreover, you can get that in the feedback stream now, but it requires actually executing the Require; see for example the output of the message buffer when you start jsCoq).

ejgallego avatar Nov 14 '21 19:11 ejgallego

Let me be a bit more concrete with a real-world example: in https://github.com/coq/coq/pull/14788#issuecomment-901071858, when minimizing the file vst/veric/mem_lessdef.v, the minimizer is unable to inline Flocq.IEEE754.Bits (at Require Flocq.IEEE754.Bits.). The reason is that we have the files user-contrib/Flocq/IEEE754/Bits.vo and user-contrib/Flocq/IEEE754/Bits.v but not user-contrib/Flocq/IEEE754/Bits.glob. The file user-contrib/Flocq/IEEE754/Bits.v at line 24 has Require Import Core BinarySingleNaN Binary.. Although the .vo files load modules recursively, and hence I can get the list of all Core.vo files loaded by Flocq.IEEE754.Bits from the .vo file, there's no guarantee that this list is singleton, and there's no way to resolve for sure which Core module is pointed at by Require Import Core. (Moreover, due to side-effects of Require, there's no guarantee that sticking Require <list of all dependencies> at the top and then just removing Require from the .v file will result in a file that works correctly.) (Additionally, I don't have access to the command line used to generate Bits.vo, because I download the artifacts and install them.)

JasonGross avatar Nov 14 '21 20:11 JasonGross

Thanks for the example @JasonGross, I will look into it in more detail later on. But I'm positive that we can get this info by other means; in particular, pyCoq and SerAPI can spawn a full Coq instance to speculatively execute stuff in the worst case. A couple of comments off the top of my head:

Although the .vo files load modules recursively, and hence I can get the list of all Core.vo files loaded by Flocq.IEEE754.Bits from the .vo file, there's no guarantee that this list is singleton, and there's no way to resolve for sure which Core module is pointed at by Require Import Core.

Actually, the info is in the .vo: not only the absolute path, but also the md5 sum. So maybe what you are asking is how to resolve the source file of a .vo file? This indeed needs some more work, but shouldn't be too hard to fix. Recent Coq versions do store this info, but unfortunately inside locations, I think. That's not an easy problem in general, as it amounts to setting up a full debug environment, as in other languages, which is brittle.

Note that I don't have a problem with the .glob files, but I think they are a bit outdated for some stuff and this info is better placed inside the core structs.

ejgallego avatar Nov 14 '21 21:11 ejgallego

But I'm positive that we can get this info by other means; in particular, pyCoq and SerAPI can spawn a full Coq instance to speculatively execute stuff in the worst case. A couple of comments off the top of my head:

But that's just the point: we don't know what -R and -Q flags to pass to the instance of Coq doing speculative execution.

Actually, the info is in the .vo: not only the absolute path, but also the md5 sum

The .vo file remembers which line numbers and character positions each Require statement was issued at? (Does it remember the entire document source of the .v file, such that if I have the .vo file, I can reprint the .v file without having access to it?)

If not, consider making the following file in the HoTT repo:

Require Import Core.
Locate Module Core.
(* Module HoTT.Truncations.Core
Module HoTT.WildCat.Core (shorter name to refer to it in current context is WildCat.Core)
*)
Check Tr.
Require NaturalTransformation.Core.
Locate Module Core.
(* Module HoTT.Categories.NaturalTransformation.Core
Module HoTT.Categories.Category.Core (shorter name to refer to it in current context is Category.Core)
Module HoTT.Categories.Functor.Core (shorter name to refer to it in current context is Functor.Core)
Module HoTT.Truncations.Core (shorter name to refer to it in current context is Truncations.Core)
Module HoTT.WildCat.Core (shorter name to refer to it in current context is WildCat.Core) *)
Import Core.
Check NaturalTransformation.

If this file gets installed, and I want to inline it, somehow I need to know that the first Require Import Core. is referring to HoTT.Truncations.Core and not any other Core module, and I need to know that the Import Core on the second-to-last line is referring to NaturalTransformation.Core and not any other Core module. (Granted, the second is not required if I can be sure to issue Requires in exactly the same order and on exactly the same lines as the original file.)

For HoTT, I could in fact just do -R user-contrib/HoTT HoTT and this should work fine for re-running the .v file. But if the original file was built with many -R flags binding the module paths sharing the same prefix to many directories, then the meaning of Require foo is going to depend on the ordering of the -R flags, right? And I have no way to reconstruct this ordering, since the flags (AFAIK) are not stored inside the .vo file.

JasonGross avatar Nov 14 '21 22:11 JasonGross

The .vo file remembers which line numbers and character positions each Require statement was issued at? (Does it remember the entire document source of the .v file, such that if I have the .vo file, I can reprint the .v file without having access to it?)

Indeed, that's what I meant; we don't have a very good way to map .vo to .v files. The rest should be workable once we fix that.

And I have no way to reconstruct this ordering, since the flags (AFAIK) are not stored inside the .vo file.

Now that you mention it, we hope to make this kind of situation (dependency on path ordering) an error; it may even be the case already in master.

ejgallego avatar Nov 15 '21 11:11 ejgallego

I think we should also store inside the .vo file any relevant flags that were used to build the file, such as the warning settings, -set arguments, etc.

JasonGross avatar Nov 15 '21 20:11 JasonGross

Here is an update on the situation the way I see it: we now have coq-lsp and a python coq-lsp client that could help with a few of the topics discussed in this issue:

  • #87: is solved by coq-lsp, as you just send an updated file and coq-lsp will determine what to re-execute
  • related to #102, coq-lsp supports projects; it would be possible to have a meta-format as requested, see below
  • also, coq-lsp can provide information about the document structure, and programmable error-recovery heuristics
  • coq-lsp can be used without the LSP protocol; there is the command-line compiler fcc

The meta-format for working with several files would be something like:

# start_file: bar/foo.v
Definition a := 3.
# end_file
# start_file: foo/moo.v
Require Import bar.foo.
Print a.
# end_file

ejgallego avatar Oct 04 '23 15:10 ejgallego

Is fcc's incremental compilation support able to start up from vo files that were compiled with coqc?

Zimmi48 avatar Oct 04 '23 18:10 Zimmi48

@Zimmi48 yes, .vo files are fully compatible; something where coqc and fcc will differ (by default) is that fcc will for example compile this:

Lemma foo: False.
Qed.

to

Lemma foo: False.
Admitted.

also this can happen if we tell Flèche to skip proofs, etc...

It is very easy to disable that kind of behavior for fcc if desired; we can add something like --error_recovery=stop, etc... I guess it is still not clear what kinds of use cases people have in terms of different error handling; all of the informal data I've gathered was in the context of interactive use.

ejgallego avatar Oct 04 '23 22:10 ejgallego

Is fcc's incremental compilation support able to start up from vo files that were compiled with coqc?

Actually, if I understand your question differently, in the sense of "can fcc pre-load the cache from a .vo file", the answer is "no".

I tried to do something similar, and in the coq-lsp TODO file there is a detailed writeup, but the short story is that .vo files depend too much on memory locations etc. for us to be able to reconstruct the fast equality used today for memoization.

This should be solvable, but would require quite a bit of work IMHO; the problem is that Marshal is very efficient in terms of sharing, and that's hard to imitate with other serialization approaches.

But indeed, a goal of fcc / coq-lsp is that the cache can be dumped and resumed in a different session; it just turned out to be too hard to implement for now.

ejgallego avatar Oct 04 '23 22:10 ejgallego