coq-serapi [query] Support querying dependency data

Since the inception of SerAPI providing a way to query dependency data was a goal. In fact, this is possible to do now by clients and you can dump the actual proof term in sexp format and then perform the analysis yourself.

However, the very nice work on regression testing https://github.com/coq/coq/issues/9262 could indeed benefit from having specific support for dependency data. In particular, we may want to provide two calls:

(Query opts (Deps id)) : Output dependencies of identified id,
(Query opts (Digest id)): Output digest of id's body,
(Query opts (RangeDigest range)): Output digest + deps for identifiers in a document range [in icoq style]

The interface for updating a prior proof is not clear yet, but likely the easiest for now is to piggyback on the .vio interface.

There are many open questions about how to better do this, a start will be to embed the dep-tracking computation of https://github.com/Karmaki/coq-dpdgraph , and implement 1 and 2 so we can start experimenting.

cc: @palmskog

Dec 21 '18 23:12 ejgallego

This is a great start, but I will make some additions to the wish list of commands.

In regression proving, we make a distinction between the digest/dependencies for the type of an identifier, and those for its body. In particular, when an identifier id is opaque, another identifier that depends directly on id does not transitively depend on the dependencies in the body of id - only on the dependencies in the type of id. Intuitively, the rationale is that only a change in the type of id can make proofs that use id fail.

On that note, I propose the following additional commands:

(Query opts (TypeDeps id)): Output dependencies of the type of id
(Query opts (TypeDigest id)): Output digest of the type of id

Note in particular that for opaque identifiers in .vio files, the body of id cannot be obtained. This was one of the reasons I had to fork dpdgraph for iCoq.

Furthermore, we are interested in getting dependencies/digests for all identifiers in a specific module, but this could be subsumed in a range parameter.

Dec 22 '18 00:12 palmskog

Indeed maybe we should just have the query to return a struct:

type object_imprint =
  { digest : Digest.t
  ; deps : Digest.t list
  }

type definition_imprint =
  { type : object_imprint
  ; body : object_imprint option
  }

or something like that.

Regarding getting all the identifiers of a module we will need to use a hack :( :(

I have complained in https://github.com/coq/coq/issues/8495 and https://github.com/coq/coq/pull/8228 about this very strong regression [if we would like to use the environment to grab objects]. This is the kind of silly blockage that makes our life hard IMHO [the worst is that this change was accidentally introduced !!]

But indeed, the part of Coq that manages objects [libobject] is by far the one that poses the most problems for people willing to do the kind of work we are trying to do. Even Search is implemented in Coq using a hack, due to lack of an structured representation for .vo data :(

Dec 22 '18 00:12 ejgallego

I've given this some more thought now, and I believe we need kernel names (e.g., mathcomp.ssreflect.fintype.uniq_enumP) and not just digests for dependencies. Here is a revised proposal.

Data structures:

type const_deps =
  { type : Name.t list (* kernel names of type deps *)
  ; body : Name.t list option (* kernel names of body deps, if available *)
  }

type const_digest =
  { type : Digest.t (* canonical checksum of the type *)
  ; body : Digest.t option (* canonical checksum of the body, if available *)
  }

API calls:

(Query opts (Deps id)): returns const_deps for id
(Query opts (RangeDeps range)): returns list const_deps for range
(Query opts (Digest id)): returns const_digest for id
(Query opts (RangeDigest range)): returns list const_digest for range

Note that dependencies are assumed to be direct dependencies rather than the whole transitive closure (which usually cannot be obtained anyway from .vio files).

Dec 23 '18 18:12 palmskog

Here is an example to make things more concrete.

Files in namespace `ST`

ListUtil.v:

Require Import List. Import ListNotations.

Lemma remove_preserve : forall (A : Type) A_eq_dec (x y : A) xs,
  x <> y -> In y xs -> In y (remove A_eq_dec x xs).
Proof.
induction xs; simpl; intros.
- intuition.
- case A_eq_dec; intros.
  * apply IHxs; subst; intuition.
  * intuition; subst; left; auto.
Qed.

Lemma in_remove : forall (A : Type) A_eq_dec (x y : A) xs,
  In y (remove A_eq_dec x xs) -> In y xs.
Proof.
induction xs; simpl; intros; auto.
destruct A_eq_dec; simpl in *; intuition.
Qed.

Dedup.v:

Require Import List ListUtil. Import ListNotations.

Fixpoint dedup (A : Type) A_eq_dec (xs : list A) : list A :=
match xs with
| [] => []
| x :: xs =>
  if in_dec A_eq_dec x xs then dedup A A_eq_dec xs
  else x :: dedup A A_eq_dec xs
end.

Lemma remove_dedup : forall A A_eq_dec (x : A) xs,
  remove A_eq_dec x (dedup A A_eq_dec xs) =
  dedup A A_eq_dec (remove A_eq_dec x xs).
Proof.
induction xs; intros; auto; simpl.
repeat (try case in_dec; try case A_eq_dec; simpl; intuition); auto using f_equal.
- exfalso. apply n0. apply remove_preserve; auto.
- exfalso. apply n. apply in_remove in i; intuition.
Qed.

File dependencies

Dedup.v → ListUtil.v

Obtained by running coqdep on both files.

Identifier partitioning

ListUtil.v:
- ST.ListUtil.remove_preserve#type
- ST.ListUtil.remove_preserve#body (proof task)
- ST.ListUtil.in_remove#type
- ST.ListUtil.in_remove#body (proof task)
Dedup.v:
- ST.Dedup.dedup
- ST.Dedup.remove_dedup#type
- ST.Dedup.remove_dedup#body (proof task)

Obtained by quick-compiling both files in topological sort order and querying the .vio files.

Identifier dependencies

ST.ListUtil.remove_preserve#body → ST.ListUtil.remove_preserve#type
ST.ListUtil.in_remove#body → ST.ListUtil.in_remove#type
ST.Dedup.remove_dedup#type → ST.Dedup.remove
ST.Dedup.remove_dedup#body → ST.Dedup.remove_dedup#type
ST.Dedup.remove_dedup#body → ST.Dedup.remove
ST.Dedup.remove_dedup#body → ST.ListUtil.remove_preserve#type
ST.Dedup.remove_dedup#body → ST.ListUtil.in_remove#type

Obtained by querying the .vio files and checking proofs.

Effects of a change

Suppose we change ST.Dedup.dedup to the following equivalent definition:

Fixpoint dedup (A : Type) A_eq_dec (xs : list A) : list A :=
match xs with
| [] => []
| x :: xs =>
  let tail := dedup A A_eq_dec xs in
  if in_dec A_eq_dec x xs then tail else x :: tail
end.

We then perform the following steps:

compute digests of ListUtil.v and Dedup.v; mark the latter as modified and topologically sort
quick-compile ListUtil.v and Dedup.v into ListUtil.vio and Dedup.vio
compare digests to determine whether ST.Dedup.remove or ST.Dedup.remove_dedup#type are different; mark the former as modified
compute impacted proof tasks in the inverse of the identifier dependency graph above: ST.Dedup.remove_dedup#body
check the proof of ST.Dedup.remove_dedup using Dedup.vio

Dec 24 '18 20:12 palmskog

We came up with a JSON-based format for representing Coq dependency graphs at the file and proof levels. It may help understanding to show the above example in this format.

Old file-level dependency graph

[
        { "uri": "/home/user/projects/chip/examples/st/ListUtil.v",
          "id": 3955365859,
          "checkable": true,
          "neighbors": [],
          "contents": [3345550460,2461010471,3728870587,2163935719],
          "checksum": "60e8a717"
        },
        { "uri": "/home/user/projects/chip/examples/st/Dedup.v",
          "id": 2758678171,
          "checkable": true,
          "neighbors": [3955365859],
          "contents": [2433485337,591398152,2138507738],
          "checksum": "9ffbce4b"
        }
]

New file-level dependency graph

[
        { "uri": "/home/user/projects/chip/examples/st/ListUtil.v",
          "id": 3955365859,
          "checkable": true,
          "neighbors": [],
	  "contents": [3345550460,2461010471,3728870587,2163935719],
          "checksum": "60e8a717"
        },
        { "uri": "/home/user/projects/chip/examples/st/Dedup.v",
          "id": 2758678171,
          "neighbors": [3955365859],
	  "contents": [2433485337,591398152,2138507738],
          "checkable": true,
          "checksum":"62f7cfdd"
        }
]

Old proof-level dependency graph

 [
        { "uri": "ST.ListUtil.remove_preserve#type",
          "id": 3345550460,
          "checkable": false,
          "neighbors": [],
          "checksum": "69c406b0"
        },
        { "uri": "ST.ListUtil.remove_preserve#body",
          "id": 3728870587,
          "checkable": true,
          "neighbors": [3345550460],
          "checksum": "9e602d3e"
        }
        { "uri": "ST.ListUtil.in_remove#type",
          "id": 2163935719,
          "checkable": false,
          "neighbors": [],
          "checksum": "6a7806de"
        },
        { "uri": "ST.ListUtil.in_remove#body",
          "id": 2461010471,
          "checkable": true,
          "neighbors": [2163935719],
          "checksum": "30251b5b"
        },
        { "uri": "ST.Dedup.remove_dedup#type",
          "id": 2138507738,
          "checkable": false,
          "neighbors": [591398152],
          "checksum": "69ad06ac"
        },
        { "uri": "ST.Dedup.remove_dedup#body",
          "id": 2433485337,
          "checkable": true,
          "neighbors": [3345550460,591398152,2163935719,2138507738],
          "checksum": "dd804e41"
        },
        { "uri": "ST.Dedup.dedup",
          "id": 591398152,
          "checkable": false,
          "neighbors": [],
          "checksum": "6a3206c51d6103c5"
        }
]

New proof-level dependency graph

[
        { "uri": "ST.ListUtil.in_remove#type",
          "id":2163935719,
          "checkable": false,
          "checksum": "6a7806de"
        },
        { "uri": "ST.ListUtil.in_remove#body",
          "id": 2461010471,
          "checkable": true,
          "checksum": "30251b5b"
        },
        { "uri": "ST.ListUtil.remove_preserve#type",
          "id": 3345550460,
          "checkable": false,
          "checksum": "69c406b0"
        }
        { "uri": "ST.ListUtil.remove_preserve#body",
          "id": 3728870587,
          "checkable": true,
          "checksum": "9e602d3e"
        },
        { "uri": "ST.Dedup.remove_dedup#body",
          "id": 2433485337,
          "checkable": true,
          "checksum":"dd804e41"
        },
        { "uri": "ST.Dedup.remove_dedup#type",
          "id": 2138507738,
          "checkable": false,
          "checksum": "69ad06ac"
        },
        { "uri": "ST.Dedup.dedup",
          "id": 591398152,
          "checkable": false,
          "checksum": "6a3206c51de703dc"
        }
]

Expected output from topological sorting

/home/user/projects/chip/examples/st/ListUtil.v
/home/user/projects/chip/examples/st/Dedup.v

Expected output from selection

ST.Dedup.remove_dedup#body

Discussion

The checksum for /home/user/projects/chip/examples/st/Dedup.v is different in the new file-level graph.
The checksum for ST.Dedup.dedup is different in the new proof-level graph.
Dependencies from the Coq standard library have been filtered out.
There are no special requirements on identifiers (id field) except that they must be integers - for example, they could be generated from kernel names by hashing.
Topological sorting uses dependencies in the new file dependency graph, while the old file dependencies are used for selection.
All Coq files have checkable: true, since we can run coqc -quick on them.
Only the bodies of opaque constants have checkable: true, since they are the ones we can check in isolation (via coqc -check-vio-tasks).
We simply concatenate the type and body checksums for transparent constants like ST.Dedup.dedup; one could instead compute new a checksum of their concatenated checksums to get something of the same length.
We have an OCaml tool, extracted from verified Coq code, for doing this type of hierarchical selection and topological sorting given the information above.
We currently calculate file checksums on the raw .v files for selection; it may be better to calculate "smarter" checksums by, e.g., checksumming the sexprialized output of sercomp.

Feb 22 '19 20:02 palmskog

Very cool, thanks for the info.

Feb 22 '19 20:02 ejgallego

I plan to use https://github.com/janestreet/ppx_hash for this; ideally we'd get this in Coq 8.11 by the way.

Mar 20 '19 19:03 ejgallego

I mean, once Dune is the main build system of Coq, we can vendor ppx painlessly. So for example we could remove from serlib big chunks of [@@deriving sexp].

Mar 20 '19 19:03 ejgallego

coq-serapi coq-serapi copied to clipboard

[query] Support querying dependency data

Files in namespace ST

File dependencies

Identifier partitioning

Identifier dependencies

Effects of a change

Old file-level dependency graph

New file-level dependency graph

Old proof-level dependency graph

New proof-level dependency graph

Expected output from topological sorting

Expected output from selection

Discussion

coq-serapi
coq-serapi copied to clipboard

Files in namespace `ST`