coq-serapi Enable Async support in SerAPI.

SerAPI should support the async proof mode. Experimental support is already in place, however more testing is needed. Remaining tasks are:

[ ] Teach our internal document mode of focus operations.
[x] Document better Coq's upstream API / Flags for async, check we are using the API correctly.
[x] Fix our handling of command line parameters [needs upstream help]
[x] Provide worker control commands in the Stm* family.

Notes will be taken in notes/async-support.md

Jun 14 '16 19:06 ejgallego

a5b5bf9d5755787f9d22f1f8794f0ae94423b3d6 added some preliminary flags.

Jun 14 '16 19:06 ejgallego

I don't quite understand it either.

There seem to be many flags:

https://github.com/coq/coq/blob/trunk/toplevel/coqtop.ml#L493

-async-proofs on seems like the first step.

Jun 14 '16 20:06 Ptival

Yeah I tried to imitate that, but I'm kind of doing things blindly here :dancer:

Jun 14 '16 20:06 ejgallego

This script https://github.com/coq/coq/blob/trunk/test-suite/interactive/proof_block.v seems to be processed asynchronously now. YMMV.

Jun 15 '16 09:06 ejgallego

Ok I did some more changes and arrived to this:

(Control (StmObserve 22))
....
sertop: unknown option `-m'.
Usage: sertop [OPTION]... 
Try `sertop --help' for more information.
(Feedback ...
     "Anomaly: Uncaught exception (Failure \"The spawned process did not connect back in 2.0s\").\
    \nPlease report.")))

So indeed this looks promising! We just need to instruct the Stm not to call sertop but the regular coqtop for child processes.

Jun 15 '16 09:06 ejgallego

Hey Emilio,

Pretty cool :) Things seem to be going very nicely. Question: after compiling the latest build and passing async, I can see states being processed in parallel, but I'm not getting feedback for states inside of a proof. Here is an example of a transcript:

(add-ltac (Control (StmAdd 1 None "Ltac slow := do 1000 (do 1000 idtac).")))
  (Answer add-ltac Ack)
  (Answer add-ltac
          (StmAdded 3
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 0)
                     (ep 37))
                    NewTip))
  (Answer add-ltac Completed)
(add-lemma (Control (StmAdd 2 (Some 3) "Lemma a : True. Proof.")))
  (Answer add-lemma Ack)
  (Answer add-lemma
          (StmAdded 4
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 0)
                     (ep 15))
                    NewTip))
  (Answer add-lemma
          (StmAdded 5
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 16)
                     (ep 22))
                    NewTip))
  (Answer add-lemma Completed)
(add-proof (Control (StmAdd 4 (Some 5) "slow. slow. slow. auto.")))
  (Answer add-proof Ack)
  (Answer add-proof
          (StmAdded 6
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 0)
                     (ep 5))
                    NewTip))
  (Answer add-proof
          (StmAdded 7
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 6)
                     (ep 11))
                    NewTip))
  (Answer add-proof
          (StmAdded 8
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 12)
                     (ep 17))
                    NewTip))
  (Answer add-proof
          (StmAdded 9
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 18)
                     (ep 23))
                    NewTip))
  (Answer add-proof Completed)
(add-qed (Control (StmAdd 2 (Some 9) "Qed.")))
  (Answer add-qed Ack)
  (Answer add-qed
          (StmAdded 10
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 0)
                     (ep 4))
                    NewTip))
  (Answer add-qed Completed)
(Control (StmObserve 10))
  (Answer 4 Ack)
  (Feedback
   ((id
     (State 10))
    (contents
     (ProcessingIn master))
    (route 0)))
  (Feedback
   ((id
     (State 5))
    (contents
     (ProcessingIn master))
    (route 0)))
  (Feedback
   ((id
     (State 4))
    (contents
     (ProcessingIn master))
    (route 0)))
  (Feedback
   ((id
     (State 3))
    (contents
     (ProcessingIn master))
    (route 0)))
  (Feedback
   ((id
     (State 2))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 3))
    (contents
     (Message Info
              (Element
               (_ nil
                  ((PCData "slow is defined"))))))
    (route 0)))
  (Feedback
   ((id
     (State 3))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 4))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 5))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 5))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 4))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 10))
    (contents
     (Message Info
              (Element
               (_ nil
                  ((PCData "a is defined"))))))
    (route 0)))
  (Feedback
   ((id
     (State 10))
    (contents Incomplete)
    (route 0)))
  (Feedback
   ((id
     (State 10))
    (contents Processed)
    (route 0)))
  (Answer 4 Completed)

Although I get a message from state 10 saying that a is defined, I don't get anything for states 6, 7, 8, 9 (the ones inside the Proof .. Qed block). Am I doing something wrong?

Jun 15 '16 15:06 cpitclaudel

Data point: apparently it works if I call observe separately on the last state of each proof.

Jun 15 '16 15:06 cpitclaudel

Cool ! :) Here we are walking a bit on unexplored territory, it will be hard to figure things out if we cannot get @gares on board :)

A guess, does using the Join method change things?

Jun 15 '16 15:06 ejgallego

Also a good reference is to use CoqIde an see how it behaves. The mapping roughly is:

Go to point icon: Add the document up to point, does Observe in the tip of the Stm.
Gears icon: does Join on the document.

Jun 15 '16 15:06 ejgallego

Ok; let me try join :) Can you reproduce the thing I'm seeing in the example above btw?

Jun 15 '16 15:06 cpitclaudel

I see exactly what you see. Plus if I do join:

(Control StmJoin)
(Answer 5 Ack)
(Feedback ((id (State 10)) (contents Processed) (route 0)))
(Feedback ((id (State 9)) (contents (ProcessingIn master)) (route 0)))
(Feedback ((id (State 8)) (contents (ProcessingIn master)) (route 0)))
(Feedback ((id (State 7)) (contents (ProcessingIn master)) (route 0)))
(Feedback ((id (State 6)) (contents (ProcessingIn master)) (route 0)))
(Feedback ((id (State 5)) (contents Processed) (route 0)))
(Feedback ((id (State 6)) (contents Processed) (route 0)))
(Feedback ((id (State 7)) (contents Processed) (route 0)))
(Feedback ((id (State 8)) (contents Processed) (route 0)))
(Feedback ((id (State 9)) (contents Processed) (route 0)))
(Feedback ((id (State 10)) (contents Complete) (route 0)))
(Answer 5 Completed)

Jun 15 '16 16:06 ejgallego

Cool, thanks. Implemented Join in https://github.com/cpitclaudel/elcoq/commit/17c73f175ad7a5f027fddb3bee7a7cee95db0a40. It seems to only process the last proof though (that is, if there are, say, 10 closed proofs, it just observes the last closed one).

https://asciinema.org/a/cqmk80fnyfn26xlvcmu9ttf61

Jun 15 '16 20:06 cpitclaudel

Great, I'll have a look!

Jun 15 '16 20:06 ejgallego

Wow it is beautiful !!! wrt to the join, indeed when you join the Stm goes backwards on the document (I have no clue if there is any reason for that), the moment it finds an error, it stops.

This is due to join checking the whole document I think, it really checks that the proofs the worker generated are right and this needs to be done somehow in a single threaded fashion to still ensure consistency wrt the kernel.

Jun 15 '16 21:06 ejgallego

I guess that it depends on the interpretation we do of the feedback messages, I see in this example you get a few Incomplete messages. This may indicate that the STM is being very very lazy....

But somehow I'm seeing now a different parallelism than before. I wonder why.

Jun 15 '16 21:06 ejgallego

Wow it is beautiful !!!

Thanks :blush:

Jun 15 '16 22:06 cpitclaudel

Now workers are created again, there was a problem with the ordering of flag-setting. I've chosen a conservative flag setup: SerAPI imitates CoqIDE. Feel free to play thou with the flags in sertop_init.ml and in particular with setting async_proofs_full back to true.

Jun 15 '16 23:06 ejgallego

Note however that there is a race condition in printing. I have no clue how to the workers send the feedback messages so I'll have to look into it!

Jun 15 '16 23:06 ejgallego

Did you use a mutex as CoqIDE? https://github.com/coq/coq/blob/trunk/ide/ide_slave.ml#L467

Jun 20 '16 15:06 gares

Sorry, I did not see the commit. IIRC It is needed because the Stm uses 1 thread per worker, and feedback coming from workers is forwarded. A thread may be interrupted while sending a XML tree, hence you need the mutex. The feeder code is called by the thread.

Jun 20 '16 15:06 gares

Now that I think about it, I guess the proper place for the mutex is Pp.feedback ...

Jun 20 '16 15:06 gares

@gares IMHO it is not so bad to require clients to assume a re-entrant callback, but it should be documented of course.

Jun 20 '16 15:06 ejgallego

Catching up with this thread. Stm.finish is like observing the tip (what the "satus false" XML message does, IIRC); while Stm.join also joins the environment ("status true" in XML) that means all proof terms produced by workers are type-checked by the kernel of main Coqtop and all universe constraints are also checked. A worker typechecks in his corner the proof term, but 1) I don't want to trust workers, 2) only the main coqtop has a complete view and can check universes (a worker alone can only check satisfiability locally to his constraints).

Both calls are blocking, but the latter waits for a complete checking of everything, while the former does not wait for delegated proofs (as GoTo-Point in CoqIDE).

Jun 20 '16 17:06 gares

Thanks Enrico! Note that Stm.join seems broken for me when using --async_full. Are we missing something there?

Jun 20 '16 18:06 ejgallego

I think it is untested, since Coqoon has no join buttun. But should work (I see no if async_proofs_full in the code path). Broken in which way.

Jun 20 '16 18:06 gares

If I do:

$ rlwrap ./sertop.native --human --prelude ~/external/coq-git --async ~/external/coq-git/bin/coqtop --async-full
(Control (StmAdd 100 None "
Ltac slow := do 1000 (do 1000 idtac).
Ltac wrong := intro.

Lemma a: True.
Proof. slow. slow. slow. slow. slow. slow. auto. Qed.

Lemma b: True.
Proof. slow. slow. slow. auto. Qed.

Lemma c: True.
Proof. slow. slow. slow. auto. Qed.

Lemma d: True.
Proof. slow. slow. slow.
"))
(Control StmJoin)

master never responds again, it seems hung. Of course, it could well be a SerAPI bug.

Jun 20 '16 19:06 ejgallego

Hum, I guess you need to observe the tip first... I can't say why now, I've to check.

Jun 20 '16 19:06 gares

I guess join crawls the document for futures, but nobody scheduled them... observe does.

Jun 20 '16 19:06 gares

It is a combination CoqIDE does not let you do, since adds are always followed by observe.

Jun 20 '16 19:06 gares

We do an observe first, but something seems to be going on... We need to investigate more. Somehow with our --async-full we are getting very strange things.

It is a combination CoqIDE does not let you do, since adds are always followed by observe.

I see, thanks, we do n adds one observe; with usual CoqIDE mode it seems to work well.

Jun 20 '16 20:06 ejgallego

By the way, I'd swear that I witnessed a race condition just now: somehow, after an observe that triggered async checking, a worker produced a spurious error! Interesting!

Jun 20 '16 22:06 ejgallego

coq-serapi coq-serapi copied to clipboard

Enable Async support in SerAPI.

coq-serapi
coq-serapi copied to clipboard