coq-serapi icon indicating copy to clipboard operation
coq-serapi copied to clipboard

Enable Async support in SerAPI.

Open ejgallego opened this issue 8 years ago • 31 comments

SerAPI should support the async proof mode. Experimental support is already in place, however more testing is needed. Remaining tasks are:

  • [ ] Teach our internal document mode of focus operations.
  • [x] Document better Coq's upstream API / Flags for async, check we are using the API correctly.
  • [x] Fix our handling of command line parameters [needs upstream help]
  • [x] Provide worker control commands in the Stm* family.

Notes will be taken in notes/async-support.md

ejgallego avatar Jun 14 '16 19:06 ejgallego

a5b5bf9d5755787f9d22f1f8794f0ae94423b3d6 added some preliminary flags.

ejgallego avatar Jun 14 '16 19:06 ejgallego

I don't quite understand it either.

There seem to be many flags:

https://github.com/coq/coq/blob/trunk/toplevel/coqtop.ml#L493

-async-proofs on seems like the first step.

Ptival avatar Jun 14 '16 20:06 Ptival

Yeah I tried to imitate that, but I'm kind of doing things blindly here :dancer:

ejgallego avatar Jun 14 '16 20:06 ejgallego

This script https://github.com/coq/coq/blob/trunk/test-suite/interactive/proof_block.v seems to be processed asynchronously now. YMMV.

ejgallego avatar Jun 15 '16 09:06 ejgallego

Ok I did some more changes and arrived to this:

(Control (StmObserve 22))
....
sertop: unknown option `-m'.
Usage: sertop [OPTION]... 
Try `sertop --help' for more information.
(Feedback ...
     "Anomaly: Uncaught exception (Failure \"The spawned process did not connect back in 2.0s\").\
    \nPlease report.")))

So indeed this looks promising! We just need to instruct the Stm not to call sertop but the regular coqtop for child processes.

ejgallego avatar Jun 15 '16 09:06 ejgallego

Hey Emilio,

Pretty cool :) Things seem to be going very nicely. Question: after compiling the latest build and passing async, I can see states being processed in parallel, but I'm not getting feedback for states inside of a proof. Here is an example of a transcript:

(add-ltac (Control (StmAdd 1 None "Ltac slow := do 1000 (do 1000 idtac).")))
  (Answer add-ltac Ack)
  (Answer add-ltac
          (StmAdded 3
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 0)
                     (ep 37))
                    NewTip))
  (Answer add-ltac Completed)
(add-lemma (Control (StmAdd 2 (Some 3) "Lemma a : True. Proof.")))
  (Answer add-lemma Ack)
  (Answer add-lemma
          (StmAdded 4
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 0)
                     (ep 15))
                    NewTip))
  (Answer add-lemma
          (StmAdded 5
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 16)
                     (ep 22))
                    NewTip))
  (Answer add-lemma Completed)
(add-proof (Control (StmAdd 4 (Some 5) "slow. slow. slow. auto.")))
  (Answer add-proof Ack)
  (Answer add-proof
          (StmAdded 6
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 0)
                     (ep 5))
                    NewTip))
  (Answer add-proof
          (StmAdded 7
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 6)
                     (ep 11))
                    NewTip))
  (Answer add-proof
          (StmAdded 8
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 12)
                     (ep 17))
                    NewTip))
  (Answer add-proof
          (StmAdded 9
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 18)
                     (ep 23))
                    NewTip))
  (Answer add-proof Completed)
(add-qed (Control (StmAdd 2 (Some 9) "Qed.")))
  (Answer add-qed Ack)
  (Answer add-qed
          (StmAdded 10
                    ((fname "")
                     (line_nb 1)
                     (bol_pos 0)
                     (line_nb_last 1)
                     (bol_pos_last 0)
                     (bp 0)
                     (ep 4))
                    NewTip))
  (Answer add-qed Completed)
(Control (StmObserve 10))
  (Answer 4 Ack)
  (Feedback
   ((id
     (State 10))
    (contents
     (ProcessingIn master))
    (route 0)))
  (Feedback
   ((id
     (State 5))
    (contents
     (ProcessingIn master))
    (route 0)))
  (Feedback
   ((id
     (State 4))
    (contents
     (ProcessingIn master))
    (route 0)))
  (Feedback
   ((id
     (State 3))
    (contents
     (ProcessingIn master))
    (route 0)))
  (Feedback
   ((id
     (State 2))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 3))
    (contents
     (Message Info
              (Element
               (_ nil
                  ((PCData "slow is defined"))))))
    (route 0)))
  (Feedback
   ((id
     (State 3))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 4))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 5))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 5))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 4))
    (contents Processed)
    (route 0)))
  (Feedback
   ((id
     (State 10))
    (contents
     (Message Info
              (Element
               (_ nil
                  ((PCData "a is defined"))))))
    (route 0)))
  (Feedback
   ((id
     (State 10))
    (contents Incomplete)
    (route 0)))
  (Feedback
   ((id
     (State 10))
    (contents Processed)
    (route 0)))
  (Answer 4 Completed)

Although I get a message from state 10 saying that a is defined, I don't get anything for states 6, 7, 8, 9 (the ones inside the Proof .. Qed block). Am I doing something wrong?

cpitclaudel avatar Jun 15 '16 15:06 cpitclaudel

Data point: apparently it works if I call observe separately on the last state of each proof.

cpitclaudel avatar Jun 15 '16 15:06 cpitclaudel

Cool ! :) Here we are walking a bit on unexplored territory, it will be hard to figure things out if we cannot get @gares on board :)

A guess, does using the Join method change things?

ejgallego avatar Jun 15 '16 15:06 ejgallego

Also a good reference is to use CoqIde an see how it behaves. The mapping roughly is:

  • Go to point icon: Add the document up to point, does Observe in the tip of the Stm.
  • Gears icon: does Join on the document.

ejgallego avatar Jun 15 '16 15:06 ejgallego

Ok; let me try join :) Can you reproduce the thing I'm seeing in the example above btw?

cpitclaudel avatar Jun 15 '16 15:06 cpitclaudel

I see exactly what you see. Plus if I do join:

(Control StmJoin)
(Answer 5 Ack)
(Feedback ((id (State 10)) (contents Processed) (route 0)))
(Feedback ((id (State 9)) (contents (ProcessingIn master)) (route 0)))
(Feedback ((id (State 8)) (contents (ProcessingIn master)) (route 0)))
(Feedback ((id (State 7)) (contents (ProcessingIn master)) (route 0)))
(Feedback ((id (State 6)) (contents (ProcessingIn master)) (route 0)))
(Feedback ((id (State 5)) (contents Processed) (route 0)))
(Feedback ((id (State 6)) (contents Processed) (route 0)))
(Feedback ((id (State 7)) (contents Processed) (route 0)))
(Feedback ((id (State 8)) (contents Processed) (route 0)))
(Feedback ((id (State 9)) (contents Processed) (route 0)))
(Feedback ((id (State 10)) (contents Complete) (route 0)))
(Answer 5 Completed)

ejgallego avatar Jun 15 '16 16:06 ejgallego

Cool, thanks. Implemented Join in https://github.com/cpitclaudel/elcoq/commit/17c73f175ad7a5f027fddb3bee7a7cee95db0a40. It seems to only process the last proof though (that is, if there are, say, 10 closed proofs, it just observes the last closed one).

https://asciinema.org/a/cqmk80fnyfn26xlvcmu9ttf61

cpitclaudel avatar Jun 15 '16 20:06 cpitclaudel

Great, I'll have a look!

ejgallego avatar Jun 15 '16 20:06 ejgallego

Wow it is beautiful !!! wrt to the join, indeed when you join the Stm goes backwards on the document (I have no clue if there is any reason for that), the moment it finds an error, it stops.

This is due to join checking the whole document I think, it really checks that the proofs the worker generated are right and this needs to be done somehow in a single threaded fashion to still ensure consistency wrt the kernel.

ejgallego avatar Jun 15 '16 21:06 ejgallego

I guess that it depends on the interpretation we do of the feedback messages, I see in this example you get a few Incomplete messages. This may indicate that the STM is being very very lazy....

But somehow I'm seeing now a different parallelism than before. I wonder why.

ejgallego avatar Jun 15 '16 21:06 ejgallego

Wow it is beautiful !!!

Thanks :blush:

cpitclaudel avatar Jun 15 '16 22:06 cpitclaudel

Now workers are created again, there was a problem with the ordering of flag-setting. I've chosen a conservative flag setup: SerAPI imitates CoqIDE. Feel free to play thou with the flags in sertop_init.ml and in particular with setting async_proofs_full back to true.

ejgallego avatar Jun 15 '16 23:06 ejgallego

Note however that there is a race condition in printing. I have no clue how to the workers send the feedback messages so I'll have to look into it!

ejgallego avatar Jun 15 '16 23:06 ejgallego

Did you use a mutex as CoqIDE? https://github.com/coq/coq/blob/trunk/ide/ide_slave.ml#L467

gares avatar Jun 20 '16 15:06 gares

Sorry, I did not see the commit. IIRC It is needed because the Stm uses 1 thread per worker, and feedback coming from workers is forwarded. A thread may be interrupted while sending a XML tree, hence you need the mutex. The feeder code is called by the thread.

gares avatar Jun 20 '16 15:06 gares

Now that I think about it, I guess the proper place for the mutex is Pp.feedback ...

gares avatar Jun 20 '16 15:06 gares

@gares IMHO it is not so bad to require clients to assume a re-entrant callback, but it should be documented of course.

ejgallego avatar Jun 20 '16 15:06 ejgallego

Catching up with this thread. Stm.finish is like observing the tip (what the "satus false" XML message does, IIRC); while Stm.join also joins the environment ("status true" in XML) that means all proof terms produced by workers are type-checked by the kernel of main Coqtop and all universe constraints are also checked. A worker typechecks in his corner the proof term, but 1) I don't want to trust workers, 2) only the main coqtop has a complete view and can check universes (a worker alone can only check satisfiability locally to his constraints).

Both calls are blocking, but the latter waits for a complete checking of everything, while the former does not wait for delegated proofs (as GoTo-Point in CoqIDE).

gares avatar Jun 20 '16 17:06 gares

Thanks Enrico! Note that Stm.join seems broken for me when using --async_full. Are we missing something there?

ejgallego avatar Jun 20 '16 18:06 ejgallego

I think it is untested, since Coqoon has no join buttun. But should work (I see no if async_proofs_full in the code path). Broken in which way.

gares avatar Jun 20 '16 18:06 gares

If I do:

$ rlwrap ./sertop.native --human --prelude ~/external/coq-git --async ~/external/coq-git/bin/coqtop --async-full
(Control (StmAdd 100 None "
Ltac slow := do 1000 (do 1000 idtac).
Ltac wrong := intro.

Lemma a: True.
Proof. slow. slow. slow. slow. slow. slow. auto. Qed.

Lemma b: True.
Proof. slow. slow. slow. auto. Qed.

Lemma c: True.
Proof. slow. slow. slow. auto. Qed.

Lemma d: True.
Proof. slow. slow. slow.
"))
(Control StmJoin)

master never responds again, it seems hung. Of course, it could well be a SerAPI bug.

ejgallego avatar Jun 20 '16 19:06 ejgallego

Hum, I guess you need to observe the tip first... I can't say why now, I've to check.

gares avatar Jun 20 '16 19:06 gares

I guess join crawls the document for futures, but nobody scheduled them... observe does.

gares avatar Jun 20 '16 19:06 gares

It is a combination CoqIDE does not let you do, since adds are always followed by observe.

gares avatar Jun 20 '16 19:06 gares

We do an observe first, but something seems to be going on... We need to investigate more. Somehow with our --async-full we are getting very strange things.

It is a combination CoqIDE does not let you do, since adds are always followed by observe.

I see, thanks, we do n adds one observe; with usual CoqIDE mode it seems to work well.

ejgallego avatar Jun 20 '16 20:06 ejgallego

By the way, I'd swear that I witnessed a race condition just now: somehow, after an observe that triggered async checking, a worker produced a spurious error! Interesting!

ejgallego avatar Jun 20 '16 22:06 ejgallego