metacoq icon indicating copy to clipboard operation
metacoq copied to clipboard

Recursive TemplateMonad programs are slow to run

Open john-ml opened this issue 5 years ago • 2 comments
trafficstars

The runtime of the following function increases drastically with n (comments give the runtimes on my machine):

From MetaCoq Require Import Template.All.
Import MonadNotation.

Fixpoint tm_double n : TemplateMonad nat :=
  match n with
  | 0 => ret 0
  | S n =>
    n' <- tm_double n ;;
    ret (S (S n'))
  end.

Time Run TemplateProgram (tmPrint =<< tm_double 1000). (*  1.296s *)
Time Run TemplateProgram (tmPrint =<< tm_double 2000). (*  4.444s *)
Time Run TemplateProgram (tmPrint =<< tm_double 3000). (* 10.117s *)
Time Run TemplateProgram (tmPrint =<< tm_double 4000). (* 21.374s *)
Time Run TemplateProgram (tmPrint =<< tm_double 5000). (* 29.005s *)

This prevents me from writing nontrivial recursive functions in the template monad.

john-ml avatar Jul 16 '20 22:07 john-ml

I think that this is an implementation issue in the interpreter. Mtac2 uses a strategy where they work directly on the reduction mechanism whereas the one here is (I'm pretty sure) just calls to hnf followed by pattern matches.

It is possible that we could improve this by fusing tmBind with each operation and then implementing tmBind as a function. This would avoid places where we have tmBind (tmBind .. ..) .. and keep computation within the Coq kernel in that case. In your example, you will essentially build a tmBind (tmBind (tmBind .. ..) ..) ..) ..) tower before actually making any progress which is sub-optimal at best. Out of curiosity, what impact does cps converting this have?

Fixpoint tm_double n (acc : nat) : TemplateMonad nat :=
  match n with
  | 0 => ret acc
  | S n =>
    tm_double n (S (S acc)) 
  end.

I assume that evaluating the above will be significantly faster since there is no tmBinds at all. This is an upper bound on the performance improvement that you could expect from the proposal above but there would be some extra overhead to normalize things. Personally, it seems like both fixing the interpreter and making tmBind a function should be done, but they are different amounts of work.

gmalecha avatar Aug 02 '20 17:08 gmalecha

Out of curiosity, what inpact does cps converting this have?

The CPS converted version is indeed significantly faster---the same benchmarks finish near-instantly.

john-ml avatar Aug 02 '20 17:08 john-ml