Idris2-boot icon indicating copy to clipboard operation
Idris2-boot copied to clipboard

Unexpected run-time complexity

Open nicolabotta opened this issue 4 years ago • 13 comments

The program

import Data.Fin
import Data.Vect

tail : {0 A : Type} -> {n : Nat} -> (Fin (S n) -> A) -> (Fin n -> A)
tail f = f . FS

toVectFun : {0 A : Type} -> {n : Nat} -> (Fin n -> A) -> Vect n A
toVectFun {n =   Z} _ = Nil
toVectFun {n = S m} f = (f FZ) :: (toVectFun (tail f))

N : Nat
N = ...

xs : Vect (S N) Nat
xs = toVectFun {n = S N} finToNat

main : IO ()
main = putStrLn ("xs(" ++ show N ++ ") = " ++ show (index (fromInteger (natToInteger N)) xs))

type checks and executes in more than quadratic time in N in idris 2:

N =  20, user time =   0.78s  
N =  40, user time =   2.57s 
N =  80, user time =  14.49s
N = 160, user time =  segfault
N = 320, user time =  segfault

For "large" values of N, it segfaults. In idris 1, the correspondent program also does not run in linear time

N =  100, user time =   3.3s  
N =  200, user time =   4.9s 
N =  400, user time =  10.1s
N =  800, user time =  36.4s
N = 1600, user time = 197.6s 

, but it does not segfault and is faster. What is the expected complexity of the program? Shouldn't building the vector and reading its last component be linear operations in N?

nicolabotta avatar Aug 01 '19 16:08 nicolabotta

At least at compile time, it's almost certainly down to the fromInteger which has to compute a proof that the Fin is within bounds. This is a compile time thing. Are you compiling and executing things too? And if so, what happens? Can you be precise about where these numbers come from?

edwinb avatar Aug 01 '19 18:08 edwinb

Right, all test were done by running time idris2 --exec main Linear.idr. For instance, for N = 20:

nicola@lt561:~/gitlab/botta/Idris2Libs/tests$ time idris2 --exec main Linear.idr 
xs(20) = 20

real	0m0.876s
user	0m0.850s
sys	0m0.024s
nicola@lt561:~/gitlab/botta/Idris2Libs/tests$

The correspondent Idris 1 program was

> import Data.Fin                                                                                                         
> import Data.Vect                                                                                                        
                                                                                                                          
> tail : {A : Type} -> {n : Nat} -> (Fin (S n) -> A) -> (Fin n -> A)                                                      
> tail f = f . FS                                                                                                         
                                                                                                                          
> toVectFun : {A : Type} -> {n : Nat} -> (Fin n -> A) -> Vect n A                                                         
> toVectFun {n =   Z} _ = Nil                                                                                             
> toVectFun {n = S m} f = (f FZ) :: (toVectFun (tail f))                                                                  
                                                                                                                          
> N : Nat                                                                                                                 
> N = 1600                                                                                                                
                                                                                                                          
> xs : Vect (S N) Nat                                                                                                     
> xs = toVectFun {n = S N} finToNat                                                                                       
                                                                                                                          
> main : IO ()                                                                                                            
> main = putStrLn ("xs(" ++ show N ++ ") = " ++ show (index (fromInteger (toIntegerNat N)) xs)) 

also tested with time idris --exec main Linear.lidr. How can I just generate an executable in Idris 2?

nicolabotta avatar Aug 01 '19 18:08 nicolabotta

You can use :c at the REPL just like in Idris 1, as long as you have chez scheme installed. Nobody has implemented -o yet.

There seems to be a couple of weird things going on. Firstly, there's more effort than there should be going into type checking, but also there's a segfault in compilation that happens in a surprising place. I'll try to look into it, because this sort of thing is not supposed to happen with the way Idris 2 is set up, but I should also go on holiday at some point so it might be a while unless someone else is enthusiastic about poking around the core.

edwinb avatar Aug 01 '19 18:08 edwinb

Hmmm, chez scheme is installed but :cdoes not seem to work as in idris 1:

nicola@lt561:~/gitlab/botta/Idris2Libs/tests$ idris2 Linear.idr 
Welcome to Idris 2 version 0.0. Enjoy yourself!
Main> :c linear2.exe
(interactive):1:11--1:15:'.' is not a prefix operator
Main>

nicolabotta avatar Aug 01 '19 18:08 nicolabotta

@nicolabotta: :c is used like this: :c linear main, where linear is the filename and main is the entrypoint function. The Chez codegen will then generate a linear.ss file and a linear.so file.

Currently the parser expects an "unqualified name" for the filename, thus it is not allowed to have . in the filename (ref. https://github.com/edwinb/Idris2/blob/master/src/Idris/Parser.idr#L1422).

@edwinb: I can look into implementing -o this weekend.

chrrasmussen avatar Aug 01 '19 19:08 chrrasmussen

@chrrasmussen: thanks! I can confirm the problem is a type checking one. Once an executable is generated, the run-time seems to be constant in N in idris 2.

@edwinb: thanks but do not forget your holiday!

As it turns out, for N = 160, idris 2 type checking terminates (after a few minutes!) but :c linear main aborts with a segfault. Thus, indeed, there seem to be two issues here:

  1. The type checking time that is neither constant nor linear in N.

  2. The segfault problem.

nicolabotta avatar Aug 01 '19 19:08 nicolabotta

So I've worked out what's going on - interestingly it's fast in Idris 1 because there it reduces everything to normal form, and it's slow in Idris 2 because it's only computing what's needed to build the proof. This sounds good, until you look at the suspended computation and see that it's shared everywhere throughout the Fin that gets built. It's erased at run time, so if you get that far, it runs pretty much instantly (I succeeded up to N=320 after a bit of tweaking at least).

I don't know exactly the right way to proceed, but I have a long train journey tomorrow so I'll have a poke. The question is why sharing gets broken - it may be to do with auto implicit search or maybe something a bit deeper. Or it might be that for the moment we need a different way of implementing fromInteger for Fin.

edwinb avatar Aug 01 '19 23:08 edwinb

I'll play around with a different implementation of fromInteger and report. I am actually more worried about the Idris 1 run-time behavior. Shouldn't it be linear in N?

nicolabotta avatar Aug 02 '19 05:08 nicolabotta

It should be linear in N but I'm having trouble testing that experimentally because it erases things predictably in Idris 2 so I'm only really seeing the start up cost of the chez runtime!

edwinb avatar Aug 02 '19 07:08 edwinb

It seems plausible that the auto implicit in fromInteger is the culprit for the segfaults in Idris 2. The codes

> import Data.Fin
> import Data.Vect

> tail : {A : Type} -> {n : Nat} -> (Fin (S n) -> A) -> (Fin n -> A)
> tail f = f . FS

> toVectFun : {A : Type} -> {n : Nat} -> (Fin n -> A) -> Vect n A
> toVectFun {n =   Z} _ = Nil
> toVectFun {n = S m} f = (f FZ) :: (toVectFun (tail f))

> N : Nat
> N = 800

> xs : Vect (S N) Nat
> xs = toVectFun {n = S N} finToNat

> fromNat : {n : Nat} -> Nat -> Fin n
> fromNat {n = S m}  Z    = FZ
> fromNat {n = S m} (S i) = if i < m then FS (fromNat i) else last 

> main : IO ()
> -- main = putStrLn ("xs(" ++ show N ++ ") = " ++ show (index (fromInteger (toIntegerNat N)) xs))
> main = putStrLn ("xs(" ++ show N ++ ") = " ++ show (index (fromNat N) xs))

and

import Data.Fin
import Data.Vect

tail : {0 A : Type} -> {n : Nat} -> (Fin (S n) -> A) -> (Fin n -> A)
tail f = f . FS

toVectFun : {0 A : Type} -> {n : Nat} -> (Fin n -> A) -> Vect n A
toVectFun {n =   Z} _ = Nil
toVectFun {n = S m} f = (f FZ) :: (toVectFun (tail f))

N : Nat
N = 800

xs : Vect (S N) Nat
xs = toVectFun {n = S N} finToNat

fromNat : {n : Nat} -> Nat -> Fin n
fromNat {n = S m}  Z    = FZ
fromNat {n = S m} (S i) = if i < m then FS (fromNat i) else last 

main : IO ()
-- main = putStrLn ("xs(" ++ show N ++ ") = " ++ show (index (fromInteger (natToInteger N)) xs))
main = putStrLn ("xs(" ++ show N ++ ") = " ++ show (index (fromNat N) xs))

type checks and run in Idris 1 and 2 in the following times:

nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$ time idris Linear.lidr -o linear

real	0m7.838s
user	0m7.060s
sys	0m0.496s
nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$ time ./linear
xs(800) = 800

real	0m0.117s
user	0m0.112s
sys	0m0.004s

and

nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$ time idris2 --cg chicken Linear.idr
1/1: Building Linear (Linear.idr)
Welcome to Idris 2 version 0.0. Enjoy yourself!
Main> :c linear2 main
linear2 written
Main> :q
Bye for now!

real	0m45.559s
user	0m41.444s
sys	0m0.316s
nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$ time ./linear2
xs(800) = 800

real	0m0.308s
user	0m0.292s
sys	0m0.008s

Type checking is much slower in Idris 2 but generating and running the executable takes about the same time as in Idris 1.

If we uncomment/comment main so as to use fromInteger the situation gets much worse for type checking and compiling in Idris 1:

nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$ time idris Linear.lidr -o linear

real	2m12.576s
user	1m37.984s
sys	0m10.548s
nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$ time ./linear
xs(800) = 800

real	0m0.077s
user	0m0.072s
sys	0m0.000s

In Idris 2, type checking takes hours or longer and, as we have seen, compiling yields segfaults even for smaller values of N.

nicolabotta avatar Aug 02 '19 11:08 nicolabotta

I've found that chicken takes quite a while to generate code from the scheme that Idris passes to it. It also generates code that is quite a bit slower than Idris 1 generates. If you can, I'd recommend giving it a go with Chez (though I seem to remember there was some reason this didn't work in the past). Possibly using the racket code generator would also give better performance. There's something about the scheme we generate that chicken just doesn't like...

Anyway - I've been poking around this one a bit. I think I now understand what's going on, and it is unfortunately hard to solve, because in the end, Fin doesn't scale well at all. There are things we can do to improve evaluation time (I've tried a few which have helped), but even if we do, we'll just hit a performance problem a bit later. Whatever happens, converting to Fin is linear time at best (and the indices mean it's probably quadratic or worse when computing a normal form).

Really for this kind of problem what we need is a proper O(1) implementation of arrays with compile time bounds checking. This is achievable with QTT, but there isn't an easy answer.

edwinb avatar Aug 05 '19 18:08 edwinb

In the specific example, with chicken it takes about twice as long as with chez to compile the fromNatcase:

nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$ time idris2 --cg chicken Linear.idr
1/1: Building Linear (Linear.idr)
Welcome to Idris 2 version 0.0. Enjoy yourself!
Main> :c linear2 main
linear2 written
Main> :q
Bye for now!

real	0m39.584s
user	0m34.919s
sys	0m0.254s
nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$ time idris2 Linear.idr
1/1: Building Linear (Linear.idr)
Welcome to Idris 2 version 0.0. Enjoy yourself!
Main> :c linear2 main
compiling linear2.ss with output to linear2.so
linear2 written
Main> :q
Bye for now!

real	0m19.038s
user	0m15.835s
sys	0m0.081s

The fromInteger . natToInteger case, however, fails to terminate both with chicken and with chez. With chez, for instance:

nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$ time idris2 Linear.idr
1/1: Building Linear (Linear.idr)
Uncaught error: Error in TTC file: End of buffer when reading String

real	583m54.436s
user	583m31.545s
sys	0m17.752s
nicola@cirrus:~/gitlab/botta/Idris2Libs/tests$

nicolabotta avatar Aug 07 '19 05:08 nicolabotta

It seems that indexing Vect n a with bounded natural numbers instead of Fin ns gives a nearly perfect quadratic complexity:

import Data.Fin
import Data.Vect
import Data.Nat

LTB : Nat -> Type
LTB b = DPair Nat (\ n  => LT n b)

idx : {n : Nat} -> Vect n alpha -> LTB n -> alpha
idx {n = Z}    Nil      (MkDPair  i    prf) = absurd prf
idx {n = S m} (x :: xs) (MkDPair  Z    prf) = x
idx {n = S m} (x :: xs) (MkDPair (S k) (LTESucc prf)) = idx xs (MkDPair k prf)

tail : {0 A : Type} -> {n : Nat} -> (Fin (S n) -> A) -> (Fin n -> A)
tail f = f . FS

toVectFun : {0 A : Type} -> {n : Nat} -> (Fin n -> A) -> Vect n A
toVectFun {n =   Z} _ = Nil
toVectFun {n = S m} f = (f FZ) :: (toVectFun (tail f))

N : Nat
N = 100

xs : Vect (S N) Nat
xs = toVectFun {n = S N} finToNat

lemma : (n : Nat) -> LT n (S n)
lemma  Z    = LTESucc LTEZero
lemma (S m) = LTESucc (lemma m)

main : IO ()
-- main = putStrLn ("xs(" ++ show N ++ ") = " ++ show (index (fromInteger (natToInteger N)) xs))
main = putStrLn ("xs(" ++ show N ++ ") = " ++ show (idx xs (MkDPair N (lemma N))))
nicola@lt561:~/gitlab/botta/Idris2Libs/issues$ time idris2 --exec main 71_run_time_complexity.idr 
xs(100) = 100

real	0m0.999s
user	0m0.958s
sys	0m0.040s
nicola@lt561:~/gitlab/botta/Idris2Libs/issues$ time idris2 --exec main 71_run_time_complexity.idr 
xs(200) = 200

real	0m2.557s
user	0m2.540s
sys	0m0.017s
nicola@lt561:~/gitlab/botta/Idris2Libs/issues$ time idris2 --exec main 71_run_time_complexity.idr 
xs(400) = 400

real	0m8.460s
user	0m8.423s
sys	0m0.037s
nicola@lt561:~/gitlab/botta/Idris2Libs/issues$ time idris2 --exec main 71_run_time_complexity.idr 
xs(800) = 800

real	0m33.110s
user	0m33.032s
sys	0m0.070s
nicola@lt561:~/gitlab/botta/Idris2Libs/issues$ time idris2 --exec main 71_run_time_complexity.idr 
xs(1600) = 1600

real	2m16.461s
user	2m16.224s
sys	0m0.112s
nicola@lt561:~/gitlab/botta/Idris2Libs/issues$

This is perhaps disappointing but still far better than using the orginal index function and converting Nats to Fin ns: this segfaults already for N = 100!

What kind of conclusions can we infer from this example? Is this an irrelevant, pathological case? Is it an issue that we are likely to encounter at application-level code?

nicolabotta avatar Aug 21 '19 09:08 nicolabotta