Make the basic Lazy implementation much faster...
The current JavaScript Native "memoize" function works by returning a nested forcer function: the outer "memoize" function is called to initialize the representation of the Lazy type with the thunk argument, and the evaluated data is later stored as a local variable within the outer "memoize" function's scope when the returned inner forcer function is called. This scheme of building a new nested inner function on the fly every time a new Lazy is initialized is very costly in execution time on many current mainstream browsers such as Chrome 55: http://jsperf.com/iifes-vs-nested-functions/4.
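For reference, the Elm side of the current implementation is roughly the following sketch (the `Native.Lazy` module path is assumed here; the important part is that "memoize" hands back a freshly allocated inner forcer closure for every Lazy created, as described above):

type Lazy a = Lazy (() -> a)

lazy : (() -> a) -> Lazy a
lazy thunk =
    -- the Native "memoize" function builds and returns a brand new
    -- inner forcer closure for every Lazy that is created
    Lazy (Native.Lazy.memoize thunk)

force : Lazy a -> a
force (Lazy forcer) =
    -- the forcer evaluates the thunk on its first call and returns the
    -- value cached in its enclosing scope on every later call
    forcer ()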
In order to speed this up for some browsers, the generic Lazy type is changed to an Elm Tagged Union so that the force function can call the Native JS "memoize" function on that type without needing to create any new functions. The code was also tuned to use an Immediately Invoked Function Expression (IIFE), which produces somewhat better speed on some browsers, such as current Chrome.
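Judging from the REPL output later in this thread, the Tagged Union has roughly this shape (a sketch only; the actual definition is whatever the PR contains):

type Lazy a
    = Unevaluated (() -> a) -- holds the not-yet-evaluated thunk
    | Evaluated a           -- holds the result once it has been forced

The force function passes a value of this type to the Native JS "memoize" function, which evaluates the thunk and switches the tag, instead of allocating a new closure per Lazy.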
As an added benefit, the Native JS "memoize" function can check for recursive evaluation of the thunk and throw an error early, rather than waiting for a stack overflow on the infinite loop, at very little cost in execution time.
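A minimal example of the kind of self-dependent thunk such a check would catch (assuming, per the above, that the check throws a runtime error when forcing a Lazy re-enters its own evaluation):

bad : Lazy Int
bad =
    -- forcing this value forces itself again before it has produced a
    -- result, so it can never terminate
    lazy <| \() -> force bad + 1

-- force bad
-- old behavior: the thunk keeps re-entering itself until the stack overflows
-- with the check: the recursive evaluation is detected and an error is thrown early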
This change affects the `Lazy a` generic type, the `lazy` constructor function, and the `force` function, as well as the Native JS "memoize" function; however, it makes no changes to the API of the Lazy library module.
This speed-up can be verified by timing the enumeration of a simple lazy list of natural numbers, produced by the following code:
type List a = Node a (Lazy (List a))

nat32s : () -> List Int
nat32s () =
    let
        nat32si n =
            Node n << lazy <| \() -> nat32si (n + 1)
    in
        nat32si 0
nthnat32 : Int -> String
nthnat32 n =
    let
        nthnati i nts =
            case nts of
                Node hd tl ->
                    if i <= 0 then
                        toString hd
                    else
                        nthnati (i - 1) (force tl)
    in
        nthnati n (nat32s ())
When running nthnat32 999999 (one million iterations), the proposed changes are up to about two and a half times faster than the original code on Chrome 55 Stable, but don't have much of an effect on Firefox 51 Stable or Microsoft Edge 38.14393.0.0. It is suspected that the problem with Chrome may be related to cache/heap/garbage collection, as the older version would get increasingly slower with extended repeated use of the Lazy type, whereas this new version tends to maintain the same performance. However, using Lazy on Chrome is still slower than on Firefox, which in turn is considerably slower than Edge, so there is still a considerable range of performance, about three times, depending on the browser used.
As well, the documentation is corrected for the erroneous `andThen` example, which could not be compiled due to errors, and a note is added for potential implementers of a `Lazy (LazyList a)` type: there is a very high execution-time cost to adding the extra `Lazy` wrapper and implementing `cons` and `append` as in that example, which is required to avoid "The Halting Problem" when the resulting `Lazy (LazyList a)` type is used to build infinite lazy lists.
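A sketch of the kind of definitions that note is about (the `LazyList`, `cons`, and `append` names here are illustrative only, not part of this library):

type LazyList a
    = Nil
    | Cons a (Lazy (LazyList a))

cons : a -> Lazy (LazyList a) -> Lazy (LazyList a)
cons hd tl =
    -- the whole list carries an extra Lazy wrapper on top of the Lazy
    -- already inside every Cons cell
    lazy <| \() -> Cons hd tl

append : Lazy (LazyList a) -> Lazy (LazyList a) -> Lazy (LazyList a)
append xs ys =
    -- deferring even the inspection of xs is what keeps append usable on
    -- infinite lists, but every cell then pays for two Lazy wrappers
    lazy <| \() ->
        case force xs of
            Nil ->
                force ys

            Cons hd tl ->
                Cons hd (append tl ys)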
Finally, a Lazy version of fix is provided (extending the API), which can be used to create recursion when passed a function of the accepted form. This has an advantage over the usual implementation of fix for strict, strongly-typed languages such as Elm in that it uses lazy/force in its implementation and thus enjoys the benefits of the recursive evaluation checks introduced with this PR. On the other hand, it cannot be used for ordinary recursion with an included recursion limit as an ordinary fix can be (thus building stack), as that will be automatically detected and an exception thrown if it is tried with this function. If that behavior is needed, a conventional fix should be used, although the code where that form of fix is used can often be rewritten so as not to require fix at all (conventional TCO'd recursive function loops).
It's only faster because you're cheating.
> import Lazy
> x = Lazy.lazy (\_ -> 42)
Unevaluated <function> : Lazy.Lazy number
> y = Lazy.force x
42 : number
> x
Evaluated 42 : Lazy.Lazy number
Notice how the value of x is mutated as a side effect of forcing it. That's not allowed. The logic behind the current implementation is that you still hold on to a function after calling force; the only thing you can do with a function is call it, and the only change memoization makes is to return the cached value if one is available (which is safe because of purity, which this breaks).
Continuing in the same session:
> y
42 : number
> x
Evaluated 42 : Lazy.Lazy number
> y
42 : number
> x
42 : number
> x
Evaluated 42 : Lazy.Lazy number
Not just the value but the type of x fluctuates. This is broken.
I was going to ask where the native functions you call in 0b72fe5 are defined, but just running elm make finds a compiler error.
@mgold, putting it another way: the original code is slow because it is generating a function inside another function, which is remarkably slow on some browsers, such as current V8-based Chrome. I am trying to avoid that, but perhaps the method I tried breaks the rules.
For the first phase of your test, that is expected behavior: yes, the Native "memoize" function does mutate the `Lazy` Tagged Union from the Unevaluated thunk into the Evaluated value, but that is still the same type, just a different tag. I thought that might be allowed, but I can see now that it still appears as a mutation.
As to the second phase of your test, I can't reproduce it, and I can't see how the type of x could possibly change, as the only mutation through native JS code occurs when `Lazy.force` is called, and that is never called (again) in your second phase of testing.
You are right that the type should never change and I would never have submitted the PR if I could see that it did.
I must admit defeat in finding any way other than the JS closure pattern to implement the memoization while hiding the mutation from Elm, so use of `Lazy` in tight loops is going to be slow on some browsers until those browsers are optimized for this pattern. I did find that wrapping the Native JS "memoize" function in an Immediately Invoked Function Expression (IIFE) helped performance, particularly on Chrome, where without it long-running loops slowed down by an additional factor of about two and a half. I have also retained the recursive evaluation detection in the code, so at least that is some improvement (it doesn't cost significant time compared to the creation of function objects).
So back to using the JS Closure pattern...
Running the same tests, as follows: Phase I:
> x = Lazy.lazy <| \() -> 42
Lazy <function> : Lazy.Lazy number
> y = Lazy.force x
42 : number
> x
Lazy <function> : Lazy.Lazy number
Phase II in the same session:
> y
42 : number
> x
Lazy <function> : Lazy.Lazy number
> y
42 : number
> x
Lazy <function> : Lazy.Lazy number
> x
Lazy <function> : Lazy.Lazy number
Execution time is back to the same (slow) speed as the original code: acceptable on Edge, but not good on Firefox and poor on Chrome (although now no longer extremely poor for long running loops).
I have done a commit on my branch to the code as described above.
I know I saw the "x is a number now" bug more than once, but I can't reproduce it now either.
@mgold, well, that code where Elm could see the mutation is gone now anyway, so as far as I can tell, everything works as expected now.
I don't know that I can do much better than this and still get reasonable performance across all browsers, as some of the improvements I could make help Edge somewhat but hurt Firefox a little and hurt Chrome greatly, such that the speed ratio between Edge and Chrome becomes over twenty times!
But wait just a bit: I am thinking that this fairly major change may as well extend the API slightly further to include lazy versions of the sharing fixed-point combinator and the (non-sharing) Y-combinator. These are very simple to write (although perhaps hard for some to understand ;) ) functions that are useful in implementing recursion, with `fix` sharing the same value for input and output and `ycomb` not (non-sharing). They can be written as follows:
fix : (a -> a) -> a
fix f =
    let r = lazy <| \_ -> f <| force r
    in force r -- sharing

and

ycomb : (a -> a) -> a
ycomb f =
    let r = lazy <| \() -> f <| ycomb f
    in force r -- non-sharing
I think they belong in this library because I use the `Lazy`'ified thunk rather than a simple thunk so that they can enjoy the benefits of the (new) recursive evaluation checking feature; they are also a handy tool that can be used to inject laziness into recursive deferred operations like, say, lazy lists.
Of course I have full documentation headers including examples of the use of each.
What do you think, do they belong here?
@mgold, regarding the addition of fix and ycomb: I have been working on that and forgot that in order to code them this way, the compiler would have to support type contexts and forall, which it does not have and may never have, as that is perhaps too complex for the intended users.
The problem is fixable by changing the type signatures to (Lazy a -> a) -> a and coding the functions accordingly; although that makes `fix` usable and fast, `ycomb` is very slow even when one gets it to work.
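With that signature, the sharing fix presumably looks something like this sketch:

fix : (Lazy a -> a) -> a
fix f =
    -- f receives the Lazy itself, so it can defer forcing it; r is shared
    -- between the outer force and any later forces inside f
    let r = lazy <| \() -> f r
    in force r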
The `fix` function is the most useful, as it provides recursive values as mentioned in Evan's article (https://github.com/elm-lang/elm-compiler/blob/0.18.0/hints/bad-recursion.md), but in this case with the recursive evaluation check. I recommend that this function be included in the API, since there are times when there is no other way to use this pattern (or to write it oneself).
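For example, using the Node-based list type from the benchmark above and the (Lazy a -> a) -> a signature sketched earlier, a self-referential value can be written without tripping the recursive evaluation check, because the supplied function never forces its argument while producing the head:

ones : List Int
ones =
    -- an infinite list of 1s that shares its own tail
    fix <| \rest -> Node 1 rest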
For `ycomb` there are other ways to write code to achieve the same ends, and although I could provide a Z-combinator, that is also not that useful, as one can accomplish the same thing by other means. Therefore, my recommendation is that `ycomb` not be included in the API.
If you agree, I will shortly do a commit adding fix to this API, fully tested and working, with documentation and a good example.
...this is way too much detail and Evan is not going to ask you to open a new PR.
@mgold, the fix is in, and I think I'm done here unless there are problems or requested changes.
I can't merge elm-lang stuff, only elm-community stuff. Sorry to lead you on.
@mgold, so wait for Evan?
@mgold, this Pull Request turns out to be reasonably minor; the major Pull Request is elm-community/lazy-list. Perhaps you are confused between the two?
Yes, wait for Evan here. Your lazy-list PR is on my todo list.