purescript-in-purescript

Performance Ideas

Open paf31 opened this issue 10 years ago • 26 comments

  • The type checker doesn't use the backtracking of Either, so it could just use exceptions and Eff.
  • Try writing the pretty printers using showsPrec-style pretty printing instead of pattern-arrows.

We should also try to profile the running JavaScript to see where the bottleneck is.

paf31 avatar Jun 05 '14 14:06 paf31

I do recommend profiling before optimizing. JavaScript engines perform remarkable transformations that make it even harder than usual to identify trouble spots without profiling.

Also, to the extent that we can parallelize compilation, we can take advantage of web workers on the front end (I'm not sure if anything similar is available on the back end). With most computers having 4+ cores these days, you can get an easy 2-10x performance increase just from parallelizing.

jdegoes avatar Jun 05 '14 14:06 jdegoes

You can spawn additional processes in node and communicate with JSON over IPC, but I suspect it's a bit clunkier than web workers.

garyb avatar Jun 05 '14 21:06 garyb

Well psc-make could be parallelized at a fairly high level, and the module graph gives a natural scheme for data flow parallelism. Definitely worth looking into later I think.

paf31 avatar Jun 06 '14 04:06 paf31

I'm more concerned by the several-orders-of-magnitude difference in single-threaded performance right now though :P

paf31 avatar Jun 06 '14 04:06 paf31

Also, the Map implementation used in the typechecker has O(N) lookup due to the order of the insertions. Either we should make Map self-balancing (preferred), or we should try a different data structure.
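The degradation is easy to demonstrate: an unbalanced binary search tree fed keys in sorted order (as insertion order often is) degenerates into a linked list, so lookups walk every node. A minimal JS sketch:

```javascript
// Sketch: why an unbalanced BST degrades to O(N) lookup when keys
// arrive in sorted order - every node ends up as a right child.
function insert(node, key) {
  if (node === null) return { key: key, left: null, right: null };
  if (key < node.key) node.left = insert(node.left, key);
  else if (key > node.key) node.right = insert(node.right, key);
  return node;
}

function depth(node) {
  if (node === null) return 0;
  return 1 + Math.max(depth(node.left), depth(node.right));
}

let tree = null;
for (let i = 0; i < 100; i++) tree = insert(tree, i); // sorted insertions

// depth(tree) is 100: the "tree" is a list, so lookup is O(N).
// A self-balancing tree (AVL, red-black) would keep depth at O(log N).
```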

paf31 avatar Jun 15 '14 21:06 paf31

I meant to say, I tried profiling psc-in-ps the other day and got some slightly strange results. As far as I can tell from the output, eqArray and bindStateT seem to be the most expensive operations, with a grand total of ~5% of the total JS execution time each. It makes no sense.

This was using nprof. I couldn't get dtrace or look to work.

garyb avatar Jun 20 '14 16:06 garyb

bindStateT sounds plausible, because both desugaring and type checking use StateT. We can probably use an IORef and exceptions in place of ErrorT in both cases, which should speed things up significantly.
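The idea of trading monad-transformer plumbing for a mutable reference plus exceptions can be sketched directly in JS (names here are illustrative, not the compiler's actual API):

```javascript
// Sketch: mutable ref + exceptions standing in for StateT + ErrorT.
// One try/catch at the boundary replaces per-bind error threading.
function runCheck(initialState, action) {
  const ref = { state: initialState };      // plays the role of an IORef
  try {
    const value = action(ref);
    return { ok: true, value: value, state: ref.state };
  } catch (err) {
    return { ok: false, error: String(err) }; // plays the role of ErrorT
  }
}

// A "type checking" step: bump a counter in place instead of
// threading state through every bind.
function freshUnknown(ref) {
  ref.state = ref.state + 1;
  return ref.state;
}
```

For example, `runCheck(0, (ref) => [freshUnknown(ref), freshUnknown(ref)])` yields `{ ok: true, value: [1, 2], state: 2 }`, with no allocation per bind the way `bindStateT` incurs.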

Edit: Do you have the profiler results?

paf31 avatar Jun 20 '14 16:06 paf31

Oh sure, it makes sense in terms of what is expensive; I just meant it's odd that almost 80% of the time is missing from the profiler result, and that the two supposedly most expensive functions are only responsible for 10% of the total time.

I'll make a gist with the results when I get a minute.

garyb avatar Jun 20 '14 17:06 garyb

https://gist.github.com/garyb/4888f29a11c09e469a40

Different results on my home machine, the total time % value is even less for bindStateT / eqArray!

garyb avatar Jun 20 '14 18:06 garyb

Interesting. There are a few modules which show up quite often: Data.Map, the typechecker, Names, and PatternArrows. Maybe we can try optimizing those and see if that helps.

paf31 avatar Jun 20 '14 18:06 paf31

I'm going to see if I can get dtrace working on a SmartOS VM, might get some more sense out of that too.

garyb avatar Jun 20 '14 18:06 garyb

I hacked some timing into the compiler, here's what I get for just running psc:

read & parse files: 10ms
  sortModules: 18ms
  desugar: 531ms
  typecheck: 6212ms
  binding groups: 96ms
  js codegen: 224ms
compile: 8519ms

So the type checker is by far the main culprit. Parser is amazingly fast considering.
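Per-phase timings like the ones above can be collected with a small wrapper that measures wall-clock deltas around each phase (phase names here are illustrative):

```javascript
// Sketch: hand-rolled phase timing using Date.now() deltas.
function timed(label, f) {
  const start = Date.now();
  const result = f();
  console.log(label + ": " + (Date.now() - start) + "ms");
  return result;
}

// e.g. const desugared = timed("desugar", () => desugar(sorted));
//      const checked   = timed("typecheck", () => typeCheck(desugared));
```

`console.time(label)` / `console.timeEnd(label)` would do the same job with less code, at the cost of not returning the phase's result inline.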

garyb avatar Jun 24 '14 10:06 garyb

I've put it in a branch c1c7f8fda4a430d003cd4409b8f1423451b8fb88

garyb avatar Jun 24 '14 10:06 garyb

Great to know :) I'll have a look at the typechecker in the next couple of days. I have a few ideas for how it could be improved.

paf31 avatar Jun 24 '14 15:06 paf31

New branch typecheck time is now ~3926ms, down from ~5050ms. Pretty good, but considering the amount of work you did on it, it's a shame it's not a bit more!

garyb avatar Jun 26 '14 23:06 garyb

I have more up my sleeve :)

paf31 avatar Jun 26 '14 23:06 paf31

Good stuff :)

garyb avatar Jun 26 '14 23:06 garyb

I have a couple of other things I'd like to try:

  • Reduce the amount of currying in the typechecker by using records as function arguments.
  • Compare performance of current ADT implementation to prototypes and instanceof checks.

but to be honest, I'm starting to lose faith that the JS version can be fast enough to be practical.

If we can't get this to within a factor of 10 of the performance of the Haskell version, I think we should seriously consider sticking with what we have, and just work on packaging it better for end users (binary distributions, apt packages, brew etc.) We should chat about it on IRC when everyone is online.

Edit: I'd also like to try replacing Data.Map with a self-balancing implementation to see if that helps.
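The ADT comparison in the second bullet above can be sketched like this: the same PureScript-style type `data Shape = Circle Number | Square Number` represented as tagged objects matched by string comparison, versus prototype-based constructors matched with `instanceof` (all names illustrative):

```javascript
// (a) tagged objects, dispatched on a string tag
function CircleT(r) { return { tag: "Circle", value0: r }; }
function SquareT(s) { return { tag: "Square", value0: s }; }
function areaT(shape) {
  return shape.tag === "Circle"
    ? Math.PI * shape.value0 * shape.value0
    : shape.value0 * shape.value0;
}

// (b) prototype-based constructors, dispatched with instanceof
function Circle(r) { this.value0 = r; }
function Square(s) { this.value0 = s; }
function area(shape) {
  return shape instanceof Circle
    ? Math.PI * shape.value0 * shape.value0
    : shape.value0 * shape.value0;
}
```

Which wins depends on how well the engine optimizes hidden classes and `instanceof`, which is exactly why this needs benchmarking rather than guessing.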

paf31 avatar Jun 27 '14 18:06 paf31

Unless the optimization is guided by fine-grained profiling, it'll just be blind luck if it improves anything (I know I say this all the time but it's so true it bears repeating... all the time :) ).

If modules can be compiled independently, then user compile time will be fairly small in the common case that only a couple of files have changed (obviously this would require a more intelligent build process, but that's something we need anyway). The speed of incremental compilation is really all that matters, since developers spend 99% of their time compiling a project that's only changed by a few lines.

The other thing is that PureScript does not have a sophisticated optimizer. The difference between GHC with no optimizations and GHC with all optimizations is astounding. Optimizations will be driven by slow performance and should have a profound impact not just on ps-in-ps but on user-land code, too. The compiler being slow could actually be a good thing, because it will accelerate work on optimizations.

jdegoes avatar Jun 27 '14 18:06 jdegoes

I agree about profiling, but I have no idea how to extract any meaningful data from a running instance of psc, other than to stick a console.time on every other line.

Regarding incremental compilation, psc-make will avoid recompiling modules that haven't changed, which does help a lot in the Haskell compiler, but 5s to build the Prelude is just far too slow to be practical. Incremental compilation will only get us so far if the single-module compile time is high. Related: finer-grained incremental compilation could also be worth looking at, per binding group, for example. Also, we can be cleverer about which modules need to be recompiled. Right now we're recompiling too many things when a module low in the dependency DAG changes.

As for optimizations, I'm all in favour if anyone wants to investigate. Hopefully after the hangout today we'll have a whole bunch of newly initiated compiler developers :) My only stipulation is that the optimizations should keep the generated code complexity approximately the same. Aggressive inlining, for example, could be a big help to performance, but not to readability.
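Computing the minimal set of modules to recompile is a reverse-reachability walk over the dependency DAG: only the changed module and its transitive dependents are dirty. A minimal sketch (the graph shape is illustrative):

```javascript
// Sketch: given deps (moduleName -> [imported modules]) and one changed
// module, return the set of modules that actually need recompiling.
function dirtySet(deps, changed) {
  // Invert the edges: who imports whom.
  const dependents = {};
  for (const [mod, imports] of Object.entries(deps)) {
    for (const imp of imports) {
      (dependents[imp] = dependents[imp] || []).push(mod);
    }
  }
  // Walk upward from the changed module through its dependents.
  const dirty = new Set([changed]);
  const queue = [changed];
  while (queue.length > 0) {
    for (const mod of dependents[queue.pop()] || []) {
      if (!dirty.has(mod)) { dirty.add(mod); queue.push(mod); }
    }
  }
  return dirty;
}
```

For example, with `{ "Main": ["Data.Map"], "Data.Map": ["Prelude"], "Prelude": [] }`, changing `Data.Map` dirties `Main` but leaves `Prelude` alone.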

paf31 avatar Jun 27 '14 19:06 paf31

I had no luck with getting an OS that dtrace works on to run, but I'm trying something else now which will hopefully let us profile.

garyb avatar Jun 27 '14 19:06 garyb

One thing that might be worth reviewing is if we have things left in the code that are being evaluated unnecessarily due to strict evaluation.

One example of this is when we use guardWith from the typechecker monad, so when doing something like this:

guardWith (strMsg $ "Expected type of kind *, was " ++ prettyPrintKind kind) $ kind == Star

we're calling strMsg and prettyPrintKind even if the error never happens, I think.
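The strict-evaluation cost, and the fix of taking a thunk so the message is only built on failure, can be shown in a few lines of JS (the `guardWith` names here are illustrative):

```javascript
// Eager version: the message string is fully built on every call,
// even when the condition holds and no error is thrown.
function guardWithEager(message, condition) {
  if (!condition) throw new Error(message);
}

// Lazy version: take a thunk, so the pretty printer only runs on failure.
function guardWithLazy(mkMessage, condition) {
  if (!condition) throw new Error(mkMessage());
}

let built = 0;
guardWithLazy(() => { built++; return "Expected type of kind *"; }, true);
// built is still 0: the message was never constructed on the happy path.
```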

garyb avatar Jul 06 '14 14:07 garyb

After doing that stuff with isNullaryConstructor in codegen the other day, it occurred to me that I could probably hack newtypes in during codegen (applying the optimization to any data constructor that is defined like a newtype). I was hoping it would give us a nice boost, since StateT's data constructor and the like are then effectively newtypes.

I think it knocked an entire 20-30ms off. :)

garyb avatar Jul 19 '14 13:07 garyb

I've changed my mind on newtypes a little. I think giving them their own keyword would be a good idea, just because it makes the code generation strategy easier to explain: data declarations always get a class, while newtypes are just a type wrapper with no runtime cost.
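The codegen split being described looks roughly like this in the generated JS (names illustrative, not the compiler's actual output):

```javascript
// data Identity a = Identity a
// A data declaration gets a real constructor function: every value
// allocates a wrapper object, and pattern matches inspect it.
function Identity(value0) { this.value0 = value0; }

// newtype Age = MkAge Int
// A newtype erases at runtime: construction and unwrapping are just
// identity functions, so the wrapper costs nothing.
const MkAge = (n) => n;
const runAge = (n) => n;
```

With identity construction like this, the optimizer can inline the wrapper away entirely, which is the zero-cost guarantee the dedicated keyword would make explicit.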

paf31 avatar Jul 19 '14 15:07 paf31

Yeah, agreed. That way, as with Haskell, you know for sure they are inlined, since newtype enforces the one-argument-one-constructor rule.

My above comment was more just saying that even applying it globally makes only a tiny difference to performance, so it doesn't need to be that high on the list of optimizations to add. The main thing it will help with is optimizing algorithms using f-algebras a la recursion-schemes and the like, as TCO starts getting applied in places it didn't before.

garyb avatar Jul 19 '14 16:07 garyb

Looks like there's a way to get dtrace equivalent info out of Node on Linux now: http://www.brendangregg.com/blog/2014-09-17/node-flame-graphs-on-linux.html

Should be helpful when looking at performance again!

garyb avatar Nov 20 '14 09:11 garyb