How do I retain performance after switching from `ST` to `PrimMonad` typeclass?
40571263c6666eb6958114acee5a376bad549f14 <- I tried out the `PrimMonad` typeclass to make it easier to embed the engine in `IO`, but it absolutely wrecked performance (8x slowdown). Pretty much the only change I made was to turn all the `ST s` into a `PrimMonad m`. How do library authors typically deal with this?
@schell
This isn't a change I need (there are plenty of other ways to embed), but now I'm curious.
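For concreteness, the kind of change described above looks roughly like the sketch below. The counter functions are hypothetical stand-ins, not code from the engine, and `MutVar` is assumed to come from the `primitive` package.

```haskell
import Control.Monad.Primitive (PrimMonad, PrimState)
import Control.Monad.ST (ST, runST)
import Data.Primitive.MutVar (MutVar, modifyMutVar', newMutVar, readMutVar)

-- Monomorphic version: the monad is fixed to ST s, so GHC knows
-- statically which bind and which primitive operations are used.
bumpST :: MutVar s Int -> Int -> ST s ()
bumpST ref n = mapM_ (\i -> modifyMutVar' ref (+ i)) [1 .. n]

-- Polymorphic version: same body, but the function now takes a
-- PrimMonad dictionary unless GHC specialises or inlines it.
bumpPrim :: PrimMonad m => MutVar (PrimState m) Int -> Int -> m ()
bumpPrim ref n = mapM_ (\i -> modifyMutVar' ref (+ i)) [1 .. n]

-- Pure wrapper around the ST version.
sumST :: Int -> Int
sumST n = runST $ do
  ref <- newMutVar 0
  bumpST ref n
  readMutVar ref

-- The polymorphic version embeds directly in IO.
sumIO :: Int -> IO Int
sumIO n = do
  ref <- newMutVar 0
  bumpPrim ref n
  readMutVar ref

main :: IO ()
main = do
  print (sumST 100)    -- 5050
  sumIO 100 >>= print  -- 5050
```

The two bodies are textually identical; only the signature differs, which is why the slowdown has to come from how the polymorphic signature is compiled rather than from the algorithm.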
We can figure it out by comparing the GHC Core dumped for the two implementations, or possibly by comparing profiling results. The `PrimMonad` version is presumably spending more CPU time in this added layer of indirection, most likely dispatching through the `PrimMonad` typeclass dictionary.
In this case it may not be necessary to confirm with Core or profiling. Some slowdown is expected from adding a level of indirection, and in some cases the slowdown is worth the extra polymorphism (as in `vector`'s use of `PrimMonad`), but here it seems the slowdown is too costly.
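If confirming via Core did turn out to be worthwhile, one option is to have GHC write the simplifier output to a file for each version and diff the two. A minimal sketch, assuming a hypothetical `step` function standing in for one of the engine's hot-path functions and `MutVar` from the `primitive` package:

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-all -ddump-to-file #-}
-- The flags above ask GHC to write optimised Core for this module to a
-- .dump-simpl file, with most type and coercion noise suppressed.

import Control.Monad.Primitive (PrimMonad, PrimState)
import Data.Primitive.MutVar (MutVar, newMutVar, readMutVar, writeMutVar)

-- Hypothetical hot-path function (not from the engine).
step :: PrimMonad m => MutVar (PrimState m) Double -> Double -> m ()
step ref dt = do
  x <- readMutVar ref
  writeMutVar ref $! x + dt

-- Runs step a fixed number of times in IO and returns the result.
runSteps :: Int -> Double -> IO Double
runSteps n dt = do
  ref <- newMutVar 0
  mapM_ (\_ -> step ref dt) [1 .. n]
  readMutVar ref

main :: IO ()
main = runSteps 4 0.5 >>= print  -- 2.0
```

In the dump, functions that are still dictionary-passing typically take an extra argument with a name beginning `$dPrimMonad`; where GHC has specialised or inlined the call, that argument should be gone.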
I feel like I remember mention of a process that can optimize away the dictionary-passing part of typeclassed code. Is it just specialization? I wonder if adding `SPECIALIZE` everywhere would solve the problem, but that would be a pain to check. (I suppose it's worth coming up with a simpler example to experiment with anyway.)
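A simpler example to experiment with might look like the sketch below. `accumulate` is a hypothetical function, not taken from the engine, and `MutVar` is assumed to come from the `primitive` package. The `SPECIALIZE` pragmas ask GHC to compile extra copies of the function at the listed types, so callers at those types need no dictionary.

```haskell
import Control.Monad.Primitive (PrimMonad, PrimState)
import Control.Monad.ST (RealWorld, ST, runST)
import Data.Primitive.MutVar (MutVar, modifyMutVar', newMutVar, readMutVar)

-- Hypothetical polymorphic accumulator.
accumulate :: PrimMonad m => MutVar (PrimState m) Int -> [Int] -> m ()
accumulate ref = mapM_ (\x -> modifyMutVar' ref (+ x))
-- Request dictionary-free copies for the two monads we care about.
{-# SPECIALIZE accumulate :: MutVar s Int -> [Int] -> ST s () #-}
{-# SPECIALIZE accumulate :: MutVar RealWorld Int -> [Int] -> IO () #-}

-- Caller in ST: eligible to use the ST specialisation.
totalST :: [Int] -> Int
totalST xs = runST $ do
  ref <- newMutVar 0
  accumulate ref xs
  readMutVar ref

-- Caller in IO: eligible to use the IO specialisation.
totalIO :: [Int] -> IO Int
totalIO xs = do
  ref <- newMutVar 0
  accumulate ref xs
  readMutVar ref

main :: IO ()
main = do
  print (totalST [1, 2, 3])    -- 6
  totalIO [4, 5, 6] >>= print  -- 15
```

A lower-maintenance alternative to listing every type is to mark the function `INLINABLE`, which keeps its unfolding available so that GHC can specialise it at call sites in other modules. Whether either pragma recovers the full 8x here would still need to be measured.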
How is `vector` so fast even though it uses `PrimMonad`? Does it have to do with inlining?
So many questions to find answers to. =)
Here are the profiling results for both versions:
https://gist.github.com/ublubu/004c2b569a28eb59fa7944dab663f0d7
(side note: The `ST` version of the mutable-vector-World benchmarks about 10% faster than the current `IntMap`-World, which is kinda nice.)
Yes, I think `SPECIALIZE` would do it. I'm not sure how `vector` does it, though I see TONS of `INLINE` and some `INLINE_FUSE` pragmas. No `SPECIALIZE` though. Maybe it would be even faster monomorphised?
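The `INLINE`-heavy style can be sketched as follows. This is an illustration of the general technique, not `vector`'s actual code: `swapVars` is a hypothetical helper and `MutVar` is assumed to come from the `primitive` package. The idea is that a small polymorphic function marked `INLINE` gets its body copied into each call site, where the monad is usually known and the dictionary can be resolved at compile time.

```haskell
import Control.Monad.Primitive (PrimMonad, PrimState)
import Control.Monad.ST (runST)
import Data.Primitive.MutVar (MutVar, newMutVar, readMutVar, writeMutVar)

-- Hypothetical small primitive, polymorphic in the monad.
swapVars :: PrimMonad m => MutVar (PrimState m) a -> MutVar (PrimState m) a -> m ()
swapVars a b = do
  x <- readMutVar a
  y <- readMutVar b
  writeMutVar a y
  writeMutVar b x
{-# INLINE swapVars #-}

-- Monomorphic caller: once swapVars is inlined here, m is known to be
-- ST s, so no dictionary needs to be passed at run time.
swapped :: (Int, Int) -> (Int, Int)
swapped (p, q) = runST $ do
  a <- newMutVar p
  b <- newMutVar q
  swapVars a b
  (,) <$> readMutVar a <*> readMutVar b

main :: IO ()
main = print (swapped (1, 2))  -- (2,1)
```

This only helps when the eventual caller is monomorphic; a chain of polymorphic functions has to be inlined or specialised all the way up to the point where the monad is fixed.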