How do I retain performance after switching from `ST` to `PrimMonad` typeclass?

Open ublubu opened this issue 6 years ago • 3 comments

In commit 40571263c6666eb6958114acee5a376bad549f14 I tried out the `PrimMonad` typeclass to make it easier to embed the engine in `IO`, but it absolutely wrecked performance (8x slowdown).

Pretty much the only change I made was to turn every `ST s` into a `PrimMonad m => m`. How do library authors typically deal with this?
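For readers following along, here is a minimal sketch of the kind of signature change described above. It assumes the `primitive` package; `sumToST` and `sumToPrim` are made-up names for illustration and are not taken from the engine.

```haskell
import Control.Monad.Primitive (PrimMonad)
import Control.Monad.ST (ST, runST)
import Data.Primitive.MutVar (newMutVar, readMutVar, modifyMutVar')

-- Before: the monad is fixed to ST, so GHC knows statically which
-- bind and which primitive operations are in play.
sumToST :: Int -> ST s Int
sumToST n = do
  acc <- newMutVar 0
  mapM_ (\i -> modifyMutVar' acc (+ i)) [1 .. n]
  readMutVar acc

-- After: identical body, but the monad is abstract. Unless GHC
-- specialises or inlines this function, it takes a PrimMonad
-- dictionary as a runtime argument.
sumToPrim :: PrimMonad m => Int -> m Int
sumToPrim n = do
  acc <- newMutVar 0
  mapM_ (\i -> modifyMutVar' acc (+ i)) [1 .. n]
  readMutVar acc

-- The polymorphic version still runs in ST...
pureResult :: Int
pureResult = runST (sumToPrim 100)

-- ...and can now be used directly in IO as well.
ioResult :: IO Int
ioResult = sumToPrim 100
```

The two bodies are the same on purpose: only the type changes, which is why any slowdown has to come from how GHC compiles the constraint rather than from the algorithm.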

@schell

This isn't a change I need (there are plenty of other ways to embed), but now I'm curious.

ublubu avatar Jun 11 '18 19:06 ublubu

We can figure it out by comparing the GHC Core generated for the two implementations (for example via `-ddump-simpl`), or possibly by comparing profiling results. The `PrimMonad` version must be spending more CPU time in the added layer of indirection, most likely in dictionary dispatch through the `PrimMonad` typeclass.

In this case it may not be necessary to confirm with Core or profiling. Some slowdown is expected from adding a level of indirection, and in some cases the slowdown is worth the extra polymorphism (as in vector's use of `PrimMonad`), but here the slowdown seems too costly.

schell avatar Jun 11 '18 21:06 schell

I feel like I remember mention of a process that can optimize away the dictionary-passing part of typeclassed code. Is it just specialization? I wonder if adding SPECIALIZE everywhere would solve the problem, but that would be a pain to check. (I suppose it's worth coming up with a simpler example to experiment with anyway.)
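A simpler example to experiment with might look like the sketch below. It assumes the `primitive` package, and `integrate` is a hypothetical stand-in for an engine loop, not code from this repository. The `SPECIALIZE` pragmas ask GHC to also compile copies of the function at the listed monomorphic types, so calls at those types need no dictionary.

```haskell
import Control.Monad.Primitive (PrimMonad)
import Control.Monad.ST (ST, runST)
import Data.Primitive.MutVar (newMutVar, readMutVar, modifyMutVar')

-- Hypothetical stand-in for an engine loop: accumulate a constant
-- velocity over a number of fixed time steps.
integrate :: PrimMonad m => Int -> Double -> Double -> m Double
integrate steps dt vel = do
  posRef <- newMutVar 0
  let go :: Int -> Int
      go = id
      loop k
        | go k <= 0 = pure ()
        | otherwise = modifyMutVar' posRef (+ vel * dt) >> loop (k - 1)
  loop steps
  readMutVar posRef
-- Ask GHC for dedicated copies at the two monads we actually use.
{-# SPECIALIZE integrate :: Int -> Double -> Double -> ST s Double #-}
{-# SPECIALIZE integrate :: Int -> Double -> Double -> IO Double #-}

-- Monomorphic call site that can use the ST-specialised copy.
finalPosition :: Double
finalPosition = runST (integrate 4 0.5 2.0)
```

Whether this recovers the `ST` numbers here would still need to be checked in Core or with a benchmark. For functions called from other modules, `INLINABLE` is the related pragma: it exposes the unfolding so that GHC can specialise at the call site.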

How is vector so fast even though it uses PrimMonad? Does it have to do with inlining?

So many questions to find answers to. =)

Here are the profiling results for both versions:

https://gist.github.com/ublubu/004c2b569a28eb59fa7944dab663f0d7

(side note: The ST version of the mutable-vector-World benchmarks about 10% faster than the current IntMap-World, which is kinda nice.)

ublubu avatar Jun 11 '18 22:06 ublubu

Yes, I think `SPECIALIZE` would do it. I'm not sure how vector does it, though I see TONS of `INLINE` and some `INLINE_FUSED` pragmas. No `SPECIALIZE` though. Maybe it would be even faster monomorphised?
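As a rough illustration of the inlining route (a sketch assuming the `primitive` package, not a description of vector's internals): if a small polymorphic helper is marked `INLINE`, its body is copied into each call site, and when that call site is monomorphic GHC can usually resolve the `PrimMonad` dictionary at compile time. `addTo` and `sumSquares` are hypothetical names.

```haskell
import Control.Monad.Primitive (PrimMonad, PrimState)
import Control.Monad.ST (runST)
import Data.Primitive.MutVar (MutVar, newMutVar, readMutVar, modifyMutVar')

-- Small polymorphic helper. INLINE asks GHC to copy the body into
-- every call site instead of calling it with a dictionary argument.
addTo :: PrimMonad m => MutVar (PrimState m) Int -> Int -> m ()
addTo ref x = modifyMutVar' ref (+ x)
{-# INLINE addTo #-}

-- Monomorphic call site: the monad is ST here, so once addTo is
-- inlined the PrimMonad instance is statically known.
sumSquares :: Int -> Int
sumSquares n = runST $ do
  ref <- newMutVar 0
  mapM_ (\i -> addTo ref (i * i)) [1 .. n]
  readMutVar ref
```

The trade-off compared with `SPECIALIZE` is code size: every call site gets its own copy, which suits small primitives better than large engine functions.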

schell avatar Jun 12 '18 15:06 schell