fused-effects
fused-effects copied to clipboard
Improve constant factors of injection into sums.
Right now we pay a performance penalty when using multiple effects in parallel, even though they fuse correctly, because inj
(which is called every send
is O(n) in the number of effects rather than O(1)). A smarter Sum
/Union
type could perhaps ameliorate this situation, though we may have to change some code upstream.
I spent a while hacking through this on polysemy
-- a few things that might save you some headaches:
-
Union
can beO(1)
in memory if you pack a tag along with it. Most people just stick aWord8
in there and then unsafe coerce their way back, but this thing wreaks havoc on the inliner. - Instead you can pack a singleton corresponding to the peano numbers. I was unable to prove
decomp :: Union (e ': r) a -> Either (Union r a) (e a)
using GHC's built-inNat
type. This singleton isn't technically constant memory, but in my experience, GHC is happy to specialise it away. - You can then convince GHC that the thing inside the
Union
is indeed what the tag says it is by packing anIndexOf row n a
rather thane a
directly (@i-am-tom pointed this out to me!)
The result is here if you're looking for inspiration. Presumably you can just chop out the Yo
stuff and everything should work fine in fused-effects
!
@isovector Thank you so much!
A morning spent hacking on this doesn’t seem to improve performance, unfortunately. The faster-union
branch holds my unhappy results.
@patrickt did you look at the resulting core? ~My guess is you're running into Trac #16473. If you have an afternoon's worth of CPU time to spare, try running the benchmarks against this GHC and I suspect you'll see a marked improvement (and if not, I hope to have it merged by 8.10, so you can see then).~
Scratch that, I'm not sure why you'd be hitting 16473. But looking at the core would be helpful!
@isovector I am AFK now but am gonna look at the generated Core soon. My hypothesis either is that the slowness results from passing a dictionary around inside the Union GADT (which we needed to get Carrier instances to typecheck) or that my original assumption was wrong, and that inj/prj is not a meaningful bottleneck.
Thank you for your insight on this!
Removing from the 1.0 milestone.