fused-effects icon indicating copy to clipboard operation
fused-effects copied to clipboard

Improve constant factors of injection into sums.

Open patrickt opened this issue 5 years ago • 6 comments

Right now we pay a performance penalty when using multiple effects in parallel, even though they fuse correctly, because inj (which is called every send is O(n) in the number of effects rather than O(1)). A smarter Sum/Union type could perhaps ameliorate this situation, though we may have to change some code upstream.

patrickt avatar Apr 25 '19 00:04 patrickt

I spent a while hacking through this on polysemy -- a few things that might save you some headaches:

  • Union can be O(1) in memory if you pack a tag along with it. Most people just stick a Word8 in there and then unsafe coerce their way back, but this thing wreaks havoc on the inliner.
  • Instead you can pack a singleton corresponding to the peano numbers. I was unable to prove decomp :: Union (e ': r) a -> Either (Union r a) (e a) using GHC's built-in Nat type. This singleton isn't technically constant memory, but in my experience, GHC is happy to specialise it away.
  • You can then convince GHC that the thing inside the Union is indeed what the tag says it is by packing an IndexOf row n a rather than e a directly (@i-am-tom pointed this out to me!)

The result is here if you're looking for inspiration. Presumably you can just chop out the Yo stuff and everything should work fine in fused-effects!

isovector avatar Apr 25 '19 08:04 isovector

@isovector Thank you so much!

patrickt avatar Apr 29 '19 15:04 patrickt

A morning spent hacking on this doesn’t seem to improve performance, unfortunately. The faster-union branch holds my unhappy results.

patrickt avatar Apr 29 '19 17:04 patrickt

@patrickt did you look at the resulting core? ~My guess is you're running into Trac #16473. If you have an afternoon's worth of CPU time to spare, try running the benchmarks against this GHC and I suspect you'll see a marked improvement (and if not, I hope to have it merged by 8.10, so you can see then).~

Scratch that, I'm not sure why you'd be hitting 16473. But looking at the core would be helpful!

isovector avatar Apr 29 '19 18:04 isovector

@isovector I am AFK now but am gonna look at the generated Core soon. My hypothesis either is that the slowness results from passing a dictionary around inside the Union GADT (which we needed to get Carrier instances to typecheck) or that my original assumption was wrong, and that inj/prj is not a meaningful bottleneck.

Thank you for your insight on this!

patrickt avatar Apr 29 '19 19:04 patrickt

Removing from the 1.0 milestone.

robrix avatar Oct 09 '19 00:10 robrix