Results 33 comments of David Hou

Added detailed behavior spec. The fusion decision for the parallel reduces should be straightforward and "free" performance-wise, but fusing conv(a + b) may be bad in some cases. Need...

the scheduler change is a little tricky, since you need to make sure that each grouping is a contiguous sub-DAG. My solution to this is currently to do the grouping...
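
For readers unfamiliar with the constraint, here is a minimal sketch of one way to read "contiguous sub-DAG": no path leaves the group and later re-enters it. This is an assumption about the intended meaning, not the grouping approach the comment goes on to describe. The `children` adjacency dict and the name `is_contiguous_group` are hypothetical.

```python
def is_contiguous_group(children, group):
    """Return True if no path exits `group` and later re-enters it.

    children: dict mapping node -> iterable of child nodes (hypothetical format).
    group: iterable of nodes proposed for fusion into one kernel.
    """
    group = set(group)
    # Start from every edge that leaves the group.
    stack = [c for n in group for c in children.get(n, ()) if c not in group]
    seen = set()
    while stack:
        node = stack.pop()
        if node in group:
            # An outside node leads back into the group, so the group is split.
            return False
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, ()))
    return True


# Diamond graph: a -> b -> d and a -> c -> d.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(is_contiguous_group(diamond, {"a", "b", "c"}))  # group with no re-entry
print(is_contiguous_group(diamond, {"a", "b", "d"}))  # path a -> c -> d re-enters
```

In the diamond, grouping `a`, `b`, `d` would force `c` to run in the middle of the fused kernel, which is the situation the check rejects.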

i implemented `deferring contiguous reduces until you run out of nodes in queue`. it seems to work quite well, and it passed all the tests I had (very surprised that...
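
A rough sketch of what "deferring reduces until the queue runs out" could look like on top of a plain Kahn-style topological sort. The graph format, the `is_reduce` predicate, and the function name are all hypothetical; the actual scheduler is not shown in this comment and may differ.

```python
from collections import deque


def schedule_deferring_reduces(graph, is_reduce):
    """Topologically order `graph`, holding ready reduces back until the
    queue of ready non-reduce nodes is empty.

    graph: dict mapping every node -> list of children (hypothetical format).
    is_reduce: predicate telling whether a node is a reduce.
    """
    indegree = {n: 0 for n in graph}
    for kids in graph.values():
        for c in kids:
            indegree[c] += 1

    queue = deque(n for n in graph if indegree[n] == 0 and not is_reduce(n))
    deferred = [n for n in graph if indegree[n] == 0 and is_reduce(n)]
    order = []
    while queue or deferred:
        if not queue:
            # Nothing else is ready: release all deferred reduces together,
            # so they end up adjacent in the schedule.
            queue.extend(deferred)
            deferred = []
        node = queue.popleft()
        order.append(node)
        for c in graph[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                (deferred if is_reduce(c) else queue).append(c)
    return order


graph = {"a": ["r1", "b"], "b": ["r2"], "r1": ["c"], "r2": ["c"], "c": []}
order = schedule_deferring_reduces(graph, lambda n: n.startswith("r"))
print(order)  # ['a', 'b', 'r1', 'r2', 'c']
```

Without the deferral, the same traversal would emit `r1` before `b`, separating the two reduces; holding them back places `r1` and `r2` next to each other, where they can be considered for fusion.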

Some of them are also children of the forward pass. How can we tell if there is a path forward -> BN forward -> stuff -> fusion targets so that...
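
The path question can be illustrated with a plain reachability check. This is only a sketch under assumed names (`children`, `has_path`, `has_path_via`, and the node labels are hypothetical), not how the project actually answers it.

```python
def has_path(children, src, dst):
    """Iterative DFS: True if `dst` is reachable from `src`."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, ()))
    return False


def has_path_via(children, src, via, dst):
    """True if some path goes src -> ... -> via -> ... -> dst."""
    return has_path(children, src, via) and has_path(children, via, dst)


# Hypothetical labels mirroring the chain in the comment.
children = {
    "forward": ["bn_forward"],
    "bn_forward": ["stuff"],
    "stuff": ["fusion_target"],
    "other": ["fusion_target"],
}
print(has_path_via(children, "forward", "bn_forward", "fusion_target"))  # True
print(has_path_via(children, "other", "bn_forward", "fusion_target"))    # False
```

Running a fresh search per query is linear in the graph size each time; if many fusion targets are checked, caching the reachable set per node may be worthwhile.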

> If you fuse those targets then doesn't the cache fill up with a bunch of the "stuff" bufs? We wanna fuse if they're sharing parents. we need to allow...

hm, i think one of these kernels has a superset of "stuffs" across the rest of the fusion targets. i think that makes it safe to not check the "stuffs"...

oh, i overlooked this PR. this is pretty much what i had in mind, nice!