David Hou
Added detailed behavior spec. The fusion decision for the parallel reduces should be straightforward and "free" performance-wise, but fusing conv(a + b) may be bad in some cases. Need...
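A minimal sketch of what the "free" case could look like, assuming "parallel reduces" means independent reduces that read the same input buffer over the same axes (e.g. a sum and a max of `x` along axis 1), so one pass over the input can feed all of them. `Reduce` and `group_parallel_reduces` are hypothetical names for illustration, not tinygrad APIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reduce:
    # hypothetical stand-in for a scheduler node, not a real tinygrad class
    name: str
    src: str      # name of the buffer being reduced
    axis: tuple   # reduced axes


def group_parallel_reduces(reduces):
    """Group reduces that read the same buffer over the same axes."""
    groups = defaultdict(list)
    for r in reduces:
        groups[(r.src, r.axis)].append(r.name)
    # a group of one has nothing to fuse with, so drop it
    return [names for names in groups.values() if len(names) > 1]


reduces = [
    Reduce("sum_x", "x", (1,)),
    Reduce("max_x", "x", (1,)),
    Reduce("sum_y", "y", (1,)),
    Reduce("sum_x_ax0", "x", (0,)),
]
print(group_parallel_reduces(reduces))  # [['sum_x', 'max_x']]
```

This only keys on the exact input buffer, so something like `sum((x - mean)**2)` would not be grouped with `sum(x)`; it is meant to show the cheap case, not the conv(a + b) case.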
the scheduler change is a little tricky, since you need to make sure that each grouping is a contiguous sub-DAG. My current solution is to do the grouping...
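One standard way to check the "contiguous sub-DAG" condition: a group is contiguous (convex) if no path leaves the group and then re-enters it. The sketch below walks forward from every node just outside the group that a member feeds into, and fails if that walk ever lands back inside the group. It assumes the graph is given as a hypothetical `children` dict (node to list of consumers); it is not the PR's implementation.

```python
from collections import deque


def is_contiguous(group, children):
    """Return True if no path leaves `group` and re-enters it.

    children: dict mapping each node to a list of its consumers.
    """
    group = set(group)
    # nodes just outside the group that some member feeds into
    start = {c for n in group for c in children.get(n, []) if c not in group}
    frontier = deque(start)
    seen = set(start)
    while frontier:
        n = frontier.popleft()
        for c in children.get(n, []):
            if c in group:
                return False  # a path left the group and came back in
            if c not in seen:
                seen.add(c)
                frontier.append(c)
    return True


# a -> b -> c and a -> c: grouping {a, c} without b is not contiguous
children = {"a": ["b", "c"], "b": ["c"], "c": []}
print(is_contiguous({"a", "c"}, children))  # False
print(is_contiguous({"a", "b", "c"}, children))  # True
```

If such a group were fused into one kernel, `b` would need the kernel's output of `a` while the same kernel also needs `b` to compute `c`, which is the cycle the check rules out.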
i implemented `deferring contiguous reduces until you run out of nodes in queue`. it seems to work quite well, and it passed all the tests I had (very surprised that...
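One reading of "deferring reduces until you run out of nodes in queue", sketched as a Kahn-style topological sort: ready non-reduce nodes are emitted one at a time, ready reduces wait in a side list, and that list is flushed as a single group only once the main queue is empty. The graph format, node names, and `is_reduce` predicate are assumptions for illustration.

```python
from collections import deque


def schedule(nodes, children, is_reduce):
    """Kahn-style topological sort that holds back ready reduces.

    Ready non-reduce nodes are emitted one at a time. Ready reduces wait in
    `deferred` and are only flushed, as a single group, once the main queue
    is empty.
    """
    indeg = {n: 0 for n in nodes}
    for n in nodes:
        for c in children.get(n, []):
            indeg[c] += 1
    queue = deque(n for n in nodes if indeg[n] == 0 and not is_reduce(n))
    deferred = [n for n in nodes if indeg[n] == 0 and is_reduce(n)]
    order = []
    while queue or deferred:
        if queue:
            group = [queue.popleft()]
        else:
            # nothing else is runnable: emit every waiting reduce together
            group, deferred = deferred, []
        order.append(group)
        for n in group:
            for c in children.get(n, []):
                indeg[c] -= 1
                if indeg[c] == 0:
                    if is_reduce(c):
                        deferred.append(c)
                    else:
                        queue.append(c)
    return order


# x feeds two reduces (r1, r2) and an unrelated elementwise chain z -> w
children = {"x": ["r1", "r2", "z"], "r1": ["y"], "r2": ["y"], "z": ["w"]}
nodes = ["x", "r1", "r2", "z", "y", "w"]
print(schedule(nodes, children, lambda n: n.startswith("r")))
# [['x'], ['z'], ['w'], ['r1', 'r2'], ['y']]
```

A flushed group is always contiguous: a reduce only enters `deferred` once all of its inputs have been emitted, so no member can depend on another member or on anything still pending. A real scheduler would presumably also split the flushed group by which reduces are actually fusable together.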
Some of them are also children of the forward pass. How can we tell if there is a path `forward -> BN forward -> stuff -> fusion targets` so that...
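One way to phrase the question as plain reachability: the "stuff" between the BN forward node and the fusion targets is every node that is a descendant of BN forward and also an ancestor of some target. If that set is empty and no target is a direct child, there is no such path. This is a textbook approach, not necessarily what the PR does; the node names and the `children` dict format are hypothetical.

```python
def reachable(starts, edges):
    """All nodes reachable from `starts` (inclusive) by following `edges`."""
    seen = set(starts)
    stack = list(starts)
    while stack:
        n = stack.pop()
        for m in edges.get(n, []):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen


def nodes_between(src, targets, children):
    """Nodes on some path src -> ... -> target, excluding the endpoints."""
    parents = {}
    for n, cs in children.items():
        for c in cs:
            parents.setdefault(c, []).append(n)
    down = reachable([src], children)   # descendants of src
    up = reachable(targets, parents)    # ancestors of any target
    return (down & up) - {src} - set(targets)


children = {
    "fwd": ["bn"],
    "bn": ["s1", "s2", "t3"],
    "s1": ["t1"],
    "s2": ["t2"],
    "other": ["t1"],
}
print(sorted(nodes_between("bn", ["t1", "t2", "t3"], children)))  # ['s1', 's2']
```

`other` also feeds `t1` but is excluded, since it is not reachable from `bn`.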
> If you fuse those targets, then doesn't the cache fill up with a bunch of the "stuff" bufs?

We wanna fuse if they're sharing parents. we need to allow...
hm, i think one of these kernels has a superset of "stuffs" across the rest of the fusion targets. i think that makes it safe to not check the "stuffs"...
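The superset condition above can be checked directly once each fusion target's "stuffs" are collected into a set: look for one target whose set contains every other target's set. This only formalizes the condition; whether it makes skipping the checks safe is the claim in the comment. `covering_target` and the dict layout are hypothetical.

```python
def covering_target(stuff_by_target):
    """Return a target whose "stuff" set contains every other target's, else None."""
    for target, stuff in stuff_by_target.items():
        if all(stuff >= other for other in stuff_by_target.values()):
            return target
    return None


print(covering_target({"t1": {"s1", "s2"}, "t2": {"s1"}, "t3": set()}))  # t1
print(covering_target({"t1": {"s1"}, "t2": {"s2"}}))  # None
```

Python's `>=` on sets is the superset test, so this is one pass over the targets for each candidate.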
oh, i overlooked this PR. this is pretty much what i had in mind, nice!