qazal
looks good, can you resolve the conflicts?
thanks - adding this to the scheduler roadmap!
I need to think about the scheduler change a bit more, but in general we don't wanna do merge schedules, if there is grouping to be done it should be...
I think all of your fusion targets are children of `E_2048` https://tiny-tools-client.vercel.app/?id=3ef8c4a72b0c4999acca0dff9288b2fa could traversing its local graph work?
> forward -> BN forward -> stuff -> fusion targets

If you fuse those targets, doesn't the cache fill up with a bunch of the "stuff" bufs? We wanna...
Moved everything to the scheduler, `TestSchedule.test_fold_conv_relu_backward` is green. Tomorrow:
- the reshapes and permutes don't generalize to the arange fusion. Is there some sort of generic ShapeTracker witchcraft that'd handle...
Makes sense. I think the existing tests (especially external_test_opt!) were great in helping reason about the fusion approach. This PR includes tests covering all the possible cases for the new...
This will be merged after fusing all self-contained subgraphs (we'll get it for free). For now, there are no major speedup benefits from fusing realized reduces.
Status: This reduces resnet training time to 109h54m - I'm working on simplifying the diff. I think this is a good opportunity to generalize scheduler fusion in reduce_for_op.
Will reopen once I get to pick this back up. For now, we have r3 that does the same thing as this one would do in fewer kernels. But some...