David Hou
An update, for reference. The implementation is still just experimental. If anyone has any thoughts, please let me know! The original approach of indexing into the 4x4 HW pools seems...
OK, we are now at only +1 real LOC in linearizer.py. I think everything that changed here is conceptually important. LocalBuffer is a RawBuffer, and the compiled backends only...
There are a lot fewer lines, but now printbufs won't work after the end of linearize (it's not used anywhere except kernel_search.py), and lib_test_ast doesn't really do the right thing afaict if...
The gigantic diff is back. Half the diff is making the local reduce LocalBuf not -1 though...
OK, I feel like this is sufficiently refined. Let me know what you think. (The CI yml change is because gpuocelot is broken in cache right now for some reason.)
Do we want to dedup input buffers that are aliased to the output buffer?
Closing; superseded by #1272 after refactor #1256.
We should probably check the number of unique RawBuffers rather than the number of LazyBuffers, since we will have some RawBuffer dedup soon (#1141).
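To illustrate why the two counts differ: once several LazyBuffers can share one backing RawBuffer, counting LazyBuffers overcounts. A minimal sketch, assuming hypothetical stand-in classes (not tinygrad's real `LazyBuffer`/`RawBuffer`) where each lazy buffer holds its backing buffer in a `realized` field:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a device buffer; not tinygrad's real RawBuffer.
class RawBuffer:
    def __init__(self, size: int):
        self.size = size

# Hypothetical stand-in for a lazy buffer that points at its backing storage.
@dataclass(eq=False)
class LazyBuffer:
    realized: RawBuffer

def count_unique_raw(bufs):
    # Count distinct backing RawBuffers by object identity, so aliased
    # LazyBuffers that share one RawBuffer are only counted once.
    return len({id(b.realized) for b in bufs})

shared = RawBuffer(16)
bufs = [LazyBuffer(shared), LazyBuffer(shared), LazyBuffer(RawBuffer(16))]
print(len(bufs), count_unique_raw(bufs))  # 3 LazyBuffers, 2 unique RawBuffers
```

Counting by identity (rather than by value equality) matters here because two distinct buffers of the same size are still separate allocations.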
I am OK with removing. I think you will see a perf regression on BEAM=2 convolution backwards. Current master doesn't trigger TC that much anyway on conv backward. I have an...