gswangg
Also, I'm pretty sure the UPats I wrote are nonsensical and need to be redone. I will be busy the next couple of days but I'll work on this when...
> Also, I'm pretty sure the UPats I wrote are nonsensical and need to be redone. I will be busy the next couple of days but I'll work on this...
I'm able to reproduce the GPU/IMAGE and openpilot test failures locally on my M1 MacBook with only these two commits:
- 0152a86c66ff7d7432d5c6fc8ecdf4063ff03155: remove test_const_vectorize_fold
- 0ff3cb92c13e291b216d949c5a48015f213873a1: remove const folding UPat for VECTORIZE...
I instrumented the PatternMatcher rules and identified [this UPat](https://github.com/tinygrad/tinygrad/pull/5322/files#diff-00bd44b667ec90ae1d3e984e699bc6b498c84ca1b1bd15a025437ded227457bfR500) as the source of the divergent behavior when using VECTORIZE instead of CONST in [the post-lowerer branch](https://github.com/greg-niemeyer/tinygrad/tree/post-lowerer). Changing the rule to...
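A minimal sketch of the kind of rule instrumentation described above: count which rewrite rules fire on each input, then diff the counts between a CONST run and a VECTORIZE run. This uses a toy matcher over tuples; `RULES`, `rewrite_logged`, and `trace` are hypothetical names for illustration, not tinygrad's actual PatternMatcher API.

```python
from collections import Counter

# Toy rewrite rules as (name, predicate, rewrite) triples.
# Hypothetical stand-ins, NOT tinygrad's UPat/PatternMatcher API.
RULES = [
    ("fold_add_zero", lambda n: n[0] == "ADD" and n[2] == ("CONST", 0), lambda n: n[1]),
    ("fold_mul_one", lambda n: n[0] == "MUL" and n[2] == ("CONST", 1), lambda n: n[1]),
]

def rewrite_logged(node, hits):
    """Apply the first matching rule and record its name in `hits`."""
    for name, pred, fn in RULES:
        if pred(node):
            hits[name] += 1
            return fn(node)
    return node  # no rule matched, node is left unchanged

def trace(nodes):
    """Rewrite every node and return (rewritten nodes, per-rule hit counts)."""
    hits = Counter()
    out = [rewrite_logged(n, hits) for n in nodes]
    return out, hits

# Same expression, once with a scalar CONST and once wrapped in VECTORIZE.
const_run = [("ADD", ("VAR", "x"), ("CONST", 0))]
vec_run = [("ADD", ("VAR", "x"), ("VECTORIZE", ("CONST", 0), ("CONST", 0)))]

_, const_hits = trace(const_run)
_, vec_hits = trace(vec_run)

# Rules whose hit counts differ between the two runs are the suspects.
diverging = sorted(r for r in set(const_hits) | set(vec_hits) if const_hits[r] != vec_hits[r])
print(diverging)
```

Here the toy rule only matches a scalar CONST operand, so it fires in one run and not the other, which is the same shape of divergence the comment describes.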
Didn't have too much time to work on this today, but I managed to fix the AMD test failure and address some of the review feedback. Tomorrow I'll have more...
> can you use vectorize for WMMA too?

I dumped the UOps.WMMAs produced when running test_linearizer.py and they look like they're getting vectorized already -- they all look like: UOps.WMMA...
> Also, I'll look into what I can do to fix process_replay

I was able to remove the extra cast, but undoing the diff to the way parentheses and some...
> yea that extra render_cast in cstyle is fine for now, it's not your fault.

Do you want me to revert this, then? https://github.com/tinygrad/tinygrad/pull/5322/commits/ebdc42468503bebfb1a59db438c6c2ff45c1d766
> I'd keep the cstyle render_cast for now and add [run_process_replay] back. We'll remove that in another diff. Let's focus this diff on just the CONST -> VECTORIZE change. I...
I was able to fix process_replay for the METAL tests, but it's ugly. AFAICT, the ugliness is tough to get rid of due to the differences in VECTORIZE vs implicitly vectorized...