loopy
A code generator for array-based code on CPUs and GPUs
Maybe the newly introduced accumulator variables should be tagged with some specification, or their naming should be standardized. In feinsum, I have to rely on the internals of `realize_reduction` (cf. https://github.com/kaushikcfd/feinsum/blob/87e3a43dff6ebbc3709a3afc9df174ab2e12c8fa/src/feinsum/tuning/impls/ifj_fe_fej_to_ei.py#L447-L448)...
There is currently no way to express such match criteria; maybe we need a new `MatchExpressionBase` for this?
The function signatures for the transforms aren't finalized yet. I'm happy to change them with help from the reviewers. Draft because: - [x] Pass CI - [x] Change...
Used in, e.g., `split_reduction_outward` (#711). cc @kaushikcfd
TODOs: - add to ~~MemAccess~~, ~~Sync~~
Followup of #350
This is currently a proof of concept. **Edit:** I removed the previous performance results; they were likely caused by some kind of kernel caching. TODOs: - [x] ~~add `mutate` support to constantdict...
What do you think, @inducer? This would not only make debugging cache misses easier, but could also be used to automate determinism tests (by setting `LOOPY_ABORT_ON_CACHE_MISS` to something truthy, and...
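The determinism-test idea could work roughly as follows. This is a minimal sketch, not loopy's actual caching code. `CheckedCache`, `CacheMissError` and `_abort_on_cache_miss` are hypothetical names, and the set of accepted "truthy" spellings is an assumption. Only the environment variable name `LOOPY_ABORT_ON_CACHE_MISS` comes from the proposal. The workload is run once with the variable unset to warm the cache, then again with it set. Any miss on the second run points at a nondeterministic cache key.

```python
import os


class CacheMissError(RuntimeError):
    """Raised on a cache miss while LOOPY_ABORT_ON_CACHE_MISS is set (hypothetical)."""


def _abort_on_cache_miss() -> bool:
    # Assumption: these spellings count as "truthy"; the real proposal may differ.
    value = os.environ.get("LOOPY_ABORT_ON_CACHE_MISS", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


class CheckedCache:
    """Hypothetical in-memory cache illustrating abort-on-miss; not loopy's API."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, key, compute):
        if key in self._store:
            return self._store[key]
        if _abort_on_cache_miss():
            # A warmed-up, deterministic workload should never miss here.
            raise CacheMissError(f"unexpected cache miss for key {key!r}")
        value = compute()
        self._store[key] = value
        return value
```

Raising an exception instead of logging makes the check usable directly in a test suite: the second run fails loudly at the first miss.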