Kaushik Kulkarni

https://kaushikcfd.github.io/

Qualcomm Research San Diego, CA

Results: 91 comments of Kaushik Kulkarni

`lp.privatize_temporaries_with_inames` fails for multiple inames sweeping the same temporary

Privatize currently does not support a `within`. Besides that, supporting this via a within is sketchy since between the call to `privatize_temporary_with_iname(..., within="iname:i")` and `privatize_temporary_with_iname(..., within="iname:j")` the shape of `tmp`...

`lp.privatize_temporaries_with_inames` fails for multiple inames sweeping the same temporary

> But... the point of privatization is to get rid of array indices, right? Wouldn't privatization [add array axes](https://documen.tician.de/loopy/ref_transform.html?highlight=privatize#loopy.privatize_temporaries_with_inames) instead of getting rid of them. I always thought it...

`lp.privatize_temporaries_with_inames` fails for multiple inames sweeping the same temporary

> The iname already indexes the variable in question

`lp.privatize_temporaries_with_inames` privatizes the temporaries. So `out` would remain untouched but `tmp` would be indexed, which in the untransformed kernel has the...
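To make the "adds array axes" point concrete, here is a conceptual sketch in plain Python. It is not loopy's API or its generated code; the kernel and function names are hypothetical. Before privatization `tmp` is a single scalar overwritten on every iteration of `i`. After privatizing it in `i`, `tmp` gains an axis indexed by `i`, while `out` keeps exactly the indexing it had.

```python
def before_privatization(a):
    """tmp is one scalar, overwritten on every iteration of i."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        tmp = 2.0 * a[i]          # tmp carries no array index
        out[i] = tmp + 1.0
    return out


def after_privatization(a):
    """tmp privatized in i: it gains an axis of length len(a), indexed by i."""
    out = [0.0] * len(a)
    tmp = [0.0] * len(a)          # one private copy of tmp per value of i
    for i in range(len(a)):
        tmp[i] = 2.0 * a[i]       # tmp is now indexed by i
        out[i] = tmp[i] + 1.0     # out's indexing is unchanged
    return out, tmp
```

Both versions compute the same `out`. The privatized version simply keeps every iteration's value of `tmp` alive, which is what later transformations that reorder or split the `i` loop can rely on.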

`lp.privatize_temporaries_with_inames` fails for multiple inames sweeping the same temporary

> Would it be ok if the privatization of tmp within i and j went into separate variables? For my application, this kernel was obtained during pre-processing after `i` and...
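The "separate variables" option can be sketched in plain Python (not loopy code). The names `tmp_i` and `tmp_j` are hypothetical and stand for `tmp` privatized within `i` and within `j`, respectively. Each copy gets its own shape, sized by the extent of its own iname. That sidesteps the question of what shape a single shared, privatized `tmp` should have when two inames sweep it.

```python
def shared_temporary(a, b):
    """One scalar tmp is written under two different inames, i and j."""
    out_a = [0.0] * len(a)
    out_b = [0.0] * len(b)
    for i in range(len(a)):
        tmp = a[i] * a[i]
        out_a[i] = tmp
    for j in range(len(b)):
        tmp = b[j] + 1.0
        out_b[j] = tmp
    return out_a, out_b


def privatized_separately(a, b):
    """Privatization within i and within j goes into two separate variables."""
    out_a = [0.0] * len(a)
    out_b = [0.0] * len(b)
    tmp_i = [0.0] * len(a)   # hypothetical name: tmp privatized in i (extent of i)
    tmp_j = [0.0] * len(b)   # hypothetical name: tmp privatized in j (extent of j)
    for i in range(len(a)):
        tmp_i[i] = a[i] * a[i]
        out_a[i] = tmp_i[i]
    for j in range(len(b)):
        tmp_j[j] = b[j] + 1.0
        out_b[j] = tmp_j[j]
    return out_a, out_b, tmp_i, tmp_j
```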

Loopy's codegenerator fails with non-deterministic error

I just ran this through the code-generator in #372 and that treats it correctly. At least from my end, debugging this on the current trunk seems unnecessary.

AssertionError when calling add_prefetch with SubArrayRefs

Hello @zachjweiner! There are a couple of issues here:

1. The issue you point out regarding prefetch over sub-array-refs is a bug.
2. There isn't good shape-inference support at...

Implements a reindexing transformation

> That academia.edu link doesn't seem to work.

Oops, thanks, fixed!

[Transform API] Simple tiling can be tedious to implement

Some options could be:

1. giving a kernel an attribute `post_realize_reduction_tansforms_callback`, or
2. making reduction nodes taggable and defining some implementation tags that would help in this case.

[Transform API] Simple tiling can be tedious to implement

I would first privatize the temporary `acc_j_outer_j_inner` in the iname `i` and duplicate `i` in the instructions `insn` and `insn_j_outer_j_inner_init`. This way the accumulator's state is stored as we perform...
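The recipe above can be sketched loosely in plain Python for a row-sum reduction whose `j` loop has been split into `j_outer` and `j_inner`. This is not loopy code. The row-sum kernel, the `tile` parameter, and the loop variables `i_init` and `i_final` (standing in for the duplicated copies of `i`) are hypothetical. The sketch also assumes that `insn` is the final assignment out of the accumulator and that the update instruction keeps the original `i`.

```python
def row_sums_tiled(a, tile):
    """Row sums of a (list of equal-length rows) with j split into tiles."""
    n = len(a)
    m = len(a[0]) if n else 0
    # acc privatized in i: one accumulator per row instead of one scalar
    acc = [0.0] * n

    # init instruction with its own copy of i, so it can run before any tile
    for i_init in range(n):
        acc[i_init] = 0.0

    # update instruction keeps the original i, now nested inside j_outer
    for j_outer in range((m + tile - 1) // tile):
        for i in range(n):
            for j_inner in range(tile):
                j = j_outer * tile + j_inner
                if j < m:                      # guard the last, partial tile
                    acc[i] += a[i][j]          # acc[i] holds row i's partial sum

    # final assignment with another copy of i reads the finished accumulators
    out = [0.0] * n
    for i_final in range(n):
        out[i_final] = acc[i_final]
    return out
```

Without the per-`i` copy of `acc`, moving `i` inside `j_outer` would make every row overwrite a single scalar accumulator between tiles. The privatized array keeps each row's partial sum while the tiles are swept.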

Fix CInstruction in calculating mem access map.

Thanks for the patch, L(almost)GTM!

> I don't know if we should calculate something from the predicates or do you think this is sufficient?

Doing something similar to `get_op_map` would...
