Kaushik Kulkarni
Here's an MWE:
```python
>>> import pycuda.autoinit
>>> import pycuda.gpuarray as cu_np
>>> a = cu_np.zeros(10, dtype="int32") + 1
>>> b = cu_np.zeros(10, dtype="int32") + 2
>>> a / b...
```
Consider the simple batched matvec example: ```python knl = lp.make_kernel( "{[e,i,j]: 0
`CFamilyTarget` should only define the callables present in the intersection of all the C-based targets. The math functions on complex-typed operands from `complex.h` aren't common to targets like OpenCL/CUDA...
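A minimal sketch of the "intersection" idea, in plain Python sets. The callable tables below are hypothetical and abbreviated for illustration; they are not loopy's actual registries, and `common_callables` is an assumed helper name.

```python
# Hypothetical, abbreviated callable tables (illustrative only, not loopy's).
C99_CALLABLES = {"sin", "cos", "exp", "csin", "ccos", "cexp"}
OPENCL_CALLABLES = {"sin", "cos", "exp", "dot"}
CUDA_CALLABLES = {"sin", "cos", "exp", "rsqrt"}


def common_callables(*per_target):
    """Return the callable names that every given target provides."""
    return set.intersection(*map(set, per_target))


# Names a shared base target could safely define.
shared = common_callables(C99_CALLABLES, OPENCL_CALLABLES, CUDA_CALLABLES)

# Names that would have to stay on the C99-specific target instead.
c99_only = C99_CALLABLES - shared
```

Under these assumed tables, the complex-valued functions end up in `c99_only`, which is the separation the issue asks for.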
Reported by @sv2518.
/cc @sv2518

Adds support for GNU vector extensions.

TODO:
- [x] Implement `OMPSimdInameTag`.
- [x] In `loopy.codegen.expression` infer the fallback mechanisms from the target.
- [x] Pass CI.
- [...
The following kernel -- ``` knl = lp.make_kernel( "{[i, j]: 0
The implementation here is based on the paper "[Memory optimization by counting points in integer transformations of parametric polytopes](https://dl.acm.org/doi/abs/10.1145/1176760.1176771)".

Draft because:
- [x] Incomplete Implementation
- [x] Needs `pw_qpolynomial_to_expr`
-...
TODO:
- [x] Do index analysis to verify the validity of the iname-duplication passes.
- [x] Add complicated regressions.

Draft because:
- [ ] includes commits from #350.
- [...
Implementation for finding the loop-nest-around map in O(N·k) time, where N is the number of inames and k is the maximum loop depth. For comparison, let's consider the kernel in #288:...
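To illustrate where an O(N·k) bound can come from, here is a simplified sketch. It is not the PR's implementation. It assumes the nesting is already known as a tree, given as a hypothetical `parent` mapping from each iname to its enclosing iname. Each of the N inames walks at most k steps up the tree.

```python
def loop_nest_around_map(parent):
    """Map each iname to the set of inames nested around it.

    `parent` (an assumed input format) maps an iname to its immediately
    enclosing iname, or to None if the iname is outermost.
    """
    result = {}
    for iname in parent:            # N iterations
        around = []
        outer = parent[iname]
        while outer is not None:    # at most k steps, k = max loop depth
            around.append(outer)
            outer = parent[outer]
        result[iname] = frozenset(around)
    return result


# Example nesting: e encloses i, which encloses j.
nest = loop_nest_around_map({"e": None, "i": "e", "j": "i"})
```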