Kaushik Kulkarni
> The CInstruction does not contain an sqrt function, but the instruction it depends on does. It needed a `map_resolved_function`. Pushed a fix to the branch.
Sorry, hadn't accounted for `CInstruction`. Pushed a fix that runs the following snippet as expected:

```python
import loopy as lp
from loopy.symbolic import parse

tunit = lp.make_kernel(
    "{[i]: 0
```
There's a proposed fix for this at https://gitlab.tiker.net/inducer/pycuda/-/merge_requests/66.
There's no way to target the individual reduction nodes with `extract_subst` either. Applying `lp.extract_subst(knl, "subst", "x[arg0]", parameters=("arg0", ))` converts both reductions to a substitution.
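To illustrate why a template such as `x[arg0]` catches both reductions, here is a minimal sketch of template matching over a toy expression tree. It is not loopy's implementation: the tuple-based tree, `Wildcard`, `match`, `extract_all`, and `extract_selected` are all hypothetical names made up for this example. The last function shows one possible way a caller could single out an individual match, assuming matches can be identified by their position in the tree.

```python
# Toy expression trees are nested tuples, e.g. ("sum", "j", ("index", "x", "j")).
# Everything below is a hypothetical illustration, not loopy or MatchPy code.


class Wildcard:
    """Stand-in for a template parameter such as `arg0`."""

    def __init__(self, name):
        self.name = name


def match(template, expr, bindings):
    """Return True if expr matches template, recording wildcard bindings."""
    if isinstance(template, Wildcard):
        if template.name in bindings:
            return bindings[template.name] == expr
        bindings[template.name] = expr
        return True
    if isinstance(template, tuple) and isinstance(expr, tuple):
        return len(template) == len(expr) and all(
            match(t, e, bindings) for t, e in zip(template, expr))
    return template == expr


def extract_all(expr, template, name):
    """Rewrite every subexpression that matches template into a call to name."""
    bindings = {}
    if match(template, expr, bindings):
        return ("call", name) + tuple(bindings.values())
    if isinstance(expr, tuple):
        return tuple(extract_all(child, template, name) for child in expr)
    return expr


def extract_selected(expr, template, name, select, path=()):
    """Like extract_all, but only rewrite matches whose tree path passes select."""
    bindings = {}
    if match(template, expr, bindings) and select(path):
        return ("call", name) + tuple(bindings.values())
    if isinstance(expr, tuple):
        return tuple(
            extract_selected(child, template, name, select, path + (i,))
            for i, child in enumerate(expr))
    return expr


# sum(j, x[j]) + sum(k, x[k])
expr = ("+",
        ("sum", "j", ("index", "x", "j")),
        ("sum", "k", ("index", "x", "k")))
template = ("index", "x", Wildcard("arg0"))

# The template carries no information about the enclosing reduction,
# so both occurrences are rewritten.
both = extract_all(expr, template, "subst")

# Restricting by tree position rewrites only the match under the first operand.
first_only = extract_selected(expr, template, "subst", lambda p: p[0] == 1)
```

In this sketch `both` has `x[j]` and `x[k]` replaced by calls to `subst`, while `first_only` leaves the second reduction untouched. Selecting by path is only one option; a richer matcher could select on the enclosing reduction's iname instead.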
@inducer: This is ready for another look!
> What would you like to happen?

I would prefer returning an `np.ndarray`, as we do for any other array with shape != (). We can see this inconsistency...
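For reference, the distinction between a zero-dimensional array and a scalar can be seen in plain NumPy. This is only a sketch of the NumPy side of the comparison; the library call the comment is about is not shown in this excerpt, and the variable names are illustrative.

```python
import numpy as np

zero_d = np.zeros(())           # shape is (), yet this is still an ndarray
one_d = np.zeros(3)             # an ordinary array with shape (3,)

reduced = one_d.sum()           # a full reduction yields a NumPy scalar, not an ndarray
as_array = np.asarray(reduced)  # wrapping the scalar gives back a 0-d ndarray
```

Returning the 0-d `ndarray` form keeps the return type uniform across shapes, which is the consistency argued for above.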
There's a proposed fix for this at https://gitlab.tiker.net/inducer/pycuda/-/merge_requests/73.
Also, I wanted to raise the question of whether using `extract_subst` to hoist expressions is a reasonable approach, i.e. could this be perceived as an anti-pattern?
I personally think that, instead of going down this road, we might want to explore integrating with something like `MatchPy` in the future to allow richer expression matching including caring for...
1. `CudaTarget` does not register the boolean dtype (that's an issue; I'll push a fix for that).
2. `complex` arithmetic isn't supported in the plain CUDA target. They are supported in the...