loopy
A code generator for array-based code on CPUs and GPUs
Even with #755, prefetching many arrays scales poorly. By the 19th add_prefetch operation, a single add_prefetch call takes around 5 seconds to complete on one fused Mirgecom kernel with...
I'm not sure why it becomes unschedulable. ```python import loopy as lp import numpy as np from pymbolic.primitives import * import immutables e2p_from_single_box_knl = lp.make_kernel( [ "[ntgt_boxes] -> { [itgt_box]...
``` knl = lp.make_kernel( [ "{[i,j]: 0
Based on #350 / #690.
For example, the following code fails with `AttributeError: 'SeparateArrayArrayDimTag' object has no attribute 'stride'`: ```python import loopy as lp import numpy as np child_knl = lp.make_function( [], """ g[0] = 2*e[0]...
For example: ```python knl = lp.make_kernel( [ "{ [i]: 0
Context: https://github.com/inducer/loopy/pull/698#issuecomment-1306451565 IMO, this should only apply to code generation, not transforms. Transforms can receive permission to ignore FP reordering individually. Another aspect is that reductions do not even *have*...
Analogous to https://github.com/inducer/pyopencl/issues/668. We probably want to restrict this checking to `__debug__` mode.
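A minimal sketch of what restricting a check to `__debug__` mode could look like. It assumes, hypothetically, that the check is an argument-validation pass run on every kernel invocation; `_check_arg_shapes` and `invoke` are illustrative names, not part of loopy's API. Python sets `__debug__` to `False` under `python -O`, so the guarded check disappears in optimized runs while still running by default.

```python
def _check_arg_shapes(args, expected_shapes):
    """Hypothetical validation pass: raise ValueError on a shape mismatch."""
    for name, shape in expected_shapes.items():
        actual = getattr(args[name], "shape", None)
        if actual != shape:
            raise ValueError(
                f"argument '{name}': expected shape {shape}, got {actual}")


def invoke(args, expected_shapes):
    # Under `python -O`, __debug__ is False and this branch is skipped,
    # so the per-call validation cost vanishes in optimized runs.
    if __debug__:
        _check_arg_shapes(args, expected_shapes)
    # Placeholder for the actual (hypothetical) kernel launch.
    return "launched"
```

Guarding with `if __debug__:` rather than a custom flag means no new configuration is needed: users already get the check by default and can disable it with the standard `-O` switch.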
Failure on pocl-cuda with `n=16` can be reproduced locally. With intel-cpu it is not reproduced locally, and is intermittent on CI. See https://github.com/inducer/loopy/actions/runs/3787208816/jobs/6438795872 Oclgrind, NVIDIA, pocl-pthread all work. Wonder if...