Andreas Klöckner
Watching some CL program (gory details [here](https://notes.tiker.net/s/3D3-VhuSf)) execute on pocl's CUDA backend via `nvprof` (Nvidia's profiling tool) gives me

```
API calls:   52.24%  1.20894s  72145  16.757us  7.6450us  22.126ms  cuLaunchKernel
             19.36%...
```
Flipping through the pthreads backend, it didn't seem as though core affinity for the created threads was getting enabled anywhere. If that's indeed the case, that might be an easy...
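To make "core affinity" concrete, here is a minimal sketch of pinning the calling thread to a single CPU. It is not pocl code: the helper name `pin_to_first_allowed_cpu` is hypothetical, the choice of the lowest-numbered CPU is arbitrary, and `os.sched_setaffinity` is only available on Linux-like platforms. A C runtime would more likely go through something like `pthread_setaffinity_np`.

```python
import os


def pin_to_first_allowed_cpu():
    """Pin the calling thread to one CPU.

    Returns the resulting affinity set, or None if the platform
    does not expose the scheduler affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs we may currently run on
    target = min(allowed)               # arbitrary choice: lowest-numbered CPU
    os.sched_setaffinity(0, {target})   # pid 0 means "the calling thread"
    return os.sched_getaffinity(0)


# Remember the original mask so the demo can undo its own change.
original = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
pinned = pin_to_first_allowed_cpu()
if original is not None:
    os.sched_setaffinity(0, original)
```

A thread pool would presumably do this once per worker, giving each worker a different CPU from the allowed set.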
I've come across [another example](https://gist.github.com/inducer/8d1ad78c548bf053a85eabc015804484) of a kernel that appears to be miscompiled, resulting in a segfault or complaints about heap corruption. When I run that script on top of...
This kernel here: https://gist.github.com/inducer/8f7cd72829c85acc1d3fcb9c4a5dae05 gets vectorized into full-width 256-bit vectors without a problem by both Intel CL and ispc. (Note how the workgroup size already conveniently matches the expected vector...
Running

```
flake8==3.9.2
flake8-bugbear==21.4.3
flake8-polyfill==1.0.2
flake8-quotes==3.2.0
```

on

```
def main():
    def f(x):
        print(i, x)

    for i in [1, 2, 3]:
        f('HI')
```

gives me

```
bugbearbug.py:5:9: B007 Loop control...
```
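A small variant of the snippet above shows that the loop variable is in fact read on every iteration, just indirectly through the closure. The `seen` list is added here purely for illustration so that the result can be inspected instead of printed.

```python
def main():
    seen = []

    def f(x):
        # `i` is looked up in the enclosing scope at call time,
        # so each call observes the current loop value.
        seen.append((i, x))

    for i in [1, 2, 3]:
        f("HI")

    return seen


result = main()
```

Here `result` holds one pair per iteration, each carrying a different value of `i`, which is what a purely syntactic check of the loop body cannot see.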
For PyOpenCL's [2022.1.6 release](https://github.com/inducer/pyopencl/releases/tag/v2022.1.6), [`CITATION.cff`](https://github.com/inducer/pyopencl/blob/6b68b79e9400e802755735f366eda221973cb606/CITATION.cff) contained the name of an individual (@gw0) for whom only an alias, but no first and family names are known. I represented this in `CITATION.cff`...
This was a dead-end while debugging #77. No need for it currently, but who knows? Maybe we'll need it sometime.
The analogous issue to https://github.com/inducer/pytato/issues/163 also exists in pymbolic. cc @alexfikl
FP arithmetic isn't associative, after all.
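A quick illustration with IEEE 754 doubles, unrelated to any particular expression in the affected code: regrouping the same three terms changes the rounded result.

```python
left = (0.1 + 0.2) + 0.3   # rounds after 0.1 + 0.2 first
right = 0.1 + (0.2 + 0.3)  # rounds after 0.2 + 0.3 first

# The two groupings differ in the last bit.
same = left == right
```

So a rewrite that reorders or regroups sums is not guaranteed to be bitwise result-preserving.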
cc @kaushikcfd Prototype remap: https://gist.github.com/inducer/b293430efb27c50d0048cd278540f038