Isuru Fernando
After a couple of improvements to loopy and sumpy (derivtaker branch), pyinstrument output is now:

```
84.866 loopy_reproduce.py:1
├─ 39.403 generate_code_v2  loopy/codegen/__init__.py:404
│  ├─ 16.501 generate_host_or_device_program  loopy/codegen/result.py:286
│  │  └─...
```
@inducer, https://github.com/inducer/pymbolic/pull/37 didn't help. Any other suggestions?
The `align_two` call at https://github.com/inducer/loopy/blob/186f5095a54982b7eb2fda5e4b995d7c047fde1e/loopy/codegen/instruction.py#L43 takes a long time. That's fixed by https://github.com/inducer/loopy/pull/280
Hmm. I can forward a bug report to conda-forge's Intel contact. I'll try to get a C reproducer.
I've got some wheels at https://github.com/isuruf/isuruf.github.io/releases/tag/v1.0 with pocl vendored in if anybody is interested.
> What would be a reasonable way to make them installable `pip install (something)` without making them the default?

We could make a package `pyopencl-pocl` and change pyopencl so that,...
And `pip install pyopencl` would only have the Python package and the ICD loader.
There's a pickled loopy kernel at https://gitlab.tiker.net/inducer/loopy/-/issues/213
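For anyone who wants to time code generation on that pickled kernel, here is a minimal sketch using only the standard library. The helper names `load_pickled` and `time_call` and the file name `kernel.pkl` are made up for illustration. It assumes loopy is importable, and that the pickle was written by a compatible loopy version, which may not hold.

```python
import pickle
import time


def load_pickled(path):
    """Load a pickled object (e.g. a loopy kernel) from `path`."""
    # Only unpickle files from sources you trust: pickle can run arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


def time_call(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call of `func`."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical usage; requires loopy to be installed so that the kernel
# classes can be unpickled:
#
#   import loopy as lp
#   knl = load_pickled("kernel.pkl")          # hypothetical file name
#   _, seconds = time_call(lp.generate_code_v2, knl)
#   print(f"generate_code_v2 took {seconds:.1f}s")
```

A single wall-clock measurement like this is only a rough check; a profiler such as pyinstrument gives the per-function breakdown.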
Anyone have any ideas for benchmarks? The medium-sized sumpy kernel takes way too much time at the moment (20 mins per benchmark run). See http://koelsch.d.tiker.net:8000 (Need to be inside...
> All of loopy takes about 680s.
> 184s of that in check_variable_access_ordered. (which can be turned off, and which was the focus of !408)

With #281, this is 4s...
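To put those numbers in proportion, a quick back-of-the-envelope calculation. This assumes the 4s figure refers to the same check and that nothing else in the 680s changes, neither of which the comment confirms.

```python
# Figures quoted above, in seconds.
total_s = 680.0
check_before_s = 184.0
check_after_s = 4.0  # assumption: time for the same check with #281 applied

share_before = check_before_s / total_s    # fraction of total spent in the check
saved_s = check_before_s - check_after_s   # time saved if only the check changes
total_after_s = total_s - saved_s          # rough estimate of the new total

print(f"check share before: {share_before:.0%}")
print(f"time saved: {saved_s:.0f}s")
print(f"estimated total after: {total_after_s:.0f}s")
```

So the check accounted for roughly a quarter of the total, and under these assumptions about 500s would remain elsewhere.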